17 Oct, 2007

2 commits

  • Implement generic chunk-of-pages isolation method by using page grouping ops.

    This patch adds MIGRATE_ISOLATE to MIGRATE_TYPES. As a result:
    - MIGRATE_TYPES increases.
    - the bitmap for migratetype is enlarged.

    Pages of the MIGRATE_ISOLATE migratetype will not be allocated even if they
    are free. This lets you isolate *freed* pages from users. How to free pages
    is outside the scope of this patch; you may use the reclaim and migration
    code to free them.

    If start_isolate_page_range(start,end) is called,
    - the migratetype of the range is changed to MIGRATE_ISOLATE if
    its type is MIGRATE_MOVABLE. (*) This check can be updated if other
    memory reclaiming work makes progress.
    - MIGRATE_ISOLATE is not on the migratetype fallback list.
    - All free pages and will-be-freed pages are isolated.
    To check whether all pages in the range are isolated, use test_pages_isolated().
    To cancel isolation, use undo_isolate_page_range().
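
    The isolate / test / undo flow described above can be pictured with a small
    userspace model. This is only a sketch, not kernel code: every toy_* name,
    the block count and the return conventions are assumptions made for
    illustration, and the real functions operate on page frame ranges inside a
    zone rather than on a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: one migratetype per page block. */
enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE, TOY_ISOLATE };

#define TOY_NR_BLOCKS 8

struct toy_zone {
    enum toy_migratetype type[TOY_NR_BLOCKS];
    bool is_free[TOY_NR_BLOCKS];
};

/* Mark [start, end) isolated; fail and roll back if a block is not movable. */
static int toy_start_isolate(struct toy_zone *z, size_t start, size_t end)
{
    assert(start <= end && end <= TOY_NR_BLOCKS);
    for (size_t i = start; i < end; i++) {
        if (z->type[i] != TOY_MOVABLE) {
            /* Everything changed so far was movable before, so restore that. */
            for (size_t j = start; j < i; j++)
                z->type[j] = TOY_MOVABLE;
            return -1;
        }
        z->type[i] = TOY_ISOLATE;
    }
    return 0;
}

/* Cancel isolation: return isolated blocks in the range to the movable type. */
static void toy_undo_isolate(struct toy_zone *z, size_t start, size_t end)
{
    assert(start <= end && end <= TOY_NR_BLOCKS);
    for (size_t i = start; i < end; i++)
        if (z->type[i] == TOY_ISOLATE)
            z->type[i] = TOY_MOVABLE;
}

/* True only if every block in the range is both isolated and free. */
static bool toy_test_isolated(const struct toy_zone *z, size_t start, size_t end)
{
    assert(start <= end && end <= TOY_NR_BLOCKS);
    for (size_t i = start; i < end; i++)
        if (z->type[i] != TOY_ISOLATE || !z->is_free[i])
            return false;
    return true;
}

/* Allocator: hands out the first free block that is not isolated, or -1. */
static int toy_alloc(struct toy_zone *z)
{
    for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
        if (z->is_free[i] && z->type[i] != TOY_ISOLATE) {
            z->is_free[i] = false;
            return (int)i;
        }
    }
    return -1;
}
```

    In this model a free block inside an isolated range is skipped by
    toy_alloc() until toy_undo_isolate() is called, which mirrors the
    "will not be allocated even if they are free" behaviour in the message.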

    Changes V6 -> V7
    - removed unnecessary #ifdef

    There is HOLES_IN_ZONE handling code... I'd be glad if we could remove it.

    Signed-off-by: Yasunori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • SPARSEMEM is a pretty nice framework that unifies quite a bit of code over all
    the arches. It would be great if it could be the default so that we can get
    rid of various forms of DISCONTIG and other variations on memory maps. So far
    what has hindered this are the additional lookups that SPARSEMEM introduces
    for virt_to_page and page_address. This goes so far that the code to do this
    has to be kept in a separate function and cannot be used inline.

    This patch introduces a virtual memmap mode for SPARSEMEM, in which the memmap
    is mapped into a virtually contiguous area and only the active sections are
    physically backed. This allows virt_to_page, page_address and cohorts to become
    simple shift/add operations. No page flag fields, no table lookups, nothing
    involving memory is required.

    The two key operations pfn_to_page and page_to_pfn become:

    #define __pfn_to_page(pfn) (vmemmap + (pfn))
    #define __page_to_pfn(page) ((page) - vmemmap)

    By having a virtual mapping for the memmap we allow simple access without
    wasting physical memory. As kernel memory is typically already mapped 1:1
    this introduces no additional overhead. The virtual mapping must be big
    enough to allow a struct page to be allocated and mapped for all valid
    physical pages. This will make a virtual memmap difficult to use on 32 bit
    platforms that support 36 address bits.

    However, if there is enough virtual space available and the arch already maps
    its 1-1 kernel space using TLBs (f.e. true of IA64 and x86_64) then this
    technique makes SPARSEMEM lookups even more efficient than CONFIG_FLATMEM.
    FLATMEM needs to read the contents of the mem_map variable to get the start of
    the memmap and then add the offset to the required entry. vmemmap is a
    constant to which we can simply add the offset.
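
    The difference between the two lookups can be sketched in userspace C. This
    is an illustration under assumptions, not the kernel's code: struct
    toy_page, the array size and every toy_* name are made up, and a static
    array stands in for the fixed virtual address that vmemmap would be.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct page; the real layout differs. */
struct toy_page { unsigned long flags; };

#define TOY_MAX_PFN 1024UL

/* In the kernel, vmemmap is a fixed virtual address; here it is an array. */
static struct toy_page toy_memmap[TOY_MAX_PFN];
#define toy_vmemmap (toy_memmap)

/* Same shape as the two macros quoted in the commit message. */
#define toy_pfn_to_page(pfn)  (toy_vmemmap + (pfn))
#define toy_page_to_pfn(page) ((unsigned long)((page) - toy_vmemmap))

/* FLATMEM-style: the base is a variable that has to be loaded first. */
static struct toy_page *toy_mem_map = toy_memmap;

static struct toy_page *toy_flat_pfn_to_page(unsigned long pfn)
{
    assert(pfn < TOY_MAX_PFN);
    return toy_mem_map + pfn;   /* one memory read, then the add */
}
```

    Both forms return the same struct; the point of the sketch is only that the
    vmemmap form adds an offset to a link-time constant, while the FLATMEM form
    first reads a pointer variable.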

    This patch has the potential to allow us to make SPARSEMEM the default (and
    even the only) option for most systems. It should be optimal on UP, SMP and
    NUMA on most platforms. Then we may even be able to remove the other memory
    models: FLATMEM, DISCONTIG etc.

    [apw@shadowen.org: config cleanups, resplit code etc]
    [kamezawa.hiroyu@jp.fujitsu.com: Fix sparsemem_vmemmap init]
    [apw@shadowen.org: vmemmap: remove excess debugging]
    [apw@shadowen.org: simplify initialisation code and reduce duplication]
    [apw@shadowen.org: pull out the vmemmap code into its own file]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andy Whitcroft
    Acked-by: Mel Gorman
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: "David S. Miller"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

18 Jul, 2007

1 commit

  • The bounce buffer logic is included on systems that do not need it. If a
    system does not have zones like ZONE_DMA and ZONE_HIGHMEM that can lead to
    the use of bounce buffers then there is no need to reserve memory pools etc
    etc. This is true f.e. for SGI Altix.

    Also nicifies the Makefile and gets rid of the tricky "and" there.

    Signed-off-by: Christoph Lameter
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 May, 2007

2 commits

  • On x86_64 this cuts allocation overhead for page table pages down to a
    fraction (kernel compile / editing load; TSC-based measurement of the time
    spent in each function):

    no quicklist

    pte_alloc 1569048 4.3s(401ns/2.7us/179.7us)
    pmd_alloc 780988 2.1s(337ns/2.7us/86.1us)
    pud_alloc 780072 2.2s(424ns/2.8us/300.6us)
    pgd_alloc 260022 1s(920ns/4us/263.1us)

    quicklist:

    pte_alloc 452436 573.4ms(8ns/1.3us/121.1us)
    pmd_alloc 196204 174.5ms(7ns/889ns/46.1us)
    pud_alloc 195688 172.4ms(7ns/881ns/151.3us)
    pgd_alloc 65228 9.8ms(8ns/150ns/6.1us)

    pgd allocations are the most complex and there we see the most dramatic
    improvement (maybe we can cut down the number of pgds cached somewhat?). But
    even the pte allocations still see a doubling of performance.

    1. Proven code from the IA64 arch.

    The method used here has been fine tuned for years and
    is NUMA aware. It is based on the knowledge that accesses
    to page table pages are sparse in nature. Taking a page
    off the freelists instead of allocating a zeroed page
    allows a reduction of the number of cachelines touched
    in addition to getting rid of the slab overhead. So
    performance improves. This is particularly useful if pgds
    contain standard mappings. We can save on the teardown
    and setup of such a page if we have some on the quicklists.
    This includes avoiding list operations that are otherwise
    necessary on alloc and free to track pgds.

    2. Lightweight alternative to using slab to manage page-sized pages

    Slab overhead is significant and even page allocator use
    is pretty heavy weight. The use of a per cpu quicklist
    means that we touch only two cachelines for an allocation.
    There is no need to access the struct page (unless arch code
    needs to fiddle around with it). So the fast path just
    means bringing in one cacheline at the beginning of the
    page. That same cacheline may then be used to store the
    page table entry. Or a second cacheline may be used
    if the page table entry is not in the first cacheline of
    the page. The current code will zero the page which means
    touching 32 cachelines (assuming 128 byte). We get down
    from 32 to 2 cachelines in the fast path.

    3. x86_64 gets lightweight page table page management.

    This will allow x86_64 arch code to faster repopulate pgds
    and other page table entries. The list operations for pgds
    are reduced in the same way as for i386 to the point where
    a pgd is allocated from the page allocator and when it is
    freed back to the page allocator. A pgd can pass through
    the quicklists without having to be reinitialized.

    4. Consolidation of code from multiple arches

    So far arches have had their own implementations of quicklist
    management. This patch moves that feature into the core, allowing
    easier maintenance and consistent management of quicklists.

    Page table pages have the characteristic that they are typically zero or in a
    known state when they are freed. This is usually exactly the same state as
    needed after allocation. So it makes sense to build a list of freed page
    table pages and then consume the pages already in use first. Those pages have
    already been initialized correctly (thus no need to zero them) and are likely
    already cached in such a way that the MMU can use them most effectively. Page
    table pages are used in a sparse way so zeroing them on allocation is not too
    useful.

    Such an implementation already exists for ia64. However, that implementation
    did not support constructors and destructors as needed by i386 / x86_64. It
    also only supported a single quicklist. The implementation here has
    constructor and destructor support as well as the ability for an arch to
    specify how many quicklists are needed.

    Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more than one
    quicklist is necessary then we can define NR_QUICK for additional lists. F.e.
    i386 needs two and thus has

    config NR_QUICK
    int
    default 2

    If an arch has requested quicklist support then pages can be allocated
    from the quicklist (or from the page allocator if the quicklist is
    empty) via:

    quicklist_alloc(, , )

    Page table pages can be freed using:

    quicklist_free(, , )

    Pages must have a definite state after allocation and before
    they are freed. If no constructor is specified then pages
    will be zeroed on allocation and must be zeroed before they are
    freed.

    If a constructor is used then the constructor will establish
    a definite page state. F.e. the i386 and x86_64 pgd constructors
    establish certain mappings.

    Constructors and destructors can also be used to track the pages.
    i386 and x86_64 use a list of pgds in order to be able to dynamically
    update standard mappings.
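
    The idea of recycling pages that are already in a known state can be
    sketched as a userspace free list that chains pages through their first
    word. This is a toy model under assumptions, not the kernel implementation:
    all toy_* names and signatures are invented here, calloc() stands in for the
    page allocator, and the per-CPU and NUMA aspects are left out.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

/* Toy quicklist: freed pages are chained through their first word. */
struct toy_quicklist {
    void *head;
    int nr_pages;
};

/* Take a page from the list, or fall back to a zeroed heap "page". */
static void *toy_quicklist_alloc(struct toy_quicklist *q, void (*ctor)(void *))
{
    void **p = q->head;

    if (p) {
        q->head = p[0];   /* unlink: the next pointer lives in the page */
        p[0] = NULL;      /* restore the known state of the first word */
        q->nr_pages--;
        return p;         /* no zeroing and no constructor call needed */
    }
    p = calloc(1, TOY_PAGE_SIZE);
    if (p && ctor)
        ctor(p);          /* constructor runs only for brand new pages */
    return p;
}

/* The caller must hand the page back in its known (e.g. zeroed) state. */
static void toy_quicklist_free(struct toy_quicklist *q, void *page)
{
    void **p = page;

    assert(page != NULL);
    p[0] = q->head;
    q->head = p;
    q->nr_pages++;
}

/* Give cached pages back to the heap; the destructor runs only here. */
static void toy_quicklist_trim(struct toy_quicklist *q, void (*dtor)(void *))
{
    while (q->head) {
        void **p = q->head;

        q->head = p[0];
        p[0] = NULL;
        if (dtor)
            dtor(p);
        free(p);
        q->nr_pages--;
    }
}
```

    In this sketch a page that cycles through free and alloc touches only its
    first word, which is the effect the message describes as going from 32
    cachelines to 2 in the fast path.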

    Signed-off-by: Christoph Lameter
    Cc: "David S. Miller"
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This is a new slab allocator which was motivated by the complexity of the
    existing code in mm/slab.c. It attempts to address a variety of concerns
    with the existing implementation.

    A. Management of object queues

    A particular concern was the complex management of the numerous object
    queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
    each allocating CPU and use objects from a slab directly instead of
    queueing them up.

    B. Storage overhead of object queues

    SLAB Object queues exist per node, per CPU. The alien cache queue even
    has a queue array that contain a queue for each processor on each
    node. For very large systems the number of queues and the number of
    objects that may be caught in those queues grows exponentially. On our
    systems with 1k nodes / processors we have several gigabytes just tied up
    for storing references to objects for those queues. This does not include
    the objects that could be on those queues. One fears that the whole
    memory of the machine could one day be consumed by those queues.

    C. SLAB meta data overhead

    SLAB has overhead at the beginning of each slab. This means that data
    cannot be naturally aligned at the beginning of a slab block. SLUB keeps
    all meta data in the corresponding struct page. Objects can be naturally
    aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
    boundaries and can fit tightly into a 4k page with no bytes left over.
    SLAB cannot do this.
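
    The 128 byte example is simple arithmetic, sketched below. The helper names
    and the 64 byte header used in the usage note are assumptions for
    illustration only; the message does not state the size of the SLAB
    metadata.

```c
#include <assert.h>
#include <stddef.h>

/* Objects that fit when `header` bytes of on-slab metadata come first. */
static size_t toy_objects_per_slab(size_t slab_bytes, size_t header,
                                   size_t obj_size)
{
    assert(obj_size > 0 && header <= slab_bytes);
    return (slab_bytes - header) / obj_size;
}

/* Bytes left over after the header and all whole objects are placed. */
static size_t toy_leftover_bytes(size_t slab_bytes, size_t header,
                                 size_t obj_size)
{
    size_t n = toy_objects_per_slab(slab_bytes, header, obj_size);

    return slab_bytes - header - n * obj_size;
}
```

    With no header, toy_objects_per_slab(4096, 0, 128) gives 32 objects and no
    leftover bytes. With a hypothetical 64 byte header the same page holds 31
    objects, and the first object no longer starts on a 128 byte boundary.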

    D. SLAB has a complex cache reaper

    SLUB does not need a cache reaper for UP systems. On SMP systems
    the per CPU slab may be pushed back onto the partial list but that
    operation is simple and does not require an iteration over a list
    of objects. SLAB expires per CPU, shared and alien object queues
    during cache reaping which may cause strange hold offs.

    E. SLAB has complex NUMA policy layer support

    SLUB pushes NUMA policy handling into the page allocator. This means that
    allocation is coarser (SLUB does interleave on a page level) but that
    situation was also present before 2.6.13. SLAB's application of
    policies to individual slab objects allocated in SLAB is
    certainly a performance concern due to the frequent references to
    memory policies which may lead a sequence of objects to come from
    one node after another. SLUB will get a slab full of objects
    from one node and then will switch to the next.

    F. Reduction of the size of partial slab lists

    SLAB has per node partial lists. This means that over time a large
    number of partial slabs may accumulate on those lists. These can
    only be reused if allocations occur on specific nodes. SLUB has a global
    pool of partial slabs and will consume slabs from that pool to
    decrease fragmentation.

    G. Tunables

    SLAB has sophisticated tuning abilities for each slab cache. One can
    manipulate the queue sizes in detail. However, filling the queues still
    requires the use of the spin lock to check out slabs. SLUB has a global
    parameter (min_slab_order) for tuning. Increasing the minimum slab
    order can decrease the locking overhead. The bigger the slab order, the
    fewer movements of pages between per CPU and partial lists occur and the
    better SLUB will scale.

    G. Slab merging

    We often have slab caches with similar parameters. SLUB detects those
    on boot up and merges them into the corresponding general caches. This
    leads to more effective memory use. About 50% of all caches can
    be eliminated through slab merging. This will also decrease
    slab fragmentation because partially allocated slabs can be filled
    up again. Slab merging can be switched off by specifying
    slub_nomerge on boot up.

    Note that merging can expose heretofore unknown bugs in the kernel
    because corrupted objects may now be placed differently and corrupt
    differing neighboring objects. Enable sanity checks to find those.

    H. Diagnostics

    The current slab diagnostics are difficult to use and require a
    recompilation of the kernel. SLUB contains debugging code that
    is always available (but is kept out of the hot code paths).
    SLUB diagnostics can be enabled via the "slub_debug" option.
    Parameters can be specified to select a single or a group of
    slab caches for diagnostics. This means that the system is running
    with the usual performance and it is much more likely that
    race conditions can be reproduced.

    I. Resiliency

    If basic sanity checks are on then SLUB is capable of detecting
    common error conditions and recovering as well as possible to allow the
    system to continue.

    J. Tracing

    Tracing can be enabled via the slub_debug=T, option
    during boot. SLUB will then log all actions on that slabcache
    and dump the object contents on free.

    K. On demand DMA cache creation.

    Generally DMA caches are not needed. If a kmalloc is used with
    __GFP_DMA then just create this single slabcache that is needed.
    For systems that have no ZONE_DMA requirement the support is
    completely eliminated.

    L. Performance increase

    Some benchmarks have shown speed improvements on kernbench in the
    range of 5-10%. The locking overhead of slub is based on the
    underlying base allocation size. If we can reliably allocate
    larger order pages then it is possible to increase slub
    performance much further. The anti-fragmentation patches may
    enable further performance increases.

    Tested on:
    i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

    SLUB Boot options

    slub_nomerge          Disable merging of slabs
    slub_min_order=x      Require a minimum order for slab caches. This
                          increases the managed chunk size and therefore
                          reduces meta data and locking overhead.
    slub_min_objects=x    Minimum objects per slab. Default is 8.
    slub_max_order=x      Avoid generating slabs larger than order specified.
    slub_debug            Enable all diagnostics for all caches
    slub_debug=           Enable selective options for all caches
    slub_debug=,          Enable selective options for a certain set of
                          caches

    Available Debug options
    F    Double Free checking, sanity and resiliency
    R    Red zoning
    P    Object / padding poisoning
    U    Track last free / alloc
    T    Trace all allocs / frees (only use for individual slabs).

    To use SLUB: Apply this patch and then select SLUB as the default slab
    allocator.

    [hugh@veritas.com: fix an oops-causing locking error]
    [akpm@linux-foundation.org: various stupid cleanups and small fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

21 Oct, 2006

1 commit

  • Separate out the concept of "queue congestion" from "backing-dev congestion".
    Congestion is a backing-dev concept, not a queue concept.

    The blk_* congestion functions are retained, as wrappers around the core
    backing-dev congestion functions.

    This proper layering is needed so that NFS can cleanly use the congestion
    functions, and so that CONFIG_BLOCK=n actually links.

    Cc: "Thomas Maier"
    Cc: "Jens Axboe"
    Cc: Trond Myklebust
    Cc: David Howells
    Cc: Peter Osterlund
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

01 Oct, 2006

2 commits

  • Make it possible to disable the block layer. Not all embedded devices require
    it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
    the block layer to be present.

    This patch does the following:

    (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
    support.

    (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
    an item that uses the block layer. This includes:

    (*) Block I/O tracing.

    (*) Disk partition code.

    (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.

    (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
    block layer to do scheduling. Some drivers that use SCSI facilities -
    such as USB storage - end up disabled indirectly from this.

    (*) Various block-based device drivers, such as IDE and the old CDROM
    drivers.

    (*) MTD blockdev handling and FTL.

    (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
    taking a leaf out of JFFS2's book.

    (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
    linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
    however, still used in places, and so is still available.

    (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
    parts of linux/fs.h.

    (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.

    (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.

    (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
    is not enabled.

    (*) fs/no-block.c is created to hold out-of-line stubs and things that are
    required when CONFIG_BLOCK is not set:

    (*) Default blockdev file operations (to give error ENODEV on opening).

    (*) Makes some /proc changes:

    (*) /proc/devices does not list any blockdevs.

    (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.

    (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.

    (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
    given command other than Q_SYNC or if a special device is specified.

    (*) In init/do_mounts.c, no reference is made to the blockdev routines if
    CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.

    (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
    error ENOSYS by way of cond_syscall if so).

    (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
    CONFIG_BLOCK is not set, since they can't then happen.
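
    The stub pattern in the list above (an out-of-line fallback that reports
    ENODEV when the block layer is compiled out) can be sketched as follows.
    TOY_CONFIG_BLOCK and toy_blkdev_open() are hypothetical names used only for
    this illustration; they are not the symbols the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Define TOY_CONFIG_BLOCK at build time to model CONFIG_BLOCK=y. */
#ifdef TOY_CONFIG_BLOCK
/* Block layer present: pretend the open succeeded. */
static int toy_blkdev_open(const char *name)
{
    assert(name != NULL);
    return 0;
}
#else
/* Block layer compiled out: a stub that always reports "no such device". */
static int toy_blkdev_open(const char *name)
{
    assert(name != NULL);
    return -ENODEV;
}
#endif
```

    Callers are written once against toy_blkdev_open() and need no #ifdef of
    their own, which is the benefit of collecting such stubs in one file.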

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     
  • Move the bounce buffer code from mm/highmem.c to mm/bounce.c so that it can be
    more easily disabled when the block layer is disabled.

    !!!NOTE!!! There may be a bug in this code: Should init_emergency_pool() be
    contingent on CONFIG_HIGHMEM?

    Signed-Off-By: David Howells
    Signed-off-by: Jens Axboe

    David Howells
     

30 Sep, 2006

1 commit


26 Sep, 2006

1 commit


01 Jul, 2006

1 commit

  • NOTE: ZVC are *not* the lightweight event counters. ZVCs are reliable whereas
    event counters do not need to be.

    Zone based VM statistics are necessary to be able to determine what the state
    of memory in one zone is. In a NUMA system this can be helpful for local
    reclaim and other memory optimizations that may be able to shift VM load in
    order to get more balanced memory use.

    It is also useful to know how the computing load affects the memory
    allocations on various zones. This patchset allows the retrieval of that data
    from userspace.

    The patchset introduces a framework for counters that is a cross between the
    existing page_stats --which are simply global counters split per cpu-- and the
    approach of deferred incremental updates implemented for nr_pagecache.

    Small per cpu 8 bit counters are added to struct zone. If the counter exceeds
    certain thresholds then the counters are accumulated in an array of
    atomic_long in the zone and in a global array that sums up all zone values.
    The small 8 bit counters are next to the per cpu page pointers and so they
    will be hot in the cpu cache when pages are allocated and freed.

    Access to VM counter information for a zone and for the whole machine is then
    possible by simply indexing an array (Thanks to Nick Piggin for pointing out
    that approach). Access to the total number of pages of various types
    no longer requires summing up all per cpu counters.
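
    The counter scheme can be sketched in userspace C with C11 atomics. This is
    a simplified model under assumptions, not the patch's code: the toy_*
    names, the CPU count and the threshold of 32 are invented for illustration,
    and there is a single counter type rather than an array of them.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NR_CPUS   4
#define TOY_THRESHOLD 32   /* assumed value; keeps deltas inside 8 bits */

struct toy_zone_stat {
    signed char diff[TOY_NR_CPUS];  /* small per-CPU differential */
    atomic_long zone_total;         /* per-zone counter */
};

static atomic_long toy_global_total;   /* sum over all zones */

static void toy_zone_stat_init(struct toy_zone_stat *z)
{
    for (int i = 0; i < TOY_NR_CPUS; i++)
        z->diff[i] = 0;
    atomic_init(&z->zone_total, 0);
}

/* Cheap update: touch only the per-CPU byte unless it crosses the threshold. */
static void toy_mod_zone_state(struct toy_zone_stat *z, int cpu, int delta)
{
    int x;

    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    assert(delta >= -TOY_THRESHOLD && delta <= TOY_THRESHOLD);

    x = z->diff[cpu] + delta;
    if (x > TOY_THRESHOLD || x < -TOY_THRESHOLD) {
        /* Fold the accumulated difference into the shared counters. */
        atomic_fetch_add(&z->zone_total, x);
        atomic_fetch_add(&toy_global_total, x);
        x = 0;
    }
    z->diff[cpu] = (signed char)x;
}

/* Reading is a single load; no loop over CPUs is needed. */
static long toy_zone_page_state(struct toy_zone_stat *z)
{
    return atomic_load(&z->zone_total);
}
```

    The value read this way can lag the true count by up to the threshold per
    CPU, which is the price this model pays for keeping the common update local
    to one byte.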

    Benefits of this patchset right now:

    - Ability for UP and SMP configuration to determine how memory
    is balanced between the DMA, NORMAL and HIGHMEM zones.

    - loops over all processors are avoided in writeback and
    reclaim paths. We can avoid caching the writeback information
    because the needed information is directly accessible.

    - Special handling for nr_pagecache removed.

    - zone_reclaim_interval vanishes since VM stats can now determine
    when it is worth doing local reclaim.

    - Fast inline per node page state determination.

    - Accurate counters in /sys/devices/system/node/node*/meminfo. The current
    counters simply count which processor allocated a page somewhere
    and guesstimate based on that. So the counters were not useful to show
    the actual distribution of page use on a specific zone.

    - The swap_prefetch patch requires per node statistics in order to
    figure out when processors of a node can prefetch. This patch provides
    some of the needed numbers.

    - Detailed VM counters available in more /proc and /sys status files.

    References to earlier discussions:
    V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
    V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

    Performance tests with AIM7 did not show any regressions. Seems to be a tad
    faster even. Tested on ia64/NUMA. Builds fine on i386, SMP / UP. Includes
    fixes for s390/arm/uml arch code.

    This patch:

    Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.

    Create vmstat.c/vmstat.h by separating the counter code and the proc
    functions.

    Move the vm_stat_text array before zoneinfo_show.

    [akpm@osdl.org: s390 build fix]
    [akpm@osdl.org: HOTPLUG_CPU build fix]
    Signed-off-by: Christoph Lameter
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

28 Mar, 2006

1 commit

  • Helper functions for for_each_online_pgdat/for_each_zone look too big to be
    inlined. The speed of these helper macros themselves is not very important
    (inner loops tend to do more work than this).

    This patch makes the helper functions out-of-line.

             inline      out-of-line
    .text    005c0680    005bf6a0

    005c0680 - 005bf6a0 = FE0 = 4Kbytes.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Mar, 2006

1 commit

  • Centralize the page migration functions in anticipation of additional
    tinkering. Creates a new file mm/migrate.c

    1. Extract buffer_migrate_page() from fs/buffer.c

    2. Extract central migration code from vmscan.c

    3. Extract some components from mempolicy.c

    4. Export pageout() and remove_from_swap() from vmscan.c

    5. Make it possible to configure NUMA systems without page migration
    and non-NUMA systems with page migration.

    I had to do some #ifdeffing in mempolicy.c that may need a cleanup.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 Jan, 2006

2 commits

  • configurable replacement for slab allocator

    This adds a CONFIG_SLAB option under CONFIG_EMBEDDED. When CONFIG_SLAB is
    disabled, the kernel falls back to using the 'SLOB' allocator.

    SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer,
    similar to the original Linux kmalloc allocator that SLAB replaced. It's
    significantly smaller code and is more memory efficient. But like all
    similar allocators, it scales poorly and suffers from fragmentation more
    than SLAB, so it's only appropriate for small systems.

    It's been tested extensively in the Linux-tiny tree. I've also
    stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not
    recommended).

    Here's a comparison for otherwise identical builds, showing SLOB saving
    nearly half a megabyte of RAM:

    $ size vmlinux*
       text    data     bss     dec     hex filename
    3336372  529360  190812 4056544  3de5e0 vmlinux-slab
    3323208  527948  190684 4041840  3dac70 vmlinux-slob

    $ size mm/{slab,slob}.o
       text    data     bss     dec     hex filename
      13221     752      48   14021    36c5 mm/slab.o
       1896      52       8    1956     7a4 mm/slob.o

    /proc/meminfo:
                         SLAB          SLOB         delta
    MemTotal:        27964 kB      27980 kB        +16 kB
    MemFree:         24596 kB      25092 kB       +496 kB
    Buffers:            36 kB         36 kB          0 kB
    Cached:           1188 kB       1188 kB          0 kB
    SwapCached:          0 kB          0 kB          0 kB
    Active:            608 kB        600 kB         -8 kB
    Inactive:          808 kB        812 kB         +4 kB
    HighTotal:           0 kB          0 kB          0 kB
    HighFree:            0 kB          0 kB          0 kB
    LowTotal:        27964 kB      27980 kB        +16 kB
    LowFree:         24596 kB      25092 kB       +496 kB
    SwapTotal:           0 kB          0 kB          0 kB
    SwapFree:            0 kB          0 kB          0 kB
    Dirty:               4 kB         12 kB         +8 kB
    Writeback:           0 kB          0 kB          0 kB
    Mapped:            560 kB        556 kB         -4 kB
    Slab:             1756 kB          0 kB      -1756 kB
    CommitLimit:     13980 kB      13988 kB         +8 kB
    Committed_AS:     4208 kB       4208 kB          0 kB
    PageTables:         28 kB         28 kB          0 kB
    VmallocTotal:  1007312 kB    1007312 kB          0 kB
    VmallocUsed:        48 kB         48 kB          0 kB
    VmallocChunk:  1007264 kB    1007264 kB          0 kB

    (this work has been sponsored in part by CELF)

    From: Ingo Molnar

    Fix 32-bitness bugs in mm/slob.c.

    Signed-off-by: Matt Mackall
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • Add mm/util.c for functions common between SLAB and SLOB.

    Signed-off-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     

30 Oct, 2005

1 commit

  • This adds generic memory add/remove and supporting functions for memory
    hotplug into a new file as well as a memory hotplug kernel config option.

    Individual architecture patches will follow.

    For now, disable memory hotplug when swsusp is enabled. There's a lot of
    churn there right now. We'll fix it up properly once it calms down.

    Signed-off-by: Matt Tolentino
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

24 Jun, 2005

2 commits

  • - generic_file* file operations no longer have a xip/non-xip split
    - filemap_xip.c implements a new set of fops that require the get_xip_page
    aop to work properly. All new fops are exported GPL-only (I don't like to
    see any code other than GPL modules use them)
    - __xip_unmap now uses page_check_address, which is no longer static
    in rmap.c, and is declared in linux/rmap.h
    - mm/filemap.h is now much cleaner, plainly having just Linus'
    inline funcs moved here from filemap.c
    - fix includes in filemap_xip to make it build cleanly on i386

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
    mem_map[] is needed by discontiguous memory machines (like in the old
    CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
    replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
    become a complete replacement.

    A significant advantage over DISCONTIGMEM is that it's completely separated
    from CONFIG_NUMA. When producing this patch, it became apparent that NUMA
    and DISCONTIG are often confused.

    Another advantage is that sparse doesn't require each NUMA node's ranges to be
    contiguous. It can handle overlapping ranges between nodes with no problems,
    where DISCONTIGMEM currently throws away that memory.

    Sparsemem uses an array to provide different pfn_to_page() translations for
    each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
    to be chopped up.
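
    The per-section translation can be sketched in userspace C. This is a
    simplified model under assumptions, not the patch's code: the toy_* names,
    the 16 pages per section and the 8 sections are invented for illustration,
    and the section number is taken from the pfn here rather than from
    page->flags.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct page; the real layout differs. */
struct toy_page { unsigned long flags; };

#define TOY_SECTION_SHIFT     4   /* assumed: 16 pages per section */
#define TOY_PAGES_PER_SECTION (1UL << TOY_SECTION_SHIFT)
#define TOY_NR_SECTIONS       8UL

/* One entry per section; NULL means the section has no memory (a hole). */
static struct toy_page *toy_section_map[TOY_NR_SECTIONS];

/* Register the chunk of mem_map that backs section `nr`. */
static void toy_sparse_add_section(unsigned long nr, struct toy_page *map)
{
    assert(nr < TOY_NR_SECTIONS);
    toy_section_map[nr] = map;
}

/* Translate a pfn: pick the section's chunk, then index inside it. */
static struct toy_page *toy_sparse_pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn >> TOY_SECTION_SHIFT;

    if (nr >= TOY_NR_SECTIONS || !toy_section_map[nr])
        return NULL;
    return toy_section_map[nr] + (pfn & (TOY_PAGES_PER_SECTION - 1));
}
```

    Because each section points at its own chunk, the chunks need not be
    adjacent in memory and sections without memory cost only a NULL entry.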

    In order to do quick pfn_to_page() operations, the section number of the page
    is encoded in page->flags. Part of the sparsemem infrastructure enables
    sharing of these bits more dynamically (at compile-time) between the
    page_zone() and sparsemem operations. However, on 32-bit architectures, the
    number of bits is quite limited, and may require growing the size of the
    page->flags type in certain conditions. Several things might force this to
    occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
    memory), an increase in the physical address space, or an increase in the
    number of used page->flags.

    One thing to note is that, once sparsemem is present, the NUMA node
    information no longer needs to be stored in the page->flags. It might provide
    speed increases on certain platforms and will be stored there if there is
    room. But, if out of room, an alternate (theoretically slower) mechanism is
    used.

    This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
    there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
    often have to compile out the same areas of code.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Dave Hansen
    Signed-off-by: Martin Bligh
    Signed-off-by: Adrian Bunk
    Signed-off-by: Yasunori Goto
    Signed-off-by: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds