23 Aug, 2007

9 commits

  • The NUMA layer only supports NUMA policies for the highest zone. When
    ZONE_MOVABLE is configured with kernelcore=, the highest zone becomes
    ZONE_MOVABLE. The result is that policies are only applied to allocations
    like anonymous pages and page cache allocated from ZONE_MOVABLE when the
    zone is used.

    This patch applies policies to the two highest zones when the highest zone
    is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
    zone, it's always functionally equivalent.

    The patch has been tested on a variety of machines both NUMA and non-NUMA
    covering x86, x86_64 and ppc64. No abnormal results were seen in
    kernbench, tbench, dbench or hackbench. It passes regression tests from
    the numactl package with and without kernelcore= once numactl tests are
    patched to wait for vmstat counters to update.

    akpm: this is the nasty hack to fix NUMA mempolicies in the presence of
    ZONE_MOVABLE and kernelcore= in 2.6.23. Christoph says "For .24 either merge
    the mobility or get the other solution that Mel is working on. That solution
    would only use a single zonelist per node and filter on the fly. That may
    help performance and also help to make memory policies work better."

    Signed-off-by: Mel Gorman
    Acked-by: Lee Schermerhorn
    Tested-by: Lee Schermerhorn
    Acked-by: Christoph Lameter
    Cc: Andi Kleen
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Print a big fat warning and do what is necessary to continue if a node is
    marked as up (meaning either node is online (upstream) or node has memory
    (Andrew's tree)) but allocations from the node do not succeed.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLUB is using atomic_read() for variables declared atomic_long_t.
    Switch to atomic_long_read().

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
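Why the accessor width matters can be sketched in plain userspace C. The types and helpers below are hypothetical stand-ins, not the kernel's definitions; they only illustrate that reading a long-sized counter through an int-sized accessor can lose information where long is wider than int.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a long-sized atomic counter (illustration only). */
typedef struct { long counter; } model_atomic_long_t;

/* True if the value survives a round trip through int. */
static bool model_fits_in_int(long x)
{
    return x >= INT_MIN && x <= INT_MAX;
}

/* Wrong-width read: the value is narrowed to int on the way out. */
static int model_read_as_int(const model_atomic_long_t *v)
{
    return (int)v->counter;
}

/* Matching-width read: the full long value is returned. */
static long model_read_as_long(const model_atomic_long_t *v)
{
    return v->counter;
}
```

For small counts both reads agree, which is presumably why such a mismatch can go unnoticed; only values outside the int range expose it.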
     
  • It seems a simple mistake was made when converting follow_hugetlb_page()
    over to the VM_FAULT flags bitmasks (in "mm: fault feedback #2", commit
    83c54070ee1a2d05c89793884bea1a03f2851ed4).

    By using the wrong bitmask, hugetlb_fault() failures are not being
    recognized. This results in an infinite loop whenever follow_hugetlb_page
    is involved in a failed fault.

    Signed-off-by: Adam Litke
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
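The failure mode described above can be modelled in a few lines of C. The flag values and function names here are hypothetical and chosen only for illustration; the sketch shows how testing the wrong bit hides a failed fault from a retry loop, which a real caller would then repeat forever.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
enum {
    MODEL_FAULT_MAJOR  = 0x01, /* fault was serviced, but needed I/O */
    MODEL_FAULT_OOM    = 0x02, /* failure: out of memory */
    MODEL_FAULT_SIGBUS = 0x04, /* failure: bad access */
    MODEL_FAULT_ERROR  = MODEL_FAULT_OOM | MODEL_FAULT_SIGBUS
};

/* Correct check: any error bit set means the fault failed. */
static bool fault_failed(unsigned int ret)
{
    return (ret & MODEL_FAULT_ERROR) != 0;
}

/* Buggy check: tests an unrelated bit, so real failures are missed. */
static bool fault_failed_wrong_mask(unsigned int ret)
{
    return (ret & MODEL_FAULT_MAJOR) != 0;
}

/*
 * Bounded model of a retry loop. Returns the attempt on which the failure
 * was recognised, or -1 if it never was (the infinite-loop case, cut off
 * at max_tries so the sketch terminates).
 */
static int retries_until_failure_seen(unsigned int ret,
                                      bool (*failed)(unsigned int),
                                      int max_tries)
{
    for (int i = 1; i <= max_tries; i++)
        if (failed(ret))
            return i;
    return -1;
}
```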
     
  • Skip calling cache_free_alien() when the platform is not numa capable.
    This will avoid cache misses that happen while accessing slabp (which is
    per page memory reference) to get nodeid. Instead use a global variable to
    skip the call, which is most likely to be present in the cache.

    This gives a 0.8% performance boost with the database oltp workload on a
    quad-core SMP platform, and by no means is that number small :)

    Signed-off-by: Suresh Siddha
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
     
  • The new exec code inserts an accounted vma into an mm struct which is not
    current->mm. The existing memory check code has a hard coded assumption
    that this does not happen as does the security code.

    As the correct mm is known we pass the mm to the security method and the
    helper function. A new security test is added for the case where we need
    to pass the mm and the existing one is modified to pass current->mm to
    avoid the need to change large amounts of code.

    (Thanks to Tobias for fixing rejects and testing)

    Signed-off-by: Alan Cox
    Cc: WU Fengguang
    Cc: James Morris
    Cc: Tobias Diedrich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Lumpy reclaim works by selecting a lead page from the LRU list and then
    selecting pages for reclaim from the order-aligned area of pages. In the
    situation where all pages in that region are inactive and not referenced by any
    process over time, it works well.

    In the situation where there is even light load on the system, the pages may
    not free quickly. Out of an area of 1024 pages, maybe only 950 of them are
    freed when the allocation attempt occurs because lumpy reclaim returned early.
    This patch alters the behaviour of direct reclaim for large contiguous
    blocks.

    The first attempt to call shrink_page_list() is asynchronous but if it fails,
    the pages are submitted a second time and the calling process waits for the IO
    to complete. This may stall allocators waiting for contiguous memory but that
    should be expected behaviour for high-order users. It is preferable behaviour
    to potentially queueing unnecessary areas for IO. Note that kswapd will not
    stall in this fashion.

    [apw@shadowen.org: update to version 2]
    [apw@shadowen.org: update to version 3]
    Signed-off-by: Mel Gorman
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
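The "order-aligned area" around a lead page can be computed with a mask. The following is a minimal sketch under the assumption that pages are identified by page frame number; the struct and function names are made up for the example and are not taken from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Half-open range [start, end) of page frame numbers. */
struct pfn_range {
    unsigned long start;
    unsigned long end;
};

/* Return the naturally aligned block of 2^order pages that contains pfn. */
static struct pfn_range order_aligned_area(unsigned long pfn, unsigned int order)
{
    struct pfn_range r;
    unsigned long nr_pages;

    /* Shifting by the full width of the type would be undefined. */
    assert(order < sizeof(unsigned long) * CHAR_BIT);

    nr_pages = 1UL << order;
    r.start = pfn & ~(nr_pages - 1); /* round down to a block boundary */
    r.end = r.start + nr_pages;
    return r;
}
```

With order 10 the block spans 1024 pages, matching the area size used in the description above; a lead page at frame 5000 would select frames 4096 to 5119.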
     
  • As pointed out by Mel, when reclaim is applied at higher orders a significant
    amount of IO may be started. As this takes finite time to drain, reclaim will
    consider more areas than ultimately needed to satisfy the request. This leads
    to more reclaim than strictly required and reduced success rates.

    I was able to confirm Mel's test results on systems locally. These show that
    even under light load the success rates drop off far more than expected.
    Testing with a modified version of his patch (which follows) I was able to
    allocate almost all of ZONE_MOVABLE with a near idle system. I ran 5 test
    passes sequentially following system boot (the system has 29 hugepages in
    ZONE_MOVABLE):

                  pass 1  pass 2  pass 3  pass 4  pass 5
    2.6.23-rc1        11       8       6       7       7
    sync_lumpy        28      28      29      29      26

    These show that, although hugely better than the near 0% success normally
    expected, we can only allocate about 1/4 of the zone. Using synchronous
    reclaim for these allocations we get close to 100% as expected.

    I have also run our standard high order tests and these show no regressions in
    allocation success rates at rest, and some significant improvements under
    load.

    This patch:

    We are transitioning pages from active to inactive in clear_active_flags,
    those need counting as PGDEACTIVATE vm events.

    Signed-off-by: Andy Whitcroft
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Booting SPARSEMEM on NUMA systems trips a BUG in page_alloc.c:

    Initializing HighMem for node 0 (00038000:00100000)
    Initializing HighMem for node 1 (00100000:001ffe00)
    ------------[ cut here ]------------
    kernel BUG at /home/apw/git/linux-2.6/mm/page_alloc.c:456!
    [...]

    This occurs because the section to node id mapping is not being
    set up correctly during init under SPARSEMEM_STATIC, leading to an
    attempt to free pages from all nodes into the zones on node 0.

    When the zone_table[] was removed in the following commit, a new
    section to node mapping table was introduced:

    commit 89689ae7f95995723fbcd5c116c47933a3bb8b13
    [PATCH] Get rid of zone_table[]

    That conversion inadvertently only initialised the node mapping in
    SPARSEMEM_EXTREME. Ensure we initialise the node mapping in
    SPARSEMEM_STATIC.

    [akpm@linux-foundation.org: make the stubs static inline]
    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     

12 Aug, 2007

3 commits


10 Aug, 2007

2 commits

  • The dynamic dma kmalloc creation can run into trouble if a
    GFP_ATOMIC allocation is the first one performed for a certain size
    of dma kmalloc slab.

    - Move the adding of the slab to sysfs into a workqueue
    (sysfs does GFP_KERNEL allocations)
    - Do not call kmem_cache_destroy() (uses slub_lock)
    - Only acquire the slub_lock once and--if we cannot wait--do a trylock.

    This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
    for a range of sizes failing due to another process holding the slub_lock.
    However, we only need to acquire the spinlock once in order to establish
    each power of two DMA kmalloc cache. The possible conflict is with the
    slub_lock taken during slab management actions (create / remove slab cache).

    It is rather typical that a driver will first fill its buffers using
    GFP_KERNEL allocations which will wait until the slub_lock can be acquired.
    Drivers will also create their slab caches first outside of an atomic
    context before starting to use atomic kmalloc from an interrupt context.

    If there are any failures then they will occur early after boot or when
    loading multiple drivers concurrently. Drivers can already accommodate
    failures of GFP_ATOMIC for other reasons. Retries will then create the slab.

    Signed-off-by: Christoph Lameter

    Christoph Lameter
     
  • The MAX_PARTIAL checks were supposed to be an optimization. However, slab
    shrinking is a manually triggered process either through running slabinfo
    or by the kernel calling kmem_cache_shrink.

    If one really wants to shrink a slab then all operations should be done
    regardless of the size of the partial list. This also fixes an issue that
    could surface if the number of partial slabs was initially above MAX_PARTIAL
    in kmem_cache_shrink and later drops below MAX_PARTIAL through the
    elimination of empty slabs on the partial list (rare). In that case a few
    slabs may be left off the partial list (and only be put back when they
    are empty).

    Signed-off-by: Christoph Lameter

    Christoph Lameter
     

01 Aug, 2007

3 commits

  • Fix kernel-doc warning:
    Warning(linux-2.6.23-rc1-mm1//mm/filemap.c:864): No description found for parameter 'ra'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • In badness(), the automatic variable 'points' is unsigned long. Print it
    as such.

    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • out_of_memory() may be called when an allocation is failing and the direct
    reclaim is not making any progress. This does not take into account the
    requested order of the allocation. If the request is for an order larger
    than PAGE_ALLOC_COSTLY_ORDER, it is reasonable to fail the allocation
    because the kernel makes no guarantees about those allocations succeeding.

    This false OOM situation can occur if a user is trying to grow the hugepage
    pool in a script like:

    #!/bin/bash
    REQUIRED=$1
    echo 1 > /proc/sys/vm/hugepages_treat_as_movable
    echo $REQUIRED > /proc/sys/vm/nr_hugepages
    ACTUAL=`cat /proc/sys/vm/nr_hugepages`
    while [ $REQUIRED -ne $ACTUAL ]; do
        echo Huge page pool at $ACTUAL growing to $REQUIRED
        echo $REQUIRED > /proc/sys/vm/nr_hugepages
        ACTUAL=`cat /proc/sys/vm/nr_hugepages`
        sleep 1
    done

    This is a reasonable scenario when ZONE_MOVABLE is in use but triggers OOM
    easily on 2.6.23-rc1. This patch will fail an allocation for an order above
    PAGE_ALLOC_COSTLY_ORDER instead of killing processes and retrying.

    Signed-off-by: Mel Gorman
    Acked-by: Andy Whitcroft
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
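The policy change can be summarised as a small decision function. This is a sketch only: the enum and function names are hypothetical, and the threshold value of 3 is an assumption standing in for PAGE_ALLOC_COSTLY_ORDER rather than something stated in the commit message.

```c
/* Assumed threshold; stands in for PAGE_ALLOC_COSTLY_ORDER. */
#define MODEL_COSTLY_ORDER 3u

enum alloc_action {
    ACTION_OOM_KILL_AND_RETRY, /* small request: invoke the OOM killer, retry */
    ACTION_FAIL_ALLOCATION     /* costly request: give up, kill nothing */
};

/*
 * What to do when direct reclaim made no progress for a request of the
 * given order (a request for 2^order contiguous pages).
 */
static enum alloc_action no_progress_action(unsigned int order)
{
    if (order > MODEL_COSTLY_ORDER)
        return ACTION_FAIL_ALLOCATION;
    return ACTION_OOM_KILL_AND_RETRY;
}
```

Under this model a huge page request, which is far above the threshold, simply fails and lets a script like the one above retry later, instead of killing processes.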
     

31 Jul, 2007

2 commits


30 Jul, 2007

3 commits

  • Remove fs.h from mm.h. For this,
    1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
    2) Add back fs.h or less bloated headers (err.h) to files that need it.

    As a result, on x86_64 allyesconfig, the number of files rebuilt because of
    fs.h dependencies drops from 3929 to 3444 (-12.3%).

    Cross-compile tested without regressions on my two usual configs and (sigh):

    alpha arm-mx1ads mips-bigsur powerpc-ebony
    alpha-allnoconfig arm-neponset mips-capcella powerpc-g5
    alpha-defconfig arm-netwinder mips-cobalt powerpc-holly
    alpha-up arm-netx mips-db1000 powerpc-iseries
    arm arm-ns9xxx mips-db1100 powerpc-linkstation
    arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200
    arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple
    arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2
    arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads
    arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb
    arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds
    arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb
    arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx
    arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp
    arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds
    arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds
    arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads
    arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds
    arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads
    arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds
    arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds
    arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn
    arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads
    arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads
    arm-ep93xx i386-up mips-pb1100 powerpc-pasemi
    arm-footbridge ia64 mips-pb1500 powerpc-pmac32
    arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64
    arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800
    arm-h7201 ia64-defconfig mips-pnx8550-stb810 powerpc-ps3
    arm-h7202 ia64-gensparse mips-qemu powerpc-pseries
    arm-hackkit ia64-sim mips-rbhma4200 powerpc-up
    arm-integrator ia64-sn2 mips-rbhma4500 s390
    arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig
    arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig
    arm-iop33x ia64-zx1 mips-sead s390-up
    arm-ixp2000 m68k mips-tb0219 sparc
    arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig
    arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig
    arm-jornada720 m68k-atari mips-workpad sparc-up
    arm-kafa m68k-bvme6000 mips-wrppmc sparc64
    arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig
    arm-ks8695 m68k-mac parisc sparc64-defconfig
    arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up
    arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64
    arm-lpd7a400 m68k-q40 parisc-up x86_64
    arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig
    arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig
    arm-lusl7200 mips powerpc-celleb x86_64-up
    arm-mainstone mips-atlas powerpc-chrp32

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Introduce CONFIG_SUSPEND representing the ability to enter system sleep
    states, such as the ACPI S3 state, and allow the user to choose SUSPEND
    and HIBERNATION independently of each other.

    Make HOTPLUG_CPU be selected automatically if SUSPEND or HIBERNATION has
    been chosen and the kernel is intended for SMP systems.

    Also, introduce CONFIG_PM_SLEEP which is automatically selected if
    CONFIG_SUSPEND or CONFIG_HIBERNATION is set and use it to select the
    code needed for both suspend and hibernation.

    The top-level power management headers and the ACPI code related to
    suspend and hibernation are modified to use the new definitions (the
    changes in drivers/acpi/sleep/main.c are, mostly, moving code to reduce
    the number of ifdefs).

    There are many other files in which CONFIG_PM can be replaced with
    CONFIG_PM_SLEEP or even with CONFIG_SUSPEND, but they can be updated in
    the future.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
    confusion (among other things, with CONFIG_SUSPEND introduced in the
    next patch).

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

27 Jul, 2007

3 commits

  • With the introduction of kernelcore=, a configurable zone is created on
    request. In some cases, this value will be small enough that some nodes
    contain only ZONE_MOVABLE. On some NUMA configurations when this occurs,
    arch-independent zone-sizing will get the size of the memory holes within
    the node incorrect. The value of present_pages goes negative and the boot
    fails.

    This patch fixes the bug in the calculation of the size of the hole. The
    test case is to boot test a NUMA machine with a low value of kernelcore=
    before and after the patch is applied. While this bug exists in earlier
    kernels, it cannot be triggered in practice there.

    This patch has been boot-tested on a variety of machines with and without
    kernelcore= set.

    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • release_pages() in mm/swap.c changes page_count() to 0 without clearing the
    PageLRU flag.

    This means isolate_lru_page() can see a page with PageLRU() set and
    page_count(page) == 0. This is a bug: get_page() would be called on a page
    whose count is already 0.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
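One common way to guard against taking a reference on a page whose count has already reached zero is an "increment unless zero" operation. The commit message does not say how the fix was implemented, so the following is only a userspace sketch with C11 atomics; the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a page: a reference count plus an "on LRU" flag. */
struct model_page {
    atomic_int count;
    bool on_lru;
};

/* Take a reference only if the count is currently non-zero. */
static bool model_get_page_unless_zero(struct model_page *page)
{
    int old = atomic_load(&page->count);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&page->count, &old, old + 1))
            return true;
    }
    return false; /* the page is already being freed; leave it alone */
}

/* Isolation succeeds only for LRU pages that are still referenced. */
static bool model_try_isolate(struct model_page *page)
{
    return page->on_lru && model_get_page_unless_zero(page);
}
```

In this model a page that still carries the LRU flag but has a zero count is skipped rather than resurrected.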
     
  • Usually, migrate_pages(page,,) is called from a system call with mm->sem held.
    (mm here is the mm_struct which maps the migration target page.)
    This semaphore helps to avoid some race conditions.

    But if we want to migrate a page from other kernel code, we have to avoid
    those races ourselves. This patch adds checks for the following race conditions.

    1. A page with page->mapping==NULL can be a target of migration. So we have
    to check page->mapping before calling try_to_unmap().

    2. The anon_vma can be freed while the page is unmapped, but page->mapping
    remains as it was. We drop page->mapcount to 0, after which we cannot trust
    page->mapping. So, use rcu_read_lock() to prevent the anon_vma pointed to by
    page->mapping from being freed during migration.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

25 Jul, 2007

3 commits

  • * 'request-queue-t' of git://git.kernel.dk/linux-2.6-block:
    [BLOCK] Add request_queue_t and mark it deprecated
    [BLOCK] Get rid of request_queue_t typedef

    Linus Torvalds
     
  • Use the correct local variable when calling into the page allocator. Local
    `flags' can have __GFP_ZERO set, which causes us to pass __GFP_ZERO into the
    page allocator, possibly from illegal contexts. The page allocator will later
    do prep_zero_page()->kmap_atomic(..., KM_USER0) from irq contexts and will
    then go BUG.

    Cc: Mike Galbraith
    Acked-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • dequeue_huge_page() has a serious memory leak upon hugetlb page
    allocation. The for loop continues allocating hugetlb pages out of
    all allowable zones, whereas this function is supposed to dequeue one
    and only one page.

    Fixed it by breaking out of the for loop once a hugetlb page is found.

    Signed-off-by: Ken Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Chen
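The shape of the bug and its fix can be shown with a toy model in C. The array of per-zone free counts and the function name are hypothetical; the sketch only illustrates why the loop must stop after the first successful dequeue.

```c
#include <assert.h>
#include <stddef.h>

/*
 * free_per_zone[i] holds the number of free huge pages in zone i.
 * Takes exactly one page from the first zone that has any and returns
 * that zone's index, or -1 if every zone is empty.
 */
static int dequeue_one(int *free_per_zone, size_t nr_zones)
{
    int taken_from = -1;

    assert(free_per_zone != NULL || nr_zones == 0);

    for (size_t i = 0; i < nr_zones; i++) {
        if (free_per_zone[i] > 0) {
            free_per_zone[i]--;
            taken_from = (int)i;
            break; /* without this, every later zone would also lose a page */
        }
    }
    return taken_from;
}
```

Without the break, only the last page taken would be returned to the caller, and the pages removed from earlier zones would be lost.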
     

24 Jul, 2007

1 commit

  • Some of the code has been gradually transitioned to using the proper
    struct request_queue, but there's lots left. So do a full sweep of
    the kernel and get rid of this typedef and replace its uses with
    the proper type.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

23 Jul, 2007

1 commit

  • Fix following warning:
    WARNING: vmlinux.o(.text+0x188ea): Section mismatch: reference to .init.text:__alloc_bootmem_core (between 'alloc_bootmem_high_node' and 'get_gate_vma')

    alloc_bootmem_high_node() is only used from __init scope so declare it __init.
    And in addition declare the weak variant __init too.

    Signed-off-by: Sam Ravnborg
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Sam Ravnborg
     

22 Jul, 2007

3 commits

  • The version of SLOB in -mm always scans its free list from the beginning,
    which results in small allocations and free segments clustering at the
    beginning of the list over time. This causes the average search to scan
    over a large stretch at the beginning on each allocation.

    By starting each page search where the last one left off, we evenly
    distribute the allocations and greatly shorten the average search.

    Without this patch, kernel compiles on a 1.5G machine take a large amount
    of system time for list scanning. With this patch, compiles are within a
    few seconds of performance of a SLAB kernel with no notable change in
    system time.

    Signed-off-by: Matt Mackall
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
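Starting each search where the previous one stopped is the classic next-fit strategy. The sketch below models it over an array of per-page free byte counts; the data structure and names are hypothetical simplifications and do not reflect SLOB's real free lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap model: free bytes per page plus a roving start index. */
struct model_heap {
    size_t *free_bytes; /* free space remaining in each page */
    size_t nr_pages;
    size_t next;        /* page at which the previous search stopped */
};

/*
 * Next-fit allocation: scan at most nr_pages pages, beginning at h->next
 * and wrapping around. Returns the page index used, or -1 if none fits.
 */
static long next_fit_alloc(struct model_heap *h, size_t size)
{
    assert(h != NULL);

    for (size_t n = 0; n < h->nr_pages; n++) {
        size_t i = (h->next + n) % h->nr_pages;

        if (h->free_bytes[i] >= size) {
            h->free_bytes[i] -= size;
            h->next = i; /* resume here on the next call */
            return (long)i;
        }
    }
    return -1;
}
```

A first-fit search would instead always begin at index 0 and repeatedly walk over the same exhausted pages at the front, which is the clustering effect described above.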
     
  • Now that arch/powerpc/platforms/cell/spufs/fault.c is always built in
    the kernel there is no need to export handle_mm_fault anymore.

    Signed-off-by: Christoph Hellwig
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Trying to survive an allmodconfig on a nommu platform results in many
    screen lengths of module unhappiness. Many of the mmap related things that
    binfmt_flat hooks in to are never exported despite being global, and there
    are also missing definitions for vmalloc_32_user() and vm_insert_page().

    I've implemented vmalloc_32_user() trying to stick as close to the
    mm/vmalloc.c implementation as possible, though we don't have any need for
    VM_USERMAP, so groveling for the VMA can be skipped. vm_insert_page() has
    been stubbed for now in order to keep the build happy.

    Signed-off-by: Paul Mundt
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

20 Jul, 2007

7 commits

  • zone_movable_pfn is presently marked as __initdata and referenced from
    adjust_zone_range_for_zone_movable(), which in turn is referenced by
    zone_spanned_pages_in_node(). Both of these are __meminit annotated. When
    memory hotplug is enabled, this will oops on a hot-add, due to
    zone_movable_pfn having been freed.

    __meminitdata annotation gives the desired behaviour.

    This will only impact platforms that enable both memory hotplug
    and ARCH_POPULATES_NODE_MAP.

    Signed-off-by: Paul Mundt
    Acked-by: Mel Gorman
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     
  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     
  • The slab and slob allocators already did this right, but slub would call
    "get_object_page()" on the magic ZERO_SIZE_PTR, with all kinds of nasty
    end results.

    Noted by Ingo Molnar.

    Cc: Ingo Molnar
    Cc: Christoph Lameter
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
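The general guard against a zero-size sentinel pointer can be sketched as follows. The sentinel value and all names here are assumptions made for illustration, not the kernel's definitions; the point is that any routine mapping a pointer back to its slab page has to filter out NULL and the sentinel first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel: a small non-NULL address that is never a real object. */
#define MODEL_ZERO_SIZE_PTR ((void *)16)

/* True for NULL and for the zero-size sentinel. */
static bool model_zero_or_null_ptr(const void *p)
{
    return (uintptr_t)p <= (uintptr_t)MODEL_ZERO_SIZE_PTR;
}

/*
 * Model of a size query. A real allocator would look up the object's slab
 * page for genuine pointers; size_if_real stands in for that lookup.
 */
static size_t model_object_size(const void *p, size_t size_if_real)
{
    if (model_zero_or_null_ptr(p))
        return 0; /* never derive a page from the sentinel */
    return size_if_real;
}
```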
     
  • I suspect Christoph tested his code only in the NUMA configuration; for
    the combination of SLAB+non-NUMA, the zero-sized kmallocs would not work.

    Of course, this would only trigger in configurations where those zero-
    sized allocations happen (not very common), so that may explain why it
    wasn't more widely noticed.

    Seen by Andi Kleen under qemu, and there seems to be a report by
    Michael Tsirkin on it too.

    Cc: Andi Kleen
    Cc: Roland Dreier
    Cc: Michael S. Tsirkin
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • lguest does some fairly lowlevel things to support a host, which
    normal modules don't need:

    math_state_restore:
    When the guest triggers a Device Not Available fault, we need
    to be able to restore the FPU

    __put_task_struct:
    We need to hold a reference to another task for inter-guest
    I/O, and put_task_struct() is an inline function which calls
    __put_task_struct.

    access_process_vm:
    We need to access another task for inter-guest I/O.

    map_vm_area & __get_vm_area:
    We need to map the switcher shim (ie. monitor) at 0xFFC01000.

    Signed-off-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • page-writeback accounting is presently performed in the page-flags macros.
    This is inconsistent and a bit ugly and makes it awkward to implement
    per-backing_dev under-writeback page accounting.

    So move this accounting down to the callsite(s).

    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Use appropriate accessor function to set compound page destructor
    function.

    Cc: William Irwin
    Signed-off-by: Akinobu Mita
    Acked-by: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita