09 Oct, 2007
3 commits
-
find_lock_page increases page's usage count, we should decrease it
before return VM_FAULT_SIGBUSSigned-off-by: Yan Zheng
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The test for VM_CAN_NONLINEAR always fails
Signed-off-by: Yan Zheng
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
All the current page_mkwrite() implementations also set the page dirty. Which
results in the set_page_dirty_balance() call to _not_ call balance, because the
page is already found dirty.This allows us to dirty a _lot_ of pages without ever hitting
balance_dirty_pages(). Not good (tm).Force a balance call if ->page_mkwrite() was successful.
Signed-off-by: Peter Zijlstra
Signed-off-by: Linus Torvalds
07 Oct, 2007
1 commit
-
When pinning and unpinning pagetables, we must protect them against
being used by other CPUs, lest they see the pagetable in an
intermediate read-only-but-not-pinned state.When using split pte locks, doing this properly would require taking
all the pte locks for the pagetable while pinning, but this may overflow
the PREEMPT_BITS part of the preempt counter if the process has mapped
more than about 512M of memory.However, failing to take the pte locks causes write-protect faults when
the pageout code is trying to clear the Access bit on a pte which is part
of a freshy created and still being pinned process after fork.This is a short-term fix until the problem is solved properly.
Signed-off-by: Jeremy Fitzhardinge
Acked-by: Rik van Riel
Acked-by: Hugh Dickins
Cc: David Rientjes
Cc: Andrew Morton
Cc: Andi Kleen
Cc: Keir Fraser
Cc: Jan Beulich
Signed-off-by: Linus Torvalds
05 Oct, 2007
1 commit
-
Gurudas Pai reports kernel BUG at arch/i386/mm/highmem.c:15! below
sys_remap_file_pages, while running Oracle database test on x86 in 6GB
RAM: kunmap thinks we're in_interrupt because the preempt count has
wrapped.That's because __do_fault expected to unmap page_table, but one of its
two callers do_nonlinear_fault already unmapped it: let do_linear_fault
unmap it first too, and then there's no need to pass the page_table arg
down.Why have we been so slow to notice this? Probably through forgetting
that the mapping_cap_account_dirty test means that sys_remap_file_pages
nowadays only goes the full nonlinear vma route on a few memory-backed
filesystems like ramfs, tmpfs and hugetlbfs.[ It also depends on CONFIG_HIGHPTE, so it becomes even harder to
trigger in practice. Many who have need of large memory have probably
migrated to x86-64..Problem introduced by commit d0217ac04ca6591841e5665f518e38064f4e65bd
("mm: fault feedback #1") -- Linus ]Signed-off-by: Hugh Dickins
Cc: gurudas pai
Cc: Nick Piggin
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
01 Oct, 2007
1 commit
-
The virtual address space argument of clear_user_highpage is supposed to be
the virtual address where the page being cleared will eventually be mapped.
This allows architectures with virtually indexed caches a few clever
tricks. That sort of trick falls over in painful ways if the virtual
address argument is wrong.Signed-off-by: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Sep, 2007
1 commit
-
This patch proposes fixes to the reference counting of memory policy in the
page allocation paths and in show_numa_map(). Extracted from my "Memory
Policy Cleanups and Enhancements" series as stand-alone.Shared policy lookup [shmem] has always added a reference to the policy,
but this was never unrefed after page allocation or after formatting the
numa map data.Default system policy should not require additional ref counting, nor
should the current task's task policy. However, show_numa_map() calls
get_vma_policy() to examine what may be [likely is] another task's policy.
The latter case needs protection against freeing of the policy.This patch adds a reference count to a mempolicy returned by
get_vma_policy() when the policy is a vma policy or another task's
mempolicy. Again, shared policy is already reference counted on lookup. A
matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
shared and vma policies, and in show_numa_map() for shared and another
task's mempolicy. We can call __mpol_free() directly, saving an admittedly
inexpensive inline NULL test, because we know we have a non-NULL policy.Handling policy ref counts for hugepages is a bit trickier.
huge_zonelist() returns a zone list that might come from a shared or vma
'BIND policy. In this case, we should hold the reference until after the
huge page allocation in dequeue_hugepage(). The patch modifies
huge_zonelist() to return a pointer to the mempolicy if it needs to be
unref'd after allocation.Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:
w/o patch w/ refcount patch
Avg Std Devn Avg Std Devn
Real: 100.59 0.38 100.63 0.43
User: 1209.60 0.37 1209.91 0.31
System: 81.52 0.42 81.64 0.34Signed-off-by: Lee Schermerhorn
Acked-by: Andi Kleen
Cc: Christoph Lameter
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Sep, 2007
1 commit
-
This was posted on Aug 28 and fixes an issue that could cause troubles
when slab caches >=128k are created.http://marc.info/?l=linux-mm&m=118798149918424&w=2
Currently we simply add the debug flags unconditional when checking for a
matching slab. This creates issues for sysfs processing when slabs exist
that are exempt from debugging due to their huge size or because only a
subset of slabs was selected for debugging.We need to only add the flags if kmem_cache_open() would also add them.
Create a function to calculate the flags that would be set
if the cache would be opened and use that function to determine
the flags before looking for a compatible slab.[akpm@linux-foundation.org: fixlets]
Signed-off-by: Christoph Lameter
Cc: Chuck Ebbert
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Aug, 2007
4 commits
-
Page migration currently does not check if the target of the move contains
nodes that that are invalid (if root attempts to migrate pages)
and may try to allocate from invalid nodes if these are specified
leading to oopses.Return -EINVAL if an offline node is specified.
Signed-off-by: Christoph Lameter
Cc: Shaohua Li
Cc: Andrew Morton
Signed-off-by: Linus Torvalds -
Do not BUG() if we cannot register a slab with sysfs. Just print an error.
The only consequence of not registering is that the slab cache is not
visible via /sys/slab. A BUG() may not be visible that early during boot
and we have had multiple issues here already.Signed-off-by: Christoph Lameter
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In migration fallback path, write_page() or lock_page() will be called.
This causes sleep with holding rcu_read_lock().
For avoding that, just do rcu_lock if the page is Anon.(this is enough.)Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Don't try to free memory which we didn't allocate.
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Aug, 2007
9 commits
-
The NUMA layer only supports NUMA policies for the highest zone. When
ZONE_MOVABLE is configured with kernelcore=, the the highest zone becomes
ZONE_MOVABLE. The result is that policies are only applied to allocations
like anonymous pages and page cache allocated from ZONE_MOVABLE when the
zone is used.This patch applies policies to the two highest zones when the highest zone
is ZONE_MOVABLE. As ZONE_MOVABLE consists of pages from the highest "real"
zone, it's always functionally equivalent.The patch has been tested on a variety of machines both NUMA and non-NUMA
covering x86, x86_64 and ppc64. No abnormal results were seen in
kernbench, tbench, dbench or hackbench. It passes regression tests from
the numactl package with and without kernelcore= once numactl tests are
patched to wait for vmstat counters to update.akpm: this is the nasty hack to fix NUMA mempolicies in the presence of
ZONE_MOVABLE and kernelcore= in 2.6.23. Christoph says "For .24 either merge
the mobility or get the other solution that Mel is working on. That solution
would only use a single zonelist per node and filter on the fly. That may
help performance and also help to make memory policies work better."Signed-off-by: Mel Gorman
Acked-by: Lee Schermerhorn
Tested-by: Lee Schermerhorn
Acked-by: Christoph Lameter
Cc: Andi Kleen
Cc: Paul Mundt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Print a big fat warning and do what is necessary to continue if a node is
marked as up (meaning either node is online (upstream) or node has memory
(Andrew's tree)) but allocations from the node do not succeed.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SLUB is using atomic_read() for variables declared atomic_long_t.
Switch to atomic_long_read().Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It seems a simple mistake was made when converting follow_hugetlb_page()
over to the VM_FAULT flags bitmasks (in "mm: fault feedback #2", commit
83c54070ee1a2d05c89793884bea1a03f2851ed4).By using the wrong bitmask, hugetlb_fault() failures are not being
recognized. This results in an infinite loop whenever follow_hugetlb_page
is involved in a failed fault.Signed-off-by: Adam Litke
Acked-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Skip calling cache_free_alien() when the platform is not numa capable.
This will avoid cache misses that happen while accessing slabp (which is
per page memory reference) to get nodeid. Instead use a global variable to
skip the call, which is mostly likely to be present in the cache.This gives a 0.8% performance boost with the database oltp workload on a
quad-core SMP platform and by any means the number is not small :)Signed-off-by: Suresh Siddha
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The new exec code inserts an accounted vma into an mm struct which is not
current->mm. The existing memory check code has a hard coded assumption
that this does not happen as does the security code.As the correct mm is known we pass the mm to the security method and the
helper function. A new security test is added for the case where we need
to pass the mm and the existing one is modified to pass current->mm to
avoid the need to change large amounts of code.(Thanks to Tobias for fixing rejects and testing)
Signed-off-by: Alan Cox
Cc: WU Fengguang
Cc: James Morris
Cc: Tobias Diedrich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Lumpy reclaim works by selecting a lead page from the LRU list and then
selecting pages for reclaim from the order-aligned area of pages. In the
situation were all pages in that region are inactive and not referenced by any
process over time, it works well.In the situation where there is even light load on the system, the pages may
not free quickly. Out of a area of 1024 pages, maybe only 950 of them are
freed when the allocation attempt occurs because lumpy reclaim returned early.
This patch alters the behaviour of direct reclaim for large contiguous
blocks.The first attempt to call shrink_page_list() is asynchronous but if it fails,
the pages are submitted a second time and the calling process waits for the IO
to complete. This may stall allocators waiting for contiguous memory but that
should be expected behaviour for high-order users. It is preferable behaviour
to potentially queueing unnecessary areas for IO. Note that kswapd will not
stall in this fashion.[apw@shadowen.org: update to version 2]
[apw@shadowen.org: update to version 3]
Signed-off-by: Mel Gorman
Signed-off-by: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As pointed out by Mel when reclaim is applied at higher orders a significant
amount of IO may be started. As this takes finite time to drain reclaim will
consider more areas than ultimatly needed to satisfy the request. This leads
to more reclaim than strictly required and reduced success rates.I was able to confirm Mel's test results on systems locally. These show that
even under light load the success rates drop off far more than expected.
Testing with a modified version of his patch (which follows) I was able to
allocate almost all of ZONE_MOVABLE with a near idle system. I ran 5 test
passes sequentially following system boot (the system has 29 hugepages in
ZONE_MOVABLE):2.6.23-rc1 11 8 6 7 7
sync_lumpy 28 28 29 29 26These show that although hugely better than the near 0% success normally
expected we can only allocate about a 1/4 of the zone. Using synchronous
reclaim for these allocations we get close to 100% as expected.I have also run our standard high order tests and these show no regressions in
allocation success rates at rest, and some significant improvements under
load.This patch:
We are transitioning pages from active to inactive in clear_active_flags,
those need counting as PGDEACTIVATE vm events.Signed-off-by: Andy Whitcroft
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Booting SPARSEMEM on NUMA systems trips a BUG in page_alloc.c:
Initializing HighMem for node 0 (00038000:00100000)
Initializing HighMem for node 1 (00100000:001ffe00)
------------[ cut here ]------------
kernel BUG at /home/apw/git/linux-2.6/mm/page_alloc.c:456!
[...]This occurs because the section to node id mapping is not being
setup correctly during init under SPARSEMEM_STATIC, leading to an
attempt to free pages from all nodes into the zones on node 0.When the zone_table[] was removed in the following commit, a new
section to node mapping table was introduced:commit 89689ae7f95995723fbcd5c116c47933a3bb8b13
[PATCH] Get rid of zone_table[]That conversion inadvertantly only initialised the node mapping in
SPARSEMEM_EXTREME. Ensure we initialise the node mapping in
SPARSEMEM_STATIC.[akpm@linux-foundation.org: make the stubs static inline]
Signed-off-by: Andy Whitcroft
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Aug, 2007
3 commits
-
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
BLOCK: Hide the contents of linux/bio.h if CONFIG_BLOCK=n
sysace: HDIO_GETGEO has it's own method for ages
drivers/block/cpqarray.c: better error handling and kmalloc + memset conversion to k[cz]alloc
drivers/block/cciss.c: kmalloc + memset conversion to kzalloc
Clean up duplicate includes in drivers/block/
Fix remap handling by blktrace
[PATCH] remove mm/filemap.c:file_send_actor() -
Minor docbook error since argument name in comment doesn't match function
Signed-off-by: Stephen Hemminger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch removes the no longer used file_send_actor().
Signed-off-by: Adrian Bunk
Signed-off-by: Jens Axboe
10 Aug, 2007
2 commits
-
The dynamic dma kmalloc creation can run into trouble if a
GFP_ATOMIC allocation is the first one performed for a certain size
of dma kmalloc slab.- Move the adding of the slab to sysfs into a workqueue
(sysfs does GFP_KERNEL allocations)
- Do not call kmem_cache_destroy() (uses slub_lock)
- Only acquire the slub_lock once and--if we cannot wait--do a trylock.This introduces a slight risk of the first kmalloc(x, GFP_DMA|GFP_ATOMIC)
for a range of sizes failing due to another process holding the slub_lock.
However, we only need to acquire the spinlock once in order to establish
each power of two DMA kmalloc cache. The possible conflict is with the
slub_lock taken during slab management actions (create / remove slab cache).It is rather typical that a driver will first fill its buffers using
GFP_KERNEL allocations which will wait until the slub_lock can be acquired.
Drivers will also create its slab caches first outside of an atomic
context before starting to use atomic kmalloc from an interrupt context.If there are any failures then they will occur early after boot or when
loading of multiple drivers concurrently. Drivers can already accomodate
failures of GFP_ATOMIC for other reasons. Retries will then create the slab.Signed-off-by: Christoph Lameter
-
The MAX_PARTIAL checks were supposed to be an optimization. However, slab
shrinking is a manually triggered process either through running slabinfo
or by the kernel calling kmem_cache_shrink.If one really wants to shrink a slab then all operations should be done
regardless of the size of the partial list. This also fixes an issue that
could surface if the number of partial slabs was initially above MAX_PARTIAL
in kmem_cache_shrink and later drops below MAX_PARTIAL through the
elimination of empty slabs on the partial list (rare). In that case a few
slabs may be left off the partial list (and only be put back when they
are empty).Signed-off-by: Christoph Lameter
01 Aug, 2007
3 commits
-
Fix kernel-doc warning:
Warning(linux-2.6.23-rc1-mm1//mm/filemap.c:864): No description found for parameter 'ra'Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In badness(), the automatic variable 'points' is unsigned long. Print it
as such.Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
out_of_memory() may be called when an allocation is failing and the direct
reclaim is not making any progress. This does not take into account the
requested order of the allocation. If the request if for an order larger
than PAGE_ALLOC_COSTLY_ORDER, it is reasonable to fail the allocation
because the kernel makes no guarantees about those allocations succeeding.This false OOM situation can occur if a user is trying to grow the hugepage
pool in a script like;#!/bin/bash
REQUIRED=$1
echo 1 > /proc/sys/vm/hugepages_treat_as_movable
echo $REQUIRED > /proc/sys/vm/nr_hugepages
ACTUAL=`cat /proc/sys/vm/nr_hugepages`
while [ $REQUIRED -ne $ACTUAL ]; do
echo Huge page pool at $ACTUAL growing to $REQUIRED
echo $REQUIRED > /proc/sys/vm/nr_hugepages
ACTUAL=`cat /proc/sys/vm/nr_hugepages`
sleep 1
doneThis is a reasonable scenario when ZONE_MOVABLE is in use but triggers OOM
easily on 2.6.23-rc1. This patch will fail an allocation for an order above
PAGE_ALLOC_COSTLY_ORDER instead of killing processes and retrying.Signed-off-by: Mel Gorman
Acked-by: Andy Whitcroft
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Jul, 2007
2 commits
-
We ClearSlabDebug() before the last SlabDebug() check. Clear it later.
Signed-off-by: Peter Zijlstra
Signed-off-by: Christoph Lameter -
Ingo noticed that the SLUB code does include the lock debugging free
check.Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Acked-by: Pekka Enberg
Signed-off-by: Christoph Lameter
30 Jul, 2007
3 commits
-
Remove fs.h from mm.h. For this,
1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
2) Add back fs.h or less bloated headers (err.h) to files that need it.As result, on x86_64 allyesconfig, fs.h dependencies cut down from 3929 files
rebuilt down to 3444 (-12.3%).Cross-compile tested without regressions on my two usual configs and (sigh):
alpha arm-mx1ads mips-bigsur powerpc-ebony
alpha-allnoconfig arm-neponset mips-capcella powerpc-g5
alpha-defconfig arm-netwinder mips-cobalt powerpc-holly
alpha-up arm-netx mips-db1000 powerpc-iseries
arm arm-ns9xxx mips-db1100 powerpc-linkstation
arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200
arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple
arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2
arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads
arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb
arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds
arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb
arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx
arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp
arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds
arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds
arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads
arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds
arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads
arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds
arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds
arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn
arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads
arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads
arm-ep93xx i386-up mips-pb1100 powerpc-pasemi
arm-footbridge ia64 mips-pb1500 powerpc-pmac32
arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64
arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800
arm-h7201 ia64-defconfig mips-pnx8550-stb810 powerpc-ps3
arm-h7202 ia64-gensparse mips-qemu powerpc-pseries
arm-hackkit ia64-sim mips-rbhma4200 powerpc-up
arm-integrator ia64-sn2 mips-rbhma4500 s390
arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig
arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig
arm-iop33x ia64-zx1 mips-sead s390-up
arm-ixp2000 m68k mips-tb0219 sparc
arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig
arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig
arm-jornada720 m68k-atari mips-workpad sparc-up
arm-kafa m68k-bvme6000 mips-wrppmc sparc64
arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig
arm-ks8695 m68k-mac parisc sparc64-defconfig
arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up
arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64
arm-lpd7a400 m68k-q40 parisc-up x86_64
arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig
arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig
arm-lusl7200 mips powerpc-celleb x86_64-up
arm-mainstone mips-atlas powerpc-chrp32Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds -
Introduce CONFIG_SUSPEND representing the ability to enter system sleep
states, such as the ACPI S3 state, and allow the user to choose SUSPEND
and HIBERNATION independently of each other.Make HOTPLUG_CPU be selected automatically if SUSPEND or HIBERNATION has
been chosen and the kernel is intended for SMP systems.Also, introduce CONFIG_PM_SLEEP which is automatically selected if
CONFIG_SUSPEND or CONFIG_HIBERNATION is set and use it to select the
code needed for both suspend and hibernation.The top-level power management headers and the ACPI code related to
suspend and hibernation are modified to use the new definitions (the
changes in drivers/acpi/sleep/main.c are, mostly, moving code to reduce
the number of ifdefs).There are many other files in which CONFIG_PM can be replaced with
CONFIG_PM_SLEEP or even with CONFIG_SUSPEND, but they can be updated in
the future.Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds -
Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
confusion (among other things, with CONFIG_SUSPEND introduced in the
next patch).Signed-off-by: Rafael J. Wysocki
Signed-off-by: Linus Torvalds
27 Jul, 2007
3 commits
-
With the introduction of kernelcore=, a configurable zone is created on
request. In some cases, this value will be small enough that some nodes
contain only ZONE_MOVABLE. On some NUMA configurations when this occurs,
arch-independent zone-sizing will get the size of the memory holes within
the node incorrect. The value of present_pages goes negative and the boot
fails.This patch fixes the bug in the calculation of the size of the hole. The
test case is to boot test a NUMA machine with a low value of kernelcore=
before and after the patch is applied. While this bug exists in early
kernel it cannot be triggered in practice.This patch has been boot-tested on a variety machines with and without
kernelcore= set.Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
release_pages() in mm/swap.c changes page_count() to be 0 without removing
PageLRU flag...This means isolate_lru_page() can see a page, PageLRU() &&
page_count(page)==0.. This is BUG. (get_page() will be called against
count=0 page.)Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In usual, migrate_pages(page,,) is called with holding mm->sem by system call.
(mm here is a mm_struct which maps the migration target page.)
This semaphore helps avoiding some race conditions.But, if we want to migrate a page by some kernel codes, we have to avoid
some races. This patch adds check code for following race condition.1. A page which page->mapping==NULL can be target of migration. Then, we have
to check page->mapping before calling try_to_unmap().2. anon_vma can be freed while page is unmapped, but page->mapping remains as
it was. We drop page->mapcount to be 0. Then we cannot trust page->mapping.
So, use rcu_read_lock() to prevent anon_vma pointed by page->mapping from
being freed during migration.Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Jul, 2007
3 commits
-
* 'request-queue-t' of git://git.kernel.dk/linux-2.6-block:
[BLOCK] Add request_queue_t and mark it deprecated
[BLOCK] Get rid of request_queue_t typedef -
Use the correct local variable when calling into the page allocator. Local
`flags' can have __GFP_ZERO set, which causes us to pass __GFP_ZERO into the
page allocator, possibly from illegal contexts. The page allocator will later
do prep_zero_page()->kmap_atomic(..., KM_USER0) from irq contexts and will
then go BUG.Cc: Mike Galbraith
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
dequeue_huge_page() has a serious memory leak upon hugetlb page
allocation. The for loop continues on allocating hugetlb pages out of
all allowable zone, where this function is supposedly only dequeue one
and only one pages.Fixed it by breaking out of the for loop once a hugetlb page is found.
Signed-off-by: Ken Chen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds