23 Nov, 2005

10 commits

  • If there are multiple updaters to /proc/sys/vm/nr_hugepages simultaneously,
    it is possible for the nr_huge_pages variable to become incorrect. There
    is no locking in the set_max_huge_pages function around
    alloc_fresh_huge_page, which is able to update nr_huge_pages. Two callers
    to alloc_fresh_huge_page could race against each other as could a call to
    alloc_fresh_huge_page and a call to update_and_free_page. This patch just
    expands the area covered by the hugetlb_lock to cover the call into
    alloc_fresh_huge_page. A sysctl path is hardly performance critical, so I
    don't see a need for more fine-grained locking there.

    My reproducer was to run a couple of copies of the following script
    simultaneously:

    while [ true ]; do
            echo 1000 > /proc/sys/vm/nr_hugepages
            echo 500 > /proc/sys/vm/nr_hugepages
            echo 750 > /proc/sys/vm/nr_hugepages
            echo 100 > /proc/sys/vm/nr_hugepages
            echo 0 > /proc/sys/vm/nr_hugepages
    done

    and then watch /proc/meminfo and eventually you will see things like

    HugePages_Total: 100
    HugePages_Free: 109

    After applying the patch all seemed well.
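
    Roughly, the shape of the fix is the following (a sketch only, built from
    the function names mentioned above rather than the actual diff; the
    allocation flags and the per-node bookkeeping are illustrative). The point
    is simply that nr_huge_pages is only ever changed with hugetlb_lock held:

    static int alloc_fresh_huge_page(void)
    {
            struct page *page;

            page = alloc_pages(GFP_HIGHUSER | __GFP_COMP | __GFP_NOWARN,
                               HUGETLB_PAGE_ORDER);
            if (!page)
                    return 0;

            spin_lock(&hugetlb_lock);   /* lock now covers the counter update */
            nr_huge_pages++;
            /* ... per-node count and free-list insertion, also under the lock ... */
            spin_unlock(&hugetlb_lock);
            return 1;
    }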

    Signed-off-by: Eric Paris
    Acked-by: William Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • It used to be the case that PG_reserved pages were silently never freed, but
    in 2.6.15-rc1 they may be freed with a "Bad page state" message. We should
    work through such cases as they appear, fixing the code; but for now it's
    safer to issue the message without freeing the page, leaving PG_reserved set.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas,
    so let's not insert the ZERO_PAGE there - though whether it would matter will
    depend on what we decide about ZERO_PAGE refcounting.

    But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED
    area, do_no_page should never be: just BUG_ON.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • copy_one_pte needs to copy the anonymous COWed pages in a VM_UNPAGED area,
    zap_pte_range needs to free them, do_wp_page needs to COW them: just like
    ordinary pages, not like the unpaged.

    But recognizing them is a little subtle: because PageReserved is no longer a
    condition for remap_pfn_range, we can now mmap all of /dev/mem (whether the
    distro permits, and whether it's advisable on this or that architecture, is
    another matter). So if we can see a PageAnon, it may not be ours to mess with
    (or may be ours from elsewhere in the address space). I suspect there's an
    entertaining insoluble self-referential problem here, but the page_is_anon
    function does a good practical job, and MAP_PRIVATE PROT_WRITE VM_UNPAGED will
    always be an odd choice.

    While updating the comment on page_address_in_vma, I noticed a potential NULL
    dereference in a path we don't actually take, and fixed it.
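
    The kind of test this needs looks roughly like the following (a sketch
    only; the helper name and the exact conditions in the real page_is_anon
    differ, but page_address_in_vma is the function mentioned above):

    static inline int page_is_anon_sketch(struct page *page,
                                          struct vm_area_struct *vma,
                                          unsigned long addr)
    {
            /* Only treat it as one of "our" anonymous COWed pages if there
             * is a struct page at all, it is PageAnon, and it really is
             * mapped at this address in this vma. */
            return page && PageAnon(page) &&
                    page_address_in_vma(page, vma) == addr;
    }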

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove the BUG_ON(vma->vm_flags & VM_UNPAGED) from do_wp_page, and let it do
    Copy-On-Write without touching the VM_UNPAGED area's page counts - but this is
    incomplete, because the anonymous page it inserts will itself need to be
    handled, here and in other functions - next patch.

    We still don't copy the page if the pfn is invalid, because the
    copy_user_highpage interface does not allow it. But that has not been a problem
    in the past; support can be added later if the need arises.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • There's one peculiar use of VM_RESERVED which the previous patch left behind:
    because VM_NONLINEAR's try_to_unmap_cluster uses vm_private_data as a swapout
    cursor, but should never meet VM_RESERVED vmas, it was a way of extending
    VM_NONLINEAR to VM_RESERVED vmas using vm_private_data for some other purpose.
    But that's an empty set - they don't have the populate function required. So
    just throw away those VM_RESERVED tests.

    But one more interesting test in rmap.c has to go too: try_to_unmap_one will want
    to swap out an anonymous page from a VM_RESERVED or VM_UNPAGED area.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
    drivers set VM_RESERVED on areas which are then populated by nopage. The
    PageReserved removal in 2.6.15-rc1 changed zap_pte_range not to free pages in
    VM_RESERVED areas, without changing those drivers not to set the flag: so their
    pages just leak away.

    Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
    to flag the special areas where the ptes may have no struct page, or if they
    have then it's not to be touched. Replace most instances of VM_RESERVED in
    core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and
    sparc64 io_remap_pfn_range.

    Revert the addition of VM_RESERVED to the powerpc vdso; it's not needed there. Is it
    needed anywhere? It still governs the mm->reserved_vm statistic, and special
    vmas not to be merged, and areas not to be core dumped; but could probably be
    eliminated later (the drivers are probably specifying it because in 2.4 it
    kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
    don't get on).

    Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
    purpose whatsoever, and should be removed from drivers when we clean up.
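
    In include/linux/mm.h terms the change looks something like this (the bit
    values shown are illustrative, not the real ones; the point is only that
    VM_UNPAGED takes over the slot VM_SHM used to occupy and VM_SHM itself
    becomes a no-op):

    /* VM_UNPAGED reuses the old VM_SHM slot: ptes in such areas may have
     * no struct page, or one that must not be touched. */
    #define VM_SHM          0x00000000  /* means nothing: remove from drivers later */
    #define VM_UNPAGED      0x00100000  /* illustrative value, not the real bit */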

    Signed-off-by: Hugh Dickins
    Acked-by: William Irwin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • It looks like snd_xxx is not the only nopage to be using PageReserved as a way
    of holding a high-order page together: which no longer works, but is masked by
    our failure to free from VM_RESERVED areas. We cannot fix that bug without
    first substituting another way to hold the high-order page together, while
    farming out the 0-order pages from within it.

    That's just what PageCompound is designed for, but it's been kept under
    CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out-of-line
    put_page), doesn't slow down what most needs to be fast (already using
    hugetlb), and unifies the way we handle high-order pages.
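
    The pattern this enables looks roughly like this (an illustration under
    the assumption that get_page/put_page on a tail page operate on the
    compound head; not code from the patch):

    static int compound_buffer_demo(void)
    {
            /* 2^4 = 16 contiguous pages kept together as one compound page,
             * instead of marking them PageReserved to hold them together. */
            struct page *buf = alloc_pages(GFP_KERNEL | __GFP_COMP, 4);
            struct page *tail;

            if (!buf)
                    return -ENOMEM;

            tail = buf + 3;         /* e.g. what a nopage handler would hand out */
            get_page(tail);         /* pins the whole compound via its head page */
            put_page(tail);

            __free_pages(buf, 4);   /* released as one high-order unit */
            return 0;
    }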

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The PageReserved removal in 2.6.15-rc1 issued a "deprecated" message when you
    tried to mmap or mprotect MAP_PRIVATE PROT_WRITE on a VM_RESERVED area, and failed
    with -EACCES: because do_wp_page lacks the refinement to COW pages in those
    areas, nor do we expect to find anonymous pages in them; and it seemed just
    bloat to add code for handling such a peculiar case. But immediately it
    caused vbetool and ddcprobe (using lrmi) to fail.

    So revert the "deprecated" messages, letting mmap and mprotect succeed. But
    leave do_wp_page's BUG_ON(vma->vm_flags & VM_RESERVED) in place until we've
    added the code to do it right: so this particular patch is only good if the
    app doesn't really need to write to that private area.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The PageReserved removal in 2.6.15-rc1 prohibited get_user_pages on the areas
    flagged VM_RESERVED in place of PageReserved. That is correct in theory - we
    ought not to interfere with struct pages in such a reserved area; but in
    practice it broke BTTV for one.

    So revert to prohibiting only on VM_IO: if someone gets into trouble with
    get_user_pages on VM_RESERVED, it'll just be a "don't do that".

    You can argue that videobuf_mmap_mapper shouldn't set VM_RESERVED in the first
    place, but now's not the time for breaking drivers without notice.
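
    In get_user_pages() terms the test goes back to roughly this shape
    (a sketch, not the literal diff; i, vma and vm_flags are the function's
    existing locals):

            if (!vma || (vma->vm_flags & VM_IO)
                            || !(vm_flags & vma->vm_flags))
                    return i ? i : -EFAULT;   /* VM_RESERVED alone no longer blocks it */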

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

19 Nov, 2005

2 commits


18 Nov, 2005

2 commits


15 Nov, 2005

4 commits

  • Linus Torvalds
     
  • This was introduced for x86-64 at some point to save memory
    in struct page, but has been obsolete for some time. Just
    remove it.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.

    As a bit of historical background: when the x86-64 port
    was originally designed, we had some discussion about whether we should
    use a 16MB DMA zone like i386, a 4GB DMA zone like IA64, or
    both. Having both was ruled out at the time because it was early in the
    2.4 days, when the VM was still quite shaky and had bad trouble even
    dealing with one DMA zone. We settled on the 16MB DMA zone mainly
    because we worried about older soundcards and the floppy.

    But this has always caused problems since then, because
    device drivers had trouble getting enough DMA-able memory. These days
    the VM works much better and the wide use of NUMA has proven
    it can deal with many zones successfully.

    So this patch adds both zones.

    This helps drivers that need a lot of memory below 4GB because
    their hardware cannot address more (graphics drivers - proprietary
    and free ones, video frame buffer drivers, sound drivers etc.).
    Previously they could only use IOMMU+16MB GFP_DMA, which
    was not enough memory.

    Another common problem is that hardware which has full memory
    addressing for >4GB still lacks it for some control structures in memory
    (like transmit rings or other metadata). Such drivers tended to allocate memory
    from the 16MB GFP_DMA zone or the IOMMU/swiotlb using pci_alloc_consistent,
    but that can tie up a lot of precious 16MB GFP_DMA/IOMMU/swiotlb memory
    (even on AMD systems the IOMMU tends to be quite small), especially if you have
    many devices. With the new zone, pci_alloc_consistent can just put
    this stuff into memory below 4GB, which works better.

    One remaining argument was whether the zone should be 4GB or 2GB. The main
    motivation for 2GB would be an unnamed, not-so-unpopular hardware
    RAID controller (mostly found in older machines from a particular four-letter
    company) that has a strange 2GB restriction in firmware. But
    that one works OK with swiotlb/IOMMU anyway, so it doesn't really
    need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
    it seems to be the most common restriction.

    The new zone is so far added only for x86-64.

    For other architectures that don't set up this
    new zone, nothing changes. Architectures can set a compatibility
    define, CONFIG_DMA_IS_DMA32, in Kconfig that will define GFP_DMA32
    as GFP_DMA. Otherwise it's a nop, because on 32-bit architectures
    it's normally not needed: GFP_NORMAL (=0) is already DMA-able
    enough.

    One problem is still that GFP_DMA means different things on different
    architectures. e.g. some drivers used to have #ifdef ia64 use GFP_DMA
    (trusting it to be 4GB) #elif __x86_64__ (use other hacks like
    the swiotlb because 16MB is not enough) ... . This was quite
    ugly and is now obsolete.

    These should now be converted to use GFP_DMA32 unconditionally (I haven't done
    that yet), or better still, to use only pci_alloc_consistent/dma_alloc_coherent,
    which will use GFP_DMA32 transparently.
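
    Driver-side, the intended usage looks something like this (a sketch; the
    helper names, device and sizes are made up for the example):

    #include <linux/pci.h>
    #include <linux/gfp.h>

    /* Preferred: let the DMA API place the buffer; with this patch
     * pci_alloc_consistent can draw from the new sub-4GB zone instead of
     * the tiny 16MB GFP_DMA pool or the IOMMU/swiotlb. */
    static void *alloc_ring_dma(struct pci_dev *pdev, size_t bytes,
                                dma_addr_t *dma_handle)
    {
            return pci_alloc_consistent(pdev, bytes, dma_handle);
    }

    /* Or ask the page allocator for memory below 4GB explicitly. */
    static void *alloc_ring_below_4g(size_t bytes)
    {
            return (void *)__get_free_pages(GFP_KERNEL | GFP_DMA32,
                                            get_order(bytes));
    }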

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

14 Nov, 2005

6 commits

  • The slab allocator never uses alloc_pages since kmem_getpages() is always
    called with a valid nodeid. Remove the branch and the code from
    kmem_getpages().
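
    What remains is, in outline (a sketch of the simplified path, assuming
    the usual slab internals; field and helper names may differ slightly):

    static void *kmem_getpages_sketch(kmem_cache_t *cachep, gfp_t flags,
                                      int nodeid)
    {
            struct page *page;

            /* nodeid is always valid here, so alloc_pages_node() is used
             * unconditionally; the old alloc_pages() fallback branch is gone. */
            page = alloc_pages_node(nodeid, flags, cachep->gfporder);
            if (!page)
                    return NULL;
            /* ... slab accounting and page flag setup ... */
            return page_address(page);
    }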

    Signed-off-by: Christoph Lameter
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch converts object cache page mapping macros to static inline
    functions to make them more explicit and readable.

    Signed-off-by: Pekka Enberg
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • The pages_high - pages_low and pages_low - pages_min deltas are the async
    reclaim watermarks. As such, they should be in the same ratios in highmem
    zones as in any other zone. It is the pages_min - 0 delta which is the
    PF_MEMALLOC reserve, and this is the region that isn't very useful for
    highmem.

    This patch ensures highmem systems have similar characteristics as non highmem
    ones with the same amount of memory, and also that highmem zones get similar
    reclaim pressures to other zones.
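
    The resulting per-zone watermark layout is roughly the following (a
    sketch; the function name, the cap and the exact scaling factors are
    illustrative, not the real code):

    static void set_zone_watermarks_sketch(struct zone *zone, unsigned long tmp)
    {
            if (is_highmem(zone))
                    /* the PF_MEMALLOC reserve is of little use in highmem,
                     * so keep pages_min small there ... */
                    zone->pages_min = min(tmp, 128UL);
            else
                    zone->pages_min = tmp;

            /* ... but give highmem the same kswapd deltas as any other
             * zone, so async reclaim pressure is comparable. */
            zone->pages_low  = zone->pages_min + tmp / 4;
            zone->pages_high = zone->pages_min + tmp / 2;
    }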

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Clean up of __alloc_pages.

    Restore the previous behaviour, plus further cleanups: introduce 'alloc_flags'
    and remove the last of should_reclaim_zone.

    Signed-off-by: Rohit Seth
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rohit Seth
     
  • The address-based work estimate for unmapping (for lockbreak) is, and always
    was, horribly inefficient for sparse mappings. The problem is most simply
    explained with an example:

    If we find a pgd is clear, we still have to call into unmap_page_range
    PGDIR_SIZE / ZAP_BLOCK_SIZE times, each time checking the clear pgd, in
    order to progress the working address to the next pgd.

    The fundamental way to solve the problem is to keep track of the end
    address we've processed and pass it back to the higher layers.

    From: Nick Piggin

    Modification to completely get away from address based work estimate
    and instead use an abstract count, with a very small cost for empty
    entries as opposed to present pages.

    On 2.6.14-git2, ppc64, and CONFIG_PREEMPT=y, mapping and unmapping 1TB
    of virtual address space takes 1.69s; with the following patch applied,
    this operation can be done 1000 times in less than 0.01s.

    From: Andrew Morton

    With CONFIG_HUGETLB_PAGE=n:

    mm/memory.c: In function `unmap_vmas':
    mm/memory.c:779: warning: division by zero

    Due to

    zap_work -= (end - start) /
                    (HPAGE_SIZE / PAGE_SIZE);

    So make the dummy HPAGE_SIZE non-zero.
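
    With the abstract work count, the pmd-level walk looks roughly like this
    (a sketch; the charging granularity and helper usage are illustrative,
    not the literal patch):

    static unsigned long zap_pmd_range_sketch(pud_t *pud, unsigned long addr,
                                              unsigned long end, long *zap_work)
    {
            pmd_t *pmd = pmd_offset(pud, addr);
            unsigned long next;

            do {
                    next = pmd_addr_end(addr, end);
                    if (pmd_none_or_clear_bad(pmd)) {
                            (*zap_work)--;          /* empty entries are nearly free */
                            continue;
                    }
                    /* present ranges are charged per page, so the caller's
                     * lockbreak check fires after a bounded amount of work */
                    *zap_work -= (next - addr) / PAGE_SIZE;
                    /* ... zap the present ptes in [addr, next) here ... */
            } while (pmd++, addr = next, addr != end && *zap_work > 0);

            return addr;
    }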

    Signed-off-by: Robin Holt
    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
     
  • In __alloc_pages():

    if ((p->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) {
            /* go through the zonelist yet again, ignoring mins */
            for (i = 0; zones[i] != NULL; i++) {
                    struct zone *z = zones[i];

                    page = buffered_rmqueue(z, order, gfp_mask);
                    if (page) {
                            zone_statistics(zonelist, z);
                            goto got_pg;
                    }
            }
            goto nopage;                    <<<< HERE!!! FAIL...
    }

    kswapd (which has the PF_MEMALLOC flag set) can fail to allocate memory even
    when it allocates with the __GFP_NOFAIL flag.
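
    The fix, roughly, is to keep retrying the no-watermark pass instead of
    giving up when the caller said failure is not an option (a sketch in the
    style of the fragment above; blk_congestion_wait is the throttling
    helper of this era, and the exact placement may differ):

    nofail_alloc:
            /* go through the zonelist yet again, ignoring mins */
            for (i = 0; zones[i] != NULL; i++) {
                    page = buffered_rmqueue(zones[i], order, gfp_mask);
                    if (page)
                            goto got_pg;
            }
            if (gfp_mask & __GFP_NOFAIL) {
                    /* the caller must not see failure: wait for writeback
                     * congestion to ease and try again */
                    blk_congestion_wait(WRITE, HZ/50);
                    goto nofail_alloc;
            }
            goto nopage;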

    Signed-Off-By: Pavel Emelianov
    Signed-Off-By: Denis Lunev
    Signed-Off-By: Kirill Korotaev
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     

12 Nov, 2005

1 commit


11 Nov, 2005

1 commit


08 Nov, 2005

1 commit


07 Nov, 2005

13 commits