08 Dec, 2006

40 commits

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_DMA is an alias of GFP_DMA. This is the last one so we
    remove the leftover comment too.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.
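
    For illustration only (some_cachep is a made-up cache pointer), the
    conversion is a mechanical substitution of the flag name at the
    allocation sites:

    void *obj;

    /* before: slab-specific alias */
    obj = kmem_cache_alloc(some_cachep, SLAB_KERNEL);

    /* after: plain gfp flag, same behaviour */
    obj = kmem_cache_alloc(some_cachep, GFP_KERNEL);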

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_ATOMIC is an alias of GFP_ATOMIC

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_USER is an alias of GFP_USER

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_NOFS is an alias of GFP_NOFS.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_NOIO is an alias of GFP_NOIO with a single instance of use.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    SLAB_LEVEL_MASK is only used internally by the slab and is
    an alias of GFP_LEVEL_MASK.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • It is only used internally in the slab.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • David Binderman and his Intel C compiler rightly observe that
    install_file_pte no longer has any use for its pte_val.

    Signed-off-by: Hugh Dickins
    Cc: d binderman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
    These patches introduced new switch statements which are indented contrary
    to the consensus in mm/*.c. Fix them up to match that consensus.

    [PATCH] node local per-cpu-pages
    [PATCH] ZVC: Scale thresholds depending on the size of the system
    commit e7c8d5c9955a4d2e88e36b640563f5d6d5aba48a
    commit df9ecaba3f152d1ea79f2a5e0b87505e03f47590

    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
    The fsfuzzer found this: a corrupt, small swapfile that claims to have
    many pages:

    [root]# file swap.741.img
    swap.741.img: Linux/i386 swap file (new style) 1 (4K pages) size 1040191487 pages
    [root]# ls -l swap.741.img
    -rw-r--r-- 1 root root 16777216 Nov 22 05:18 swap.741.img

    sys_swapon() will try to vmalloc all those pages, and -then- check to see if
    the file is actually that large:

    if (!(p->swap_map = vmalloc(maxpages * sizeof(short)))) {

    if (swapfilesize && maxpages > swapfilesize) {
            printk(KERN_WARNING
                   "Swap area shorter than signature indicates\n");

    It seems to me that it would make more sense to move this test up before
    the vmalloc, with the other checks, to avoid the OOM-killer in this
    situation...
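
    A rough sketch of the reordering being suggested, reusing the identifiers
    from the quoted snippet (the surrounding sys_swapon logic is not
    reproduced here):

    /* Check the size claimed by the signature against the file first ... */
    if (swapfilesize && maxpages > swapfilesize) {
            printk(KERN_WARNING
                   "Swap area shorter than signature indicates\n");
            error = -EINVAL;
            goto bad_swap;
    }

    /* ... and only then commit to the potentially enormous vmalloc. */
    p->swap_map = vmalloc(maxpages * sizeof(short));
    if (!p->swap_map) {
            error = -ENOMEM;
            goto bad_swap;
    }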

    Signed-off-by: Eric Sandeen
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • x86 NUMA systems only define bootmem for node 0. alloc_bootmem_node() and
    friends therefore ignore the passed pgdat and use NODE_DATA(0) in all
    cases. This leads to the following warnings as we are not using the passed
    parameter:

    .../mm/page_alloc.c: In function 'zone_wait_table_init':
    .../mm/page_alloc.c:2259: warning: unused variable 'pgdat'

    One option would be to define all variables used with these macros
    __attribute__ ((unused)), but this would leave us exposed should these
    become genuinely unused.

    The key here is that we _are_ using the value, we ignore it but that is a
    deliberate action. This patch adds a nested local variable within the
    alloc_bootmem_node helper to which the pgdat parameter is assigned making
    it 'used'. The nested local is marked __attribute__ ((unused)) to silence
    this same warning for it.
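
    A sketch of the technique (loosely following the i386 definition; the
    exact arguments of the underlying helper may differ):

    /*
     * x86 NUMA keeps bootmem only on node 0, so the pgdat argument is
     * deliberately ignored.  Assigning it to a nested, unused local makes
     * the caller's variable 'used' without changing behaviour.
     */
    #define alloc_bootmem_node(pgdat, x)                                    \
    ({                                                                      \
            struct pglist_data __attribute__ ((unused))                     \
                            *__alloc_bootmem_node__pgdat = (pgdat);         \
            __alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES,        \
                                 __pa(MAX_DMA_ADDRESS));                    \
    })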

    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
    NUMA node ids are passed as either int or unsigned int almost exclusively,
    yet page_to_nid and zone_to_nid both return unsigned long. This is a
    throwback to when page_to_nid was a #define and was thus exposing the real
    type of the page flags field.

    In addition to fixing up the definitions of page_to_nid and zone_to_nid I
    audited the users of these functions identifying the following incorrect
    uses:

    1) mm/page_alloc.c show_node() -- printk dumping the node id,
    2) include/asm-ia64/pgalloc.h pgtable_quicklist_free() -- comparison
    against numa_node_id() which returns an int from cpu_to_node(), and
    3) mm/mempolicy.c check_pte_range -- used as an index in node_isset which
    uses bit_set which in generic code takes an int.
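
    A sketch of the central type change (the accessor body is illustrative;
    the point is the int return type):

    /* node ids are ints everywhere else, so return int here too */
    static inline int page_to_nid(struct page *page)
    {
            return (page->flags >> NODES_PGSHIFT) & NODES_MASK;
    }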

    Signed-off-by: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • drain_node_pages() currently drains the complete pageset of all pages. If
    there are a large number of pages in the queues then we may hold off
    interrupts for too long.

    Duplicate the method used in free_hot_cold_page. Only drain pcp->batch
    pages at one time.
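
    A sketch of the batched drain (names follow the per-cpu pageset
    structures in mm/page_alloc.c of that era; the surrounding per-zone loop
    is omitted):

    struct per_cpu_pages *pcp = &pset->pcp[0];

    if (pcp->count) {
            unsigned long flags;
            int to_drain;

            local_irq_save(flags);
            /* free at most one batch per interrupts-off section */
            if (pcp->count >= pcp->batch)
                    to_drain = pcp->batch;
            else
                    to_drain = pcp->count;
            free_pages_bulk(zone, to_drain, &pcp->list, 0);
            pcp->count -= to_drain;
            local_irq_restore(flags);
    }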

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    Remove all remaining uses of kmem_cache_t (most of them were in slab.h). The
    typedef for kmem_cache_t is then only necessary for other kernel
    subsystems. Add a comment to that effect.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    names_cachep is used for getname() and putname(), so let's put it into
    fs.h near those two definitions.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    fs_cachep is only used in kernel/exit.c and in kernel/fork.c.

    It is used to store fs_struct items, so it should be placed in
    linux/fs_struct.h.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    filp_cachep is only used in fs/file_table.c and in fs/dcache.c, where
    it is defined.

    Move it next to the related definitions in linux/file.h.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    The proper place is file.h, since the files_cachep uses are related to file I/O.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • vm_area_cachep is used to store vm_area_structs. So move to mm.h.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
    Move the sighand_cachep definition to linux/signal.h.

    The sighand cache is only used in fs/exec.c and kernel/fork.c. It is defined
    in kernel/fork.c, and fs/exec.c is its only other user.

    The sighand_cachep is related to signal processing. So add the definition to
    signal.h.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove bio_cachep from slab.h - it no longer exists.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This patch makes the needlessly global "global_faults" static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
    When booting a NUMA system with nodes that have no memory (e.g. by limiting
    memory), bootmem_alloc_core tried to find pages in an uninitialized
    bootmem_map. This caused a null pointer access. This fix adds a check so
    that NULL is returned. That will enable the caller (bootmem_alloc_nopanic)
    to allocate memory on other nodes without a panic.
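
    The shape of the fix is a simple early-out in the core bootmem allocator
    (a sketch; field and caller names follow the bootmem code of that era):

    /* at the top of the core allocator */
    if (!bdata->node_bootmem_map)
            return NULL;    /* memoryless node: nothing to hand out here */

    /* ...the _nopanic caller then simply tries the next node's bdata... */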

    Signed-off-by: Christian Krafft
    Cc: Christoph Lameter
    Cc: Andy Whitcroft
    Cc: Martin Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christian Krafft
     
  • The patch (as824b) makes percpu_free() ignore NULL arguments, as one would
    expect for a deallocation routine. (Note that free_percpu is #defined as
    percpu_free in include/linux/percpu.h.) A few callers are updated to remove
    now-unneeded tests for NULL. A few other callers already seem to assume
    that passing a NULL pointer to percpu_free() is okay!

    The patch also removes an unnecessary NULL check in percpu_depopulate().
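
    A minimal sketch of the new behaviour (the parameter name follows the
    mm/allocpercpu.c convention; the depopulate/free body is elided):

    void percpu_free(void *__pdata)
    {
            /* accept NULL, just as kfree() and vfree() do */
            if (unlikely(!__pdata))
                    return;
            /* ... depopulate each cpu's copy and free the descriptor ... */
    }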

    Signed-off-by: Alan Stern
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Stern
     
  • Node-aware allocation of skbs for the receive path.

    Details:

    - __alloc_skb gets a new node argument and calls the node-aware
    slab functions with it.
    - netdev_alloc_skb passes the node number it gets from dev_to_node
    to it; everyone else passes -1 (any node).
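
    A sketch of the resulting call pattern (the __alloc_skb prototype is
    written as I read the description above, and netdev_to_dev() is a
    hypothetical stand-in for however the struct device behind the
    net_device is reached):

    /* as described: size, gfp flags, fclone flag, preferred node */
    struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
                                int fclone, int node);

    static inline struct sk_buff *alloc_skb(unsigned int size, gfp_t gfp_mask)
    {
            /* ordinary callers don't care which node the skb lands on */
            return __alloc_skb(size, gfp_mask, 0, -1);
    }

    struct sk_buff *netdev_alloc_skb(struct net_device *dev, unsigned int length)
    {
            int node = dev_to_node(netdev_to_dev(dev));

            return __alloc_skb(length + NET_SKB_PAD, GFP_ATOMIC, 0, node);
    }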

    Signed-off-by: Christoph Hellwig
    Cc: Christoph Lameter
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
    For node-aware skb allocations we need information about the node in struct
    net_device or struct device. Davem suggested putting it into struct device,
    which this patch does.

    In particular:

    - struct device gets a new int numa_node member if CONFIG_NUMA is set
    - there are two new helpers, dev_to_node and set_dev_node to
    transparently deal with the non-numa case
    - for pci devices the node-info is set to the value we get from
    pcibus_to_node.

    Note that for some architectures pcibus_to_node doesn't work yet at the time
    we call it currently. This is harmless and will just mean skb allocations
    aren't node-local on those architectures until their pcibus_to_node
    implementations have been updated. (There are patches for x86 and x86_64
    floating around.)
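
    A sketch of the two helpers as described (the !CONFIG_NUMA stubs let
    callers stay unconditional):

    #ifdef CONFIG_NUMA
    static inline int dev_to_node(struct device *dev)
    {
            return dev->numa_node;
    }
    static inline void set_dev_node(struct device *dev, int node)
    {
            dev->numa_node = node;
    }
    #else
    static inline int dev_to_node(struct device *dev)
    {
            return -1;              /* "any node" */
    }
    static inline void set_dev_node(struct device *dev, int node)
    {
    }
    #endif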

    [akpm@osdl.org: cleanup]
    Signed-off-by: Christoph Hellwig
    Cc: Christoph Lameter
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • We have variants of kmalloc and kmem_cache_alloc that leave leak tracking to
    the caller. This is used for subsystem-specific allocators like skb_alloc.

    To make skb_alloc node-aware we need similar routines for the node-aware slab
    allocator, which this patch adds.

    Note that the code is rather ugly, but it mirrors the non-node-aware code 1:1.
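
    A sketch of the node-aware variant, mirroring the existing
    kmalloc_track_caller (the exact macro/extern split may differ in
    detail):

    #ifdef CONFIG_NUMA
    extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
                                             int node, void *caller);
    #define kmalloc_node_track_caller(size, flags, node)            \
            __kmalloc_node_track_caller(size, flags, node,          \
                                        __builtin_return_address(0))
    #else
    /* without NUMA the node hint means nothing; fall back */
    #define kmalloc_node_track_caller(size, flags, node)            \
            kmalloc_track_caller(size, flags)
    #endif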

    [akpm@osdl.org: add module export]
    Signed-off-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • It would be possible for /proc/swaps to not always print out the header:

    swapon /dev/hdc2
    swapon /dev/hde2
    swapoff /dev/hdc2

    At this point /proc/swaps would not have a header.

    Signed-off-by: Suleiman Souhlal
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suleiman Souhlal
     
    OOM can panic due to the processes stuck in __alloc_pages() doing an infinite
    rebalance loop while no memory can be reclaimed. The OOM killer tries to kill
    some processes, but unfortunately, the rebalance label was moved by someone
    below the TIF_MEMDIE check, so the buddy allocator doesn't see that the
    process is OOM-killed and it can simply fail the allocation :/

    Observed in reality on RHEL4(2.6.9)+OpenVZ kernel when a user doing some
    memory allocation tricks triggered OOM panic.

    Signed-off-by: Denis Lunev
    Signed-off-by: Kirill Korotaev
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     
  • mm is defined as vma->vm_mm, so use that.

    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik Bobbaers
     
    Currently a user process cannot raise its own oom_adj value (i.e.
    unprotecting itself from the OOM killer). As this value is stored in the
    task structure it gets inherited, and unprivileged children will be unable
    to raise it.

    The EPERM will be handled by the generic proc fs layer, as only processes
    with the proper caps or the owner of the process will be able to write to
    the file. So we allow only processes with CAP_SYS_RESOURCE to lower
    the value; otherwise the caller gets an EACCES, which seems more
    appropriate than EPERM.
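
    A sketch of the permission logic in the /proc write handler (the task
    field name and the error paths are simplified from memory):

    /* after parsing the new value into oom_adjust */
    if (oom_adjust < task->oomkilladj && !capable(CAP_SYS_RESOURCE)) {
            /* unprivileged tasks may only make themselves more killable */
            err = -EACCES;
            goto out;
    }
    task->oomkilladj = oom_adjust;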

    Signed-off-by: Guillem Jover
    Acked-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Guillem Jover
     
  • kunmap_atomic() will call kpte_clear_flush with vaddr/ptep arguments which
    don't correspond if the vaddr is just a normal lowmem address (ie, not in
    the KMAP area). This patch makes sure that the pte is only cleared if kmap
    area was actually used for the mapping.
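
    A sketch of the guarded clear, loosely following the i386 kunmap_atomic
    of that era (the fixmap index arithmetic is from memory and may differ):

    unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
    enum fixed_addresses idx = type + KM_TYPE_NR * smp_processor_id();

    /*
     * Only clear the kmap pte if this really was a kmap-area mapping;
     * for lowmem pages kmap_atomic never created one.
     */
    if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN + idx))
            kpte_clear_flush(kmap_pte - idx, vaddr);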

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Rusty Russell
    Cc: Zachary Amsden
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
     
    Make kmap_atomic/kunmap_atomic denote a pagefault-disabled scope. All
    non-trivial implementations already do this anyway.

    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Introduce pagefault_{disable,enable}() and use these where previously we did
    manual preempt increments/decrements to make the pagefault handler do the
    atomic thing.

    Currently they still rely on the increased preempt count, but do not rely on
    the disabled preemption; this might go away in the future.

    (NOTE: the extra barrier() in pagefault_disable might fix some holes on
    machines which have too many registers for their own good)
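
    A sketch of the helpers as described (an increment of the preempt count
    plus the compiler barriers mentioned above):

    static inline void pagefault_disable(void)
    {
            inc_preempt_count();
            /* make sure the count is visible before a fault can hit */
            barrier();
    }

    static inline void pagefault_enable(void)
    {
            /* order prior accesses before the fault handler may run again */
            barrier();
            dec_preempt_count();
            barrier();
            preempt_check_resched();
    }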

    [heiko.carstens@de.ibm.com: s390 fix]
    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Cc: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • In light of the recent pagefault and filemap_copy_from_user work I've gone
    through all the arch pagefault handlers to make sure the inc_preempt_count()
    'feature' works as expected.

    Several sections of code (including the new filemap_copy_from_user) rely on
    the fact that faults do not take locks under increased preempt count.

    arch/x86_64 - good
    arch/powerpc - good
    arch/cris - fixed
    arch/i386 - good
    arch/parisc - fixed
    arch/sh - good
    arch/sparc - good
    arch/s390 - good
    arch/m68k - fixed
    arch/ppc - good
    arch/alpha - fixed
    arch/mips - good
    arch/sparc64 - good
    arch/ia64 - good
    arch/arm - fixed
    arch/um - good
    arch/avr32 - good
    arch/h8300 - NA
    arch/m32r - good
    arch/v850 - good
    arch/frv - fixed
    arch/m68knommu - NA
    arch/arm26 - fixed
    arch/sh64 - fixed
    arch/xtensa - good

    Signed-off-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • When using numa=fake on non-NUMA hardware there is no benefit to having the
    alien caches, and they consume much memory.

    Add a kernel boot option to disable them.

    Christoph sayeth "This is good to have even on large NUMA. The problem is
    that the alien caches grow by the square of the size of the system in terms of
    nodes."

    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Here's an attempt towards doing away with lock_cpu_hotplug in the slab
    subsystem. This approach also fixes a bug which shows up when cpus are
    being offlined/onlined and slab caches are being tuned simultaneously.

    http://marc.theaimsgroup.com/?l=linux-kernel&m=116098888100481&w=2

    The patch has been stress tested overnight on a 2 socket 4 core AMD box with
    repeated cpu online and offline, while dbench and kernbench processes were
    running and slab caches were being tuned at the same time.
    There were no lockdep warnings either. (This test was on 2.6.18, as 2.6.19-rc
    crashes at __drain_pages
    http://marc.theaimsgroup.com/?l=linux-kernel&m=116172164217678&w=2 )

    The approach here is to hold cache_chain_mutex from CPU_UP_PREPARE until
    CPU_ONLINE (similar in approach to workqueue_mutex). Slab code sensitive
    to cpu_online_map (kmem_cache_create, kmem_cache_destroy, slabinfo_write,
    __cache_shrink) is already serialized with cache_chain_mutex. (This patch
    lengthens cache_chain_mutex hold time at kmem_cache_destroy to cover this).
    This patch also takes the cache_chain_sem at kmem_cache_shrink to protect
    sanity of cpu_online_map at __cache_shrink, as viewed by slab.
    (kmem_cache_shrink->__cache_shrink->drain_cpu_caches). But, really,
    kmem_cache_shrink is used at just one place in the acpi subsystem! Do we
    really need to keep kmem_cache_shrink at all?

    Another note. Looks like a cpu hotplug event can send CPU_UP_CANCELED to
    a registered subsystem even if the subsystem did not receive CPU_UP_PREPARE.
    This could be due to a subsystem registered for notification earlier than
    the current subsystem crapping out with NOTIFY_BAD. Badness can occur
    within the CPU_UP_CANCELED code path at slab if this happens (the same
    would apply for workqueue.c as well). To overcome this, we might have to
    use either
    a) a per subsystem flag and avoid handling of CPU_UP_CANCELED, or
    b) Use a special notifier events like LOCK_ACQUIRE/RELEASE as Gautham was
    using in his experiments, or
    c) Do not send CPU_UP_CANCELED to a subsystem which did not receive
    CPU_UP_PREPARE.

    I would prefer c).
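
    A sketch of the locking shape described above (the slab-specific
    setup/teardown in each case is elided):

    static int cpuup_callback(struct notifier_block *nfb,
                              unsigned long action, void *hcpu)
    {
            switch (action) {
            case CPU_UP_PREPARE:
                    /* taken here ... */
                    mutex_lock(&cache_chain_mutex);
                    /* allocate per-cpu structures for the incoming cpu */
                    break;
            case CPU_ONLINE:
            case CPU_UP_CANCELED:
                    /* ... and released only once the cpu is online (or its
                     * bringup was cancelled), so cpu_online_map cannot change
                     * under kmem_cache_create() and friends in between */
                    mutex_unlock(&cache_chain_mutex);
                    break;
            }
            return NOTIFY_OK;
    }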

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Shai Fultheim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
    When CONFIG_SLAB_DEBUG is used in combination with ARCH_SLAB_MINALIGN, some
    debug flags which depend on BYTES_PER_WORD alignment should be disabled.

    The disabling of these debug flags is not properly handled when
    BYTES_PER_WORD < ARCH_SLAB_MINALIGN < cache_line_size()

    This patch fixes that and also adds an alignment check to
    cache_alloc_debugcheck_after() when ARCH_SLAB_MINALIGN is used.
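
    A sketch of the alignment handling being fixed in kmem_cache_create
    (simplified; the real code also folds in SLAB_HWCACHE_ALIGN and the
    caller-supplied alignment):

    if (ralign < ARCH_SLAB_MINALIGN) {
            ralign = ARCH_SLAB_MINALIGN;
            /*
             * Red-zoning and user-store tracking assume BYTES_PER_WORD
             * alignment; a larger forced alignment breaks that layout,
             * so the corresponding debug flags must be dropped.
             */
            if (ralign > BYTES_PER_WORD)
                    flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
    }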

    Signed-off-by: Kevin Hilman
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Hilman