23 Mar, 2011

2 commits

  • KM_USER1 is never used in the vwrite() path, so the caller doesn't need
    to guarantee it is unused. The only slot the caller must guarantee is
    KM_USER0, and that is already documented in a comment.

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Provide a free area cache for the vmalloc virtual address allocator, based
    on the algorithm used by the user virtual memory allocator.

    This reduces the number of rbtree operations and linear traversals over
    the vmap extents in order to find a free area, by starting off at the last
    point that a free area was found.

    The free area cache is reset if areas are freed behind it, or if we are
    searching for a smaller area or alignment than last time. So allocation
    patterns are not changed (verified by corner-case and random test cases in
    userspace testing).

    This solves a regression caused by lazy vunmap TLB purging introduced in
    db64fe02 (mm: rewrite vmap layer). That patch will leave extents in the
    vmap allocator after they are vunmapped, and until a significant number
    accumulate that can be flushed in a single batch. So in a workload that
    vmalloc/vfree frequently, a chain of extents will build up from
    VMALLOC_START address, which have to be iterated over each time (giving an
    O(n) type of behaviour).

    After this patch, the search will start from where it left off, giving
    closer to an amortized O(1).

    This is verified to solve the regressions reported by Steven in GFS2 and
    by Avi in KVM.

    Hugh's update:

    : I tried out the recent mmotm, and on one machine was fortunate to hit
    : the BUG_ON(first->va_start < addr) which seems to have been stalling
    : your vmap area cache patch ever since May.

    : I can get you addresses etc, I did dump a few out; but once I stared
    : at them, it was easier just to look at the code: and I cannot see how
    : you would be so sure that first->va_start < addr, once you've done
    : that addr = ALIGN(max(...), align) above, if align is over 0x1000
    : (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).

    : I originally got around it by just changing the
    : if (first->va_start < addr) {
    : to
    : while (first->va_start < addr) {
    : without thinking about it any further; but that seemed unsatisfactory,
    : why would we want to loop here when we've got another very similar
    : loop just below it?

    : I am never going to admit how long I've spent trying to grasp your
    : "while (n)" rbtree loop just above this, the one with the peculiar
    : if (!first && tmp->va_start < addr + size)
    : in. That's unfamiliar to me, I'm guessing it's designed to save a
    : subsequent rb_next() in a few circumstances (at risk of then setting
    : a wrong cached_hole_size?); but they did appear few to me, and I didn't
    : feel I could sign off something with that in when I don't grasp it,
    : and it seems responsible for extra code and mistaken BUG_ON below it.

    : I've reverted to the familiar rbtree loop that find_vma() does (but
    : with va_end >= addr as you had, to respect the additional guard page):
    : and then (given that cached_hole_size starts out 0) I don't see the
    : need for any complications below it. If you do want to keep that loop
    : as you had it, please add a comment to explain what it's trying to do,
    : and where addr is relative to first when you emerge from it.

    : Aren't your tests "size first->va_start" forgetting the guard page we want
    : before the next area? I've changed those.

    : I have not changed your many "addr + size - 1 < addr" overflow tests,
    : but have since come to wonder, shouldn't they be "addr + size < addr"
    : tests - won't the vend checks go wrong if addr + size is 0?

    : I have added a few comments - Wolfgang Wander's 2.6.13 description of
    : 1363c3cd8603a913a27e2995dccbd70d5312d8e6 Avoiding mmap fragmentation
    : helped me a lot, perhaps a pointer to that would be good too. And I found
    : it easier to understand when I renamed cached_start slightly and moved the
    : overflow label down.

    : This patch would go after your mm-vmap-area-cache.patch in mmotm.
    : Trivially, nobody is going to get that BUG_ON with this patch, and it
    : appears to work fine on my machines; but I have not given it anything like
    : the testing you did on your original, and may have broken all the
    : performance you were aiming for. Please take a look, test it out, and
    : integrate it with yours if you're satisfied - thanks.

    [akpm@linux-foundation.org: add locking comment]
    Signed-off-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Reviewed-by: Minchan Kim
    Reported-and-tested-by: Steven Whitehouse
    Reported-and-tested-by: Avi Kivity
    Tested-by: "Barry J. Marson"
    Cc: Prarit Bhargava
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

14 Jan, 2011

6 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
    ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
    ACPI: fix resource check message
    ACPI / Battery: Update information on info notification and resume
    ACPI: Drop device flag wake_capable
    ACPI: Always check if _PRW is present before trying to evaluate it
    ACPI / PM: Check status of power resources under mutexes
    ACPI / PM: Rename acpi_power_off_device()
    ACPI / PM: Drop acpi_power_nocheck
    ACPI / PM: Drop acpi_bus_get_power()
    Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
    ACPI / Fan: Rework the handling of power resources
    ACPI / PM: Register power resource devices as soon as they are needed
    ACPI / PM: Register acpi_power_driver early
    ACPI / PM: Add function for updating device power state consistently
    ACPI / PM: Add function for device power state initialization
    ACPI / PM: Introduce __acpi_bus_get_power()
    ACPI / PM: Introduce function for refcounting device power resources
    ACPI / PM: Add functions for manipulating lists of power resources
    ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
    ACPICA: Update version to 20101209
    ...

    Linus Torvalds
     
  • IS_ERR() already implies unlikely(), so it can be omitted here.

    Signed-off-by: Tobias Klauser
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
    module_init(). Much of the code is duplicated and can be generalized in a
    globally accessible function, __vmalloc_node_range().

    __vmalloc_node() now calls into __vmalloc_node_range() with a range of
    [VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.

    Each architecture may then use __vmalloc_node_range() directly to remove
    the duplication of code.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: "David S. Miller"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • pcpu_get_vm_areas() only uses GFP_KERNEL allocations, so remove the
    gfp_t formal parameter and use the mask internally.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • get_vm_area_node() is unused in the kernel and can thus be removed.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Signed-off-by: Joe Perches
    Acked-by: Pekka Enberg
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

12 Jan, 2011

1 commit

  • Generic Hardware Error Source provides a way to report platform
    hardware errors (such as those from the chipset). It works in so-called
    "Firmware First" mode: hardware errors are reported to the firmware
    first, and the firmware then reports them to Linux. This way,
    non-standard hardware error registers or non-standard hardware links
    can be checked by the firmware to produce more valuable hardware error
    information for Linux.

    This patch adds POLL/IRQ/NMI notification types support.

    The memory area used to transfer hardware error information from the
    BIOS to Linux can be determined only in the NMI, IRQ or timer handler,
    but the general ioremap cannot be used in atomic context, so a special
    atomic version of ioremap is implemented for that.

    Known issue:

    - Error information cannot be printed for recoverable errors notified
    via NMI, because printk is not NMI-safe. This will be fixed by deferring
    printing to IRQ context via irq_work, or by making printk NMI-safe.

    v2:

    - adjust printk format per comments.

    Signed-off-by: Huang Ying
    Reviewed-by: Andi Kleen
    Signed-off-by: Len Brown

    Huang Ying
     

03 Dec, 2010

1 commit

  • On stock 2.6.37-rc4, running:

    # mount lilith:/export /mnt/lilith
    # find /mnt/lilith/ -type f -print0 | xargs -0 file

    crashes the machine fairly quickly under Xen. Often it results in oops
    messages, but the couple of times I tried just now, it just hung quietly
    and made Xen print some rude messages:

    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
    3000000000000000) for mfn 1d7058 (pfn 18fa7)
    (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
    1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
    (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04

    Which means the domain tried to map a pagetable page RW, which would
    allow it to map arbitrary memory, so Xen stopped it. This is because
    vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
    finished with them, and those pages got recycled as pagetable pages
    while still having these RW aliases.

    Removing those mappings immediately removes the Xen-visible aliases, and
    so it has no problem with those pages being reused as pagetable pages.
    Deferring the TLB flush doesn't upset Xen because it can flush the TLB
    itself as needed to maintain its invariants.

    When unmapping a region in the vmalloc space, clear the ptes
    immediately. There's no point in deferring this because there's no
    amortization benefit.

    The TLBs are left dirty, and they are flushed lazily to amortize the
    cost of the IPIs.

    The specific motivation for this patch is an oops-causing regression
    since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
    of vm_map_ram() introduced in 56e4ebf877b60 ("NFS: readdir with vmapped
    pages"). XFS also uses vm_map_ram() and could cause similar problems.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Nick Piggin
    Cc: Bryan Schumaker
    Cc: Trond Myklebust
    Cc: Alex Elder
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
     

27 Oct, 2010

3 commits


23 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: update comments to reflect that percpu allocations are always zero-filled
    percpu: Optimize __get_cpu_var()
    x86, percpu: Optimize this_cpu_ptr
    percpu: clear memory allocated with the km allocator
    percpu: fix build breakage on s390 and cleanup build configuration tests
    percpu: use percpu allocator on UP too
    percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
    vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

    Fixed up trivial conflicts in include/linux/percpu.h

    Linus Torvalds
     

17 Sep, 2010

1 commit

  • During the reading of /proc/vmcore the kernel is doing
    ioremap()/iounmap() repeatedly. And the buildup of un-flushed
    vm_area_struct's is causing a great deal of overhead. (rb_next()
    is chewing up most of that time).

    The solution is to provide the function set_iounmap_nonlazy(), which
    causes a subsequent call to iounmap() to immediately purge the vma area
    (with try_purge_vmap_area_lazy()).

    With this patch we have seen the time for writing a 250MB
    compressed dump drop from 71 seconds to 44 seconds.

    Signed-off-by: Cliff Wickman
    Cc: Andrew Morton
    Cc: kexec@lists.infradead.org
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     

08 Sep, 2010

1 commit


13 Aug, 2010

1 commit

  • * 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    x86: Detect whether we should use Xen SWIOTLB.
    pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
    swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
    xen/mmu: inhibit vmap aliases rather than trying to clear them out
    vmap: add flag to allow lazy unmap to be disabled at runtime
    xen: Add xen_create_contiguous_region
    xen: Rename the balloon lock
    xen: Allow unprivileged Xen domains to create iomap pages
    xen: use _PAGE_IOMAP in ioremap to do machine mappings

    Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
    driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
    include/xen/xen-ops.h

    Linus Torvalds
     

10 Aug, 2010

2 commits

  • kmalloc() may fail, if so return -ENOMEM.

    Signed-off-by: Kulikov Vasiliy
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kulikov Vasiliy
     
  • Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more
    clear what is the purpose of the operation, which otherwise looks like a
    no-op.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    type T;
    T x;
    identifier f;
    @@

    T f (...) { }

    @@
    expression x;
    @@

    - ERR_PTR(PTR_ERR(x))
    + ERR_CAST(x)
    //

    Signed-off-by: Julia Lawall
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
     

27 Jul, 2010

1 commit


10 Jul, 2010

1 commit

  • The current x86 ioremap() doesn't properly handle physical addresses
    higher than 32 bits in X86_32 PAE mode. When such an address is passed
    to ioremap(), the upper 32 bits of the physical address are wrongly
    cleared. Due to this bug, ioremap() can map the wrong address into the
    linear address space.

    In my case, a 64-bit MMIO region was assigned to a PCI device (an ioat
    device) on my system. Because of the ioremap() bug, the wrong physical
    address (instead of the MMIO region) was mapped into the linear address
    space, so loading the ioatdma driver caused unexpected behavior (kernel
    panic, kernel hang, ...).

    Signed-off-by: Kenji Kaneshige
    Signed-off-by: H. Peter Anvin

    Kenji Kaneshige
     

03 Feb, 2010

2 commits

  • Improve handling of fragmented per-CPU vmaps. Previously, we didn't
    free up a per-CPU map until all of its addresses had been used and
    freed, so fragmented blocks could fill up vmalloc space even if they
    actually had no active vmap regions within them.

    Add logic to purge these blocks on all CPUs when allocating a new vm
    area fails, and also to trim such blocks on the current CPU when we hit
    them in the allocation path (so as to avoid a large build-up of them).

    Christoph reported some vmap allocation failures when using the per CPU
    vmap APIs in XFS, which cannot be reproduced after this patch and the
    previous bug fix.

    Cc: linux-mm@kvack.org
    Cc: stable@kernel.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • RCU list walking of the per-cpu vmap cache was broken. It did not use
    RCU primitives, and also the union of free_list and rcu_head is
    obviously wrong (because free_list is indeed the list we are RCU
    walking).

    While we are there, remove a couple of unused fields from an earlier
    iteration.

    These APIs aren't actually used anywhere, because of problems with the
    XFS conversion. Christoph has now verified that the problems are solved
    with these patches. Also it is an exported interface, so I think it
    will be good to be merged now (and Christoph wants to get the XFS
    changes into their local tree).

    Cc: stable@kernel.org
    Cc: linux-mm@kvack.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

21 Jan, 2010

1 commit

  • In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
    then vmap_lazy_nr is increased atomically.

    But in __purge_vmap_area_lazy(), while traversing vmap_area_list, nr is
    counted by checking whether VM_LAZY_FREE is set in va->flags. After
    counting into nr, the kernel reads vmap_lazy_nr atomically and checks a
    BUG_ON condition - that nr is not greater than vmap_lazy_nr - to
    prevent vmap_lazy_nr from going negative.

    The problem is that, if the task is interrupted right after marking
    VM_LAZY_FREE, the increment of vmap_lazy_nr can be delayed.
    Consequently, the BUG_ON condition can be met, because nr can be
    counted higher than vmap_lazy_nr.

    This is highly probable when vmalloc/vfree are called frequently. The
    scenario has been verified by adding a delay between marking
    VM_LAZY_FREE and increasing vmap_lazy_nr in free_unmap_area_noflush().

    Even though vmap_lazy_nr is used to check a high watermark, it was
    never meant to be a strict watermark. And although the BUG_ON condition
    is there to prevent vmap_lazy_nr from going negative, vmap_lazy_nr is a
    signed variable, so it can legitimately dip to a negative value
    temporarily.

    Consequently, removing the BUG_ON condition is the proper fix.

    A possible BUG_ON message is like the below.

    kernel BUG at mm/vmalloc.c:517!
    invalid opcode: 0000 [#1] SMP
    EIP: 0060:[] EFLAGS: 00010297 CPU: 3
    EIP is at __purge_vmap_area_lazy+0x144/0x150
    EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
    ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Call Trace:
    [] free_unmap_vmap_area_noflush+0x69/0x70
    [] remove_vm_area+0x22/0x70
    [] __vunmap+0x45/0xe0
    [] vmalloc+0x2c/0x30
    Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
    EIP: [] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c

    [ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]

    Signed-off-by: Yongseok Koh
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yongseok Koh
     

16 Dec, 2009

1 commit


15 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

29 Oct, 2009

1 commit


12 Oct, 2009

1 commit


09 Oct, 2009

1 commit


08 Oct, 2009

2 commits

  • When a vmalloc'd area is mmap'd into userspace, some kind of
    co-ordination is necessary for this to work on platforms with cpu
    D-caches which can have aliases.

    Otherwise kernel side writes won't be seen properly in userspace
    and vice versa.

    If the kernel-side mapping and the user-side one have the same
    alignment, modulo SHMLBA, this can work, provided the VMA has VM_SHARED
    set - which is true for all current users. VM_SHARED forces SHMLBA
    alignment of the user-side mmap on platforms where D-cache aliasing
    matters.

    The bulk of this patch is just making it so that a specific
    alignment can be passed down into __get_vm_area_node(). All
    existing callers pass in '1' which preserves existing behavior.
    vmalloc_user() gives SHMLBA for the alignment.

    As a side effect this should get the video media drivers and other
    vmalloc_user() users into more working shape on such systems.

    Signed-off-by: David S. Miller
    Acked-by: Peter Zijlstra
    Cc: Jens Axboe
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    David Miller
     
  • fix the following 'make includecheck' warning:

    mm/vmalloc.c: linux/highmem.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     

23 Sep, 2009

1 commit

  • Some archs define MODULES_VADDR/MODULES_END outside of the VMALLOC
    area. Until now this was handled only on x86-64; this patch makes the
    handling more generic, so that vread/vwrite can access the area as
    well.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Jiri Slaby
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

4 commits

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge
    amount of) non-RAM pages. The amount of memory actually usable as
    storage should instead be used as the basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • vread/vwrite access the vmalloc area without checking whether there is
    a page there or not. In most cases, this works well.

    In the old days, the only caller of get_vm_area() was IOREMAP, and
    there was no memory hole within a vm_struct's
    [addr...addr + size - PAGE_SIZE] (-PAGE_SIZE is for a guard page).

    Since the per-cpu-alloc patch, get_vm_area() is used to reserve
    continuous virtual addresses that are remapped _later_, so there tend
    to be holes in valid vmalloc areas on the vm_struct list, and skipping
    such holes (unmapped pages) is necessary. This patch updates
    vread/vwrite() to avoid memory holes.

    Routines which access the vmalloc area without knowing what a given
    address is used for are
    - /proc/kcore
    - /dev/kmem

    kcore checks for IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP
    is checked and /dev/kmem will avoid reading/writing it. Fixes to
    /proc/kcore will follow in the next patch in the series.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • A vmap area should be purged only after its vm_struct is removed from
    the list, because vread/vwrite etc. believe the range is valid while it
    is on the vm_struct list.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • There is no need for double error checking.

    Signed-off-by: Figo.zhang
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Figo.zhang
     

14 Aug, 2009

2 commits

  • To directly use spread-out NUMA memories for percpu units, the percpu
    allocator will be updated to allow sparsely mapping units in a chunk.
    As the distances between units can be very large, this makes allocating
    a single vmap area for each chunk undesirable. This patch implements
    pcpu_get_vm_areas() and pcpu_free_vm_areas(), which allocate and free
    sparse congruent vmap areas.

    pcpu_get_vm_areas() takes @offsets and @sizes arrays which define the
    distances and sizes of the vmap areas. It scans down from the top of
    the vmalloc area looking for the top-most address which can accommodate
    all the areas. The top-down scan avoids interacting with regular
    vmallocs, which could otherwise push these congruent areas up little by
    little, wasting address space and page tables.

    To speed up the top-down scan, a highest-possible-address hint is
    maintained. Although the scan is linear from the hint, given the usual
    large holes between the memory addresses of NUMA nodes, the scan is
    highly likely to finish after finding the first hole for the last unit,
    which is scanned first.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     
  • Separate out insert_vmalloc_vm() from __get_vm_area_node().
    insert_vmalloc_vm() initializes a vm_struct from a vmap_area and
    inserts it into vmlist; it only initializes the fields which can be
    determined from @vm, @flags and @caller. The rest should be initialized
    by the caller. For __get_vm_area_node(), all other fields just need to
    be cleared, which is done by using kzalloc instead of kmalloc.

    This will be used to implement pcpu_get_vm_areas().

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     

12 Jun, 2009

2 commits

  • * 'for-linus' of git://linux-arm.org/linux-2.6:
    kmemleak: Add the corresponding MAINTAINERS entry
    kmemleak: Simple testing module for kmemleak
    kmemleak: Enable the building of the memory leak detector
    kmemleak: Remove some of the kmemleak false positives
    kmemleak: Add modules support
    kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
    kmemleak: Add the vmalloc memory allocation/freeing hooks
    kmemleak: Add the slub memory allocation/freeing hooks
    kmemleak: Add the slob memory allocation/freeing hooks
    kmemleak: Add the slab memory allocation/freeing hooks
    kmemleak: Add documentation on the memory leak detector
    kmemleak: Add the base support

    Manual conflict resolution (with the slab/earlyboot changes) in:
    drivers/char/vt.c
    init/main.c
    mm/slab.c

    Linus Torvalds
     
  • We can call vmalloc_init() after kmem_cache_init() and use kzalloc() instead of
    the bootmem allocator when initializing vmalloc data structures.

    Acked-by: Johannes Weiner
    Acked-by: Linus Torvalds
    Acked-by: Nick Piggin
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Pekka Enberg