16 Dec, 2009

1 commit


15 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

29 Oct, 2009

1 commit


12 Oct, 2009

1 commit


09 Oct, 2009

1 commit


08 Oct, 2009

2 commits

  • When a vmalloc'd area is mmap'd into userspace, some kind of
    co-ordination is necessary for this to work on platforms with cpu
    D-caches which can have aliases.

    Otherwise kernel side writes won't be seen properly in userspace
    and vice versa.

    If the kernel side mapping and the user side one have the same
    alignment, modulo SHMLBA, this can work as long as VM_SHARED is
    set on the VMA, and for all current users this is true. VM_SHARED
    will force SHMLBA alignment of the user side mmap on platforms
    where D-cache aliasing matters.

    The bulk of this patch is just making it so that a specific
    alignment can be passed down into __get_vm_area_node(). All
    existing callers pass in '1' which preserves existing behavior.
    vmalloc_user() gives SHMLBA for the alignment.

    As a side effect this should get the video media drivers and other
    vmalloc_user() users into more working shape on such systems.

    Signed-off-by: David S. Miller
    Acked-by: Peter Zijlstra
    Cc: Jens Axboe
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    David Miller
     
  • fix the following 'make includecheck' warning:

    mm/vmalloc.c: linux/highmem.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
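The alignment condition in the vmalloc_user() commit above (kernel and user mappings agreeing modulo SHMLBA) can be sketched as a simple congruence test. This is a user-space illustration only: `EXAMPLE_SHMLBA` and `same_cache_colour()` are hypothetical names, and the real SHMLBA value is architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aliasing granule; the real SHMLBA depends on the architecture. */
#define EXAMPLE_SHMLBA ((uintptr_t)0x4000)

/*
 * True when both addresses are congruent modulo the aliasing granule,
 * i.e. they index the same lines of a virtually indexed D-cache.
 */
bool same_cache_colour(uintptr_t kaddr, uintptr_t uaddr)
{
    return (kaddr % EXAMPLE_SHMLBA) == (uaddr % EXAMPLE_SHMLBA);
}
```

If the kernel-side area starts on a multiple of the granule and the user-side mmap is forced to the same alignment, every page pair in the two mappings passes this test.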
     

23 Sep, 2009

1 commit

  • Some archs define MODULES_VADDR/MODULES_END which is not in VMALLOC area.
    This is handled only in x86-64. This patch makes it more generic. And we
    can use vread/vwrite to access the area. Fix it.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Jiri Slaby
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

4 commits

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge amount
    of) non-RAM pages. The amount of what actually is usable as storage
    should instead be used as a basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • vread/vwrite access the vmalloc area without checking whether a page is
    there or not. In most cases, this works well.

    In the old days, the only caller of get_vm_area() was IOREMAP and there
    was no memory hole within vm_struct's [addr...addr + size - PAGE_SIZE]
    (-PAGE_SIZE is for a guard page).

    After the per-cpu-alloc patch, get_vm_area() is used to reserve a
    continuous virtual address range which is remapped _later_. There tends
    to be a hole in the valid vmalloc area in vm_struct lists, so skipping the
    hole (a page that is not mapped) is necessary. This patch updates
    vread/vwrite() to avoid memory holes.

    Routines which access the vmalloc area without knowing what an address is
    used for are
    - /proc/kcore
    - /dev/kmem

    kcore checks IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP is
    checked and /dev/kmem will avoid reading/writing it. Fixes to /proc/kcore
    will be in the next patch in the series.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • vmap area should be purged after vm_struct is removed from the list
    because vread/vwrite etc. believe the range is valid while it's on the
    vm_struct list.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • There is no need for double error checking.

    Signed-off-by: Figo.zhang
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Figo.zhang
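The hole skipping described in the vread/vwrite commit above can be illustrated in user space. This is a rough sketch, not the kernel code: the page table is modelled as an array of pointers where NULL marks an unmapped page, `EXAMPLE_PAGE_SIZE` and `read_area_skipping_holes()` are hypothetical names, and reporting holes as zeroes is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_PAGE_SIZE 4096u  /* hypothetical page size */

/*
 * Copy `count` bytes starting at byte offset `off` of an area whose backing
 * pages are listed in `pages`. A NULL entry is a hole (no page mapped);
 * holes are reported as zeroes instead of being dereferenced.
 */
void read_area_skipping_holes(char *dst, char *const pages[], size_t npages,
                              size_t off, size_t count)
{
    while (count > 0) {
        size_t idx = off / EXAMPLE_PAGE_SIZE;
        size_t in_page = off % EXAMPLE_PAGE_SIZE;
        size_t len = EXAMPLE_PAGE_SIZE - in_page;

        assert(idx < npages);
        if (len > count)
            len = count;

        if (pages[idx])
            memcpy(dst, pages[idx] + in_page, len);
        else
            memset(dst, 0, len);  /* hole: nothing is mapped here */

        dst += len;
        off += len;
        count -= len;
    }
}
```

The copy proceeds one page at a time so that a hole in the middle of the range never stops the read; only the bytes that fall inside the hole are affected.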
     

14 Aug, 2009

2 commits

  • To directly use spread NUMA memories for percpu units, percpu
    allocator will be updated to allow sparsely mapping units in a chunk.
    As the distances between units can be very large, this makes
    allocating single vmap area for each chunk undesirable. This patch
    implements pcpu_get_vm_areas() and pcpu_free_vm_areas() which
    allocate and free sparse congruent vmap areas.

    pcpu_get_vm_areas() takes @offsets and @sizes arrays which define
    distances and sizes of vmap areas. It scans down from the top of
    vmalloc area looking for the top-most address which can accommodate all
    the areas. The top-down scan is to avoid interacting with regular
    vmallocs which can push these congruent areas up little by little
    ending up wasting address space and page table.

    To speed up top-down scan, the highest possible address hint is
    maintained. Although the scan is linear from the hint, given the
    usual large holes between memory addresses between NUMA nodes, the
    scanning is highly likely to finish after finding the first hole for
    the last unit which is scanned first.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     
  • Separate out insert_vmalloc_vm() from __get_vm_area_node().
    insert_vmalloc_vm() initializes vm_struct from vmap_area and inserts
    it into vmlist. insert_vmalloc_vm() only initializes fields which can
    be determined from @vm, @flags and @caller. The rest should be
    initialized by the caller. For __get_vm_area_node(), all other fields
    just need to be cleared and this is done by using kzalloc instead of
    kmalloc.

    This will be used to implement pcpu_get_vm_areas().

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
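The top-down search described for pcpu_get_vm_areas() in the first commit above can be sketched with a brute-force scan. This is not the kernel algorithm (which walks the vmap area tree from a cached hint); it is a simplified user-space illustration. `struct range`, `overlaps_any()` and `find_congruent_base()` are hypothetical names, and the busy list is a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range { unsigned long start, end; };  /* half-open: [start, end) */

static bool overlaps_any(unsigned long s, unsigned long e,
                         const struct range *busy, size_t nbusy)
{
    for (size_t i = 0; i < nbusy; i++)
        if (s < busy[i].end && busy[i].start < e)
            return true;
    return false;
}

/*
 * Return the highest `align`-aligned base in [lo, hi) such that every area
 * [base + offsets[i], base + offsets[i] + sizes[i]) is free, or 0 if there
 * is none. Assumes lo > 0 so that 0 can serve as the failure value.
 */
unsigned long find_congruent_base(unsigned long lo, unsigned long hi,
                                  unsigned long align,
                                  const unsigned long *offsets,
                                  const unsigned long *sizes, size_t n,
                                  const struct range *busy, size_t nbusy)
{
    unsigned long span = 0;

    assert(align > 0);
    for (size_t i = 0; i < n; i++)
        if (offsets[i] + sizes[i] > span)
            span = offsets[i] + sizes[i];
    if (span == 0 || hi < lo || hi - lo < span)
        return 0;

    /* Start at the top-most candidate and walk down one step at a time. */
    unsigned long base = (hi - span) / align * align;
    for (; base >= lo; base -= align) {
        bool ok = true;

        for (size_t i = 0; i < n && ok; i++)
            ok = !overlaps_any(base + offsets[i],
                               base + offsets[i] + sizes[i], busy, nbusy);
        if (ok)
            return base;
        if (base < align)
            break;  /* avoid unsigned wrap-around below zero */
    }
    return 0;
}
```

Because all areas share one base, a single busy range near the top can force the whole group down, which is why the commit scans from the top and keeps the group away from regular vmalloc allocations growing from the bottom.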
     

12 Jun, 2009

3 commits

  • * 'for-linus' of git://linux-arm.org/linux-2.6:
    kmemleak: Add the corresponding MAINTAINERS entry
    kmemleak: Simple testing module for kmemleak
    kmemleak: Enable the building of the memory leak detector
    kmemleak: Remove some of the kmemleak false positives
    kmemleak: Add modules support
    kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
    kmemleak: Add the vmalloc memory allocation/freeing hooks
    kmemleak: Add the slub memory allocation/freeing hooks
    kmemleak: Add the slob memory allocation/freeing hooks
    kmemleak: Add the slab memory allocation/freeing hooks
    kmemleak: Add documentation on the memory leak detector
    kmemleak: Add the base support

    Manual conflict resolution (with the slab/earlyboot changes) in:
    drivers/char/vt.c
    init/main.c
    mm/slab.c

    Linus Torvalds
     
  • We can call vmalloc_init() after kmem_cache_init() and use kzalloc() instead of
    the bootmem allocator when initializing vmalloc data structures.

    Acked-by: Johannes Weiner
    Acked-by: Linus Torvalds
    Acked-by: Nick Piggin
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     
  • This patch adds the callbacks to kmemleak_(alloc|free) functions from
    vmalloc/vfree.

    Signed-off-by: Catalin Marinas

    Catalin Marinas
     

07 May, 2009

1 commit

  • If alloc_vmap_area() fails the allocated struct vmap_area has to be freed.

    Signed-off-by: Ralph Wuerthner
    Reviewed-by: Christoph Lameter
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralph Wuerthner
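The leak fixed above follows a common pattern: a descriptor is allocated first, a later step fails, and the error path forgets to release the descriptor. A minimal user-space sketch of the corrected shape is shown here; `struct area`, `reserve_range()` and `area_create()` are hypothetical stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <stdlib.h>

struct area { unsigned long start, size; };

/* Hypothetical stand-in for an address-space reservation that can fail. */
int reserve_range(struct area *a, unsigned long size, unsigned long limit)
{
    if (size == 0 || size > limit)
        return -1;
    a->start = 0x1000;
    a->size = size;
    return 0;
}

/*
 * Allocate a descriptor and reserve a range for it. On failure the
 * descriptor is freed so the error path does not leak it.
 */
struct area *area_create(unsigned long size, unsigned long limit)
{
    struct area *a = malloc(sizeof(*a));

    if (!a)
        return NULL;
    if (reserve_range(a, size, limit) != 0) {
        free(a);  /* the step that is easy to forget */
        return NULL;
    }
    return a;
}
```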
     

01 Apr, 2009

1 commit

  • vmap's dirty_list is unused. It's for optimizing flushing, but Nick
    didn't write the code yet, so we don't need it until such time as it is
    needed.

    This patch removes vmap_block's dirty_list and codes related to it.

    Signed-off-by: MinChan Kim
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    MinChan Kim
     

04 Mar, 2009

1 commit


01 Mar, 2009

1 commit


28 Feb, 2009

2 commits

  • I just got this new warning from kmemcheck:

    WARNING: kmemcheck: Caught 32-bit read from freed memory (c7806a60)
    a06a80c7ecde70c1a04080c700000000a06709c1000000000000000000000000
    f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f
    ^

    Pid: 0, comm: swapper Not tainted (2.6.29-rc4 #230)
    EIP: 0060:[] EFLAGS: 00000286 CPU: 0
    EIP is at __purge_vmap_area_lazy+0x117/0x140
    EAX: 00070f43 EBX: c7806a40 ECX: c1677080 EDX: 00027b66
    ESI: 00002001 EDI: c170df0c EBP: c170df00 ESP: c178830c
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    CR0: 80050033 CR2: c7806b14 CR3: 01775000 CR4: 00000690
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: 00004000 DR7: 00000000
    [] free_unmap_vmap_area_noflush+0x6e/0x70
    [] remove_vm_area+0x2a/0x70
    [] __vunmap+0x45/0xe0
    [] vunmap+0x1e/0x30
    [] text_poke+0x95/0x150
    [] alternatives_smp_unlock+0x49/0x60
    [] alternative_instructions+0x11b/0x124
    [] check_bugs+0xbd/0xdc
    [] start_kernel+0x2ed/0x360
    [] __init_begin+0x9e/0xa9
    [] 0xffffffff

    It happened here:

    $ addr2line -e vmlinux -i c1096df7
    mm/vmalloc.c:540

    Code:

    list_for_each_entry(va, &valist, purge_list)
            __free_vmap_area(va);

    It's this instruction:

    mov 0x20(%ebx),%edx

    Which corresponds to a dereference of va->purge_list.next:

    (gdb) p ((struct vmap_area *) 0)->purge_list.next
    Cannot access memory at address 0x20

    It seems that we should use "safe" list traversal here, as the element
    is freed inside the loop. Please verify that this is the right fix.

    Acked-by: Nick Piggin
    Signed-off-by: Vegard Nossum
    Cc: Pekka Enberg
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Cc: [2.6.28.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • The new vmap allocator can wrap the address and get confused in the case
    of large allocations or VMALLOC_END near the end of address space.

    Problem reported by Christoph Hellwig on a 32-bit XFS workload.

    Signed-off-by: Nick Piggin
    Reported-by: Christoph Hellwig
    Cc: [2.6.28.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
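The use-after-free in the kmemcheck report above (first commit of this day) comes from reading an element's link after the element has been freed. In the kernel the usual remedy is list_for_each_entry_safe(), which caches the next entry before the loop body runs. The same idea can be shown in plain user-space C; `struct node`, `push()` and `free_all()` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Push a new node on the front of the list; returns the new head. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Free every node and return how many were freed. The successor is saved
 * before free() so the loop never reads from memory it has released.
 */
int free_all(struct node *head)
{
    int freed = 0;
    struct node *n, *next;

    for (n = head; n != NULL; n = next) {
        next = n->next;  /* read the link while n is still valid */
        free(n);
        freed++;
    }
    return freed;
}
```

Writing the loop as `for (n = head; n; n = n->next) free(n);` would reproduce the bug: the increment expression dereferences `n` after it has been freed.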
     

26 Feb, 2009

1 commit


25 Feb, 2009

2 commits


24 Feb, 2009

1 commit

  • Impact: allow larger alignment for early vmalloc area allocation

    Some early vmalloc users might want larger alignment, for example, for
    custom large page mapping. Add @align to vm_area_register_early().
    While at it, drop docbook comment on non-existent @size.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin
    Cc: Ivan Kokshaysky

    Tejun Heo
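Honouring an @align argument usually comes down to rounding the candidate address up to the next multiple of the alignment. A small sketch of that arithmetic follows; `align_up()` is a hypothetical helper written for this example (the kernel has its own ALIGN() macro), and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

With a large-page alignment such as 2 MiB, an address just past a boundary is pushed to the next boundary, while an already aligned address is returned unchanged.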
     

21 Feb, 2009

1 commit


20 Feb, 2009

3 commits

  • Impact: two more public map/unmap functions

    Implement map_kernel_range_noflush() and unmap_kernel_range_noflush().
    These functions respectively map and unmap an address range in the kernel
    VM area but don't do any vcache or tlb flushing. These will be used by
    the new percpu allocator.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     
  • Impact: allow multiple early vm areas

    There are places where kernel VM area needs to be allocated before
    vmalloc is initialized. This is done by allocating static vm_struct,
    initializing several fields and linking it to vmlist and later vmalloc
    initialization picking up these from vmlist. This is currently done
    manually and if there is more than one such area, there's no defined
    way to arbitrate who gets which address.

    This patch implements vm_area_register_early(), which takes vm_area
    struct with flags and size initialized, assigns address to it and puts
    it on the vmlist. This way, multiple early vm areas can determine
    which addresses they should use. The only current user - alpha mm
    init - is converted to use it.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: proper vcache flush on unmap_kernel_range()

    flush_cache_vunmap() should be called before pages are unmapped. Add
    a call to it in unmap_kernel_range().

    Signed-off-by: Tejun Heo

    Tejun Heo
     

19 Feb, 2009

1 commit

  • We have get_vm_area_caller() and __get_vm_area() but not
    __get_vm_area_caller()

    On powerpc, I use __get_vm_area() to separate the ranges of addresses
    given to vmalloc vs. ioremap (various good reasons for that) so in order
    to be able to implement the new caller tracking in /proc/vmallocinfo, I
    need a "_caller" variant of it.

    (akpm: needed for ongoing powerpc development, so merge it early)

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Benjamin Herrenschmidt
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
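The "_caller" naming above reflects a simple layering: a core routine takes the caller address as an explicit argument, and a thin wrapper supplies its own return address. A user-space sketch of that layering is given here, assuming GCC or Clang for `__builtin_return_address()`; `struct area_info`, `area_init_caller()` and `area_init()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct area_info {
    size_t size;
    const void *caller;  /* who asked for the area */
};

/* Core routine: the caller address is an explicit parameter. */
void area_init_caller(struct area_info *info, size_t size, const void *caller)
{
    info->size = size;
    info->caller = caller;
}

/* Convenience wrapper: record the address this wrapper was called from. */
__attribute__((noinline))
void area_init(struct area_info *info, size_t size)
{
    area_init_caller(info, size, __builtin_return_address(0));
}
```

Code that sits between the real user and the allocator can call the `_caller` form and pass along the address it received, so that a report lists the real user and not the intermediate helper.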
     

16 Jan, 2009

2 commits

  • Revert commit e97a630eb0f5b8b380fd67504de6cedebb489003 ("mm: vmalloc use
    mutex for purge")

    Bryan Donlan reports:

    : After testing 2.6.29-rc1 on xen-x86 with a btrfs root filesystem, I
    : got the OOPS quoted below and a hard freeze shortly after boot.
    : Boot messages and config are attached.
    :
    : ------------[ cut here ]------------
    : Kernel BUG at c05ef80d [verbose debug info unavailable]
    : invalid opcode: 0000 [#1] SMP
    : last sysfs file: /sys/block/xvdc/size
    : Modules linked in:
    :
    : Pid: 0, comm: swapper Not tainted (2.6.29-rc1 #6)
    : EIP: 0061:[] EFLAGS: 00010087 CPU: 2
    : EIP is at schedule+0x7cd/0x950
    : EAX: d5aeca80 EBX: 00000002 ECX: 00000000 EDX: d4cb9a40
    : ESI: c12f5600 EDI: d4cb9a40 EBP: d6033fa4 ESP: d6033ef4
    : DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
    : Process swapper (pid: 0, ti=d6032000 task=d6020b70 task.ti=d6032000)
    : Stack:
    : 000d85bc 00000000 000186a0 00000000 0dd11410 c0105417 c12efe00 0dc367c3
    : 00000011 c0105d46 d5a5d310 deadbeef d4cb9a40 c07cc600 c05f1340 c12e0060
    : deadbeef d6020b70 d6020d08 00000002 c014377d 00000000 c12f5600 00002c22
    : Call Trace:
    : [] xen_force_evtchn_callback+0x17/0x30
    : [] check_events+0x8/0x12
    : [] _spin_unlock_irqrestore+0x20/0x40
    : [] hrtimer_start_range_ns+0x12d/0x2e0
    : [] tick_nohz_restart_sched_tick+0x146/0x160
    : [] cpu_idle+0xa5/0xc0

    and bisected it to this commit.

    Let's remove it now while we have a think about the problem.

    Reported-by: Bryan Donlan
    Tested-by: Christophe Saout
    Cc: Nick Piggin
    Cc: Ingo Molnar
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • On alpha, we have to map some stuff in the VMALLOC space very early in the
    boot process (to make SRM console callbacks work and so on, see
    arch/alpha/mm/init.c). For old VM allocator, we just manually placed a
    vm_struct onto the global vmlist and this worked for ages.

    Unfortunately, the new allocator isn't aware of this, so it constantly
    tries to allocate the VM space which is already in use, making vmalloc on
    alpha defunct.

    This patch forces KVA to import vmlist entries on init.

    [akpm@linux-foundation.org: remove unneeded check (per Johannes)]
    Signed-off-by: Ivan Kokshaysky
    Cc: Nick Piggin
    Cc: Johannes Weiner
    Cc: Richard Henderson
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ivan Kokshaysky
     

07 Jan, 2009

4 commits

  • Lazy unmapping in the vmalloc code has now opened the possibility for use
    after free bugs to go undetected. We can catch those by forcing an unmap
    and flush (which is going to be slow, but that's what happens).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The vmalloc purge lock can be a mutex so we can sleep while a purge is
    going on (purge involves a global kernel TLB invalidate, so it can take
    quite a while).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • If we do that, output of files like /proc/vmallocinfo will show things
    like "vmalloc_32", "vmalloc_user", or whomever the caller was as the
    caller. This info is not as useful as the real caller of the allocation.

    So the proposal is to call __vmalloc_node directly, with matching
    parameters, to save the caller information.

    Signed-off-by: Glauber Costa
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • If we can't service a vmalloc allocation, show size of the allocation that
    actually failed. Useful for debugging.

    Signed-off-by: Glauber Costa
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     

05 Jan, 2009

1 commit


11 Dec, 2008

1 commit

  • Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
    to my 966c8c12dc9e77f931e2281ba25d2f0244b06949 sprint_symbol(): use
    less stack exposing a bug in slub's list_locations() -
    kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
    beyond the end of page provided.

    The 100 slop which list_locations() allows at end of page looks roughly
    enough for all the other stuff it might print after the symbol before
    it checks again: break out KSYM_SYMBOL_LEN earlier than before.

    Latencytop and ftrace are using KSYM_NAME_LEN buffers where they
    need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
    where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
    them.

    [akpm@linux-foundation.org: ftrace.h needs module.h]
    Signed-off-by: Hugh Dickins
    Cc: Christoph Lameter
    Cc: Miles Lane
    Acked-by: Pekka Enberg
    Acked-by: Steven Rostedt
    Acked-by: Frederic Weisbecker
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins