13 Aug, 2010

1 commit

  • * 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    x86: Detect whether we should use Xen SWIOTLB.
    pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
    swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
    xen/mmu: inhibit vmap aliases rather than trying to clear them out
    vmap: add flag to allow lazy unmap to be disabled at runtime
    xen: Add xen_create_contiguous_region
    xen: Rename the balloon lock
    xen: Allow unprivileged Xen domains to create iomap pages
    xen: use _PAGE_IOMAP in ioremap to do machine mappings

    Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
    driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
    include/xen/xen-ops.h

    Linus Torvalds
     

10 Aug, 2010

2 commits

  • kmalloc() may fail; if so, return -ENOMEM.
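
    A minimal sketch of the pattern, at a hypothetical call site (names
    are illustrative, not from the patch itself):

    void *buf = kmalloc(len, GFP_KERNEL);
    if (!buf)
            return -ENOMEM;  /* propagate the failure to the caller */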

    Signed-off-by: Kulikov Vasiliy
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kulikov Vasiliy
     
  • Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes the
    purpose of the operation clearer; the latter looks like a no-op.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    // <smpl>
    @@
    type T;
    T x;
    identifier f;
    @@

    T f (...) { }

    @@
    expression x;
    @@

    - ERR_PTR(PTR_ERR(x))
    + ERR_CAST(x)
    // </smpl>
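
    For illustration, a concrete before/after of what the rule rewrites
    (the inode variable is a hypothetical example):

    /* before: round-trips the error value through a long */
    return ERR_PTR(PTR_ERR(inode));

    /* after: the intent - cast an error pointer to another
       pointer type - is explicit */
    return ERR_CAST(inode);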

    Signed-off-by: Julia Lawall
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
     

10 Jul, 2010

1 commit

  • Current x86 ioremap() doesn't properly handle physical addresses
    higher than 32 bits in X86_32 PAE mode. When a physical address
    higher than 32 bits is passed to ioremap(), the upper 32 bits of the
    physical address are wrongly cleared. Due to this bug, ioremap() can
    map the wrong address into the linear address space.

    In my case, a 64-bit MMIO region was assigned to a PCI device (an
    ioat device) on my system. Because of the ioremap() bug, a wrong
    physical address (instead of the MMIO region) was mapped into the
    linear address space. Because of this, loading the ioatdma driver
    caused unexpected behavior (kernel panic, kernel hangup, ...).
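
    A hedged sketch of the truncation (values are illustrative):

    resource_size_t phys_addr = 0x380000000ULL; /* 64-bit MMIO, above 4GB */
    unsigned long truncated = phys_addr;        /* upper 32 bits silently dropped */
    /* ioremap() would then map 0x80000000 instead of 0x380000000 */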

    Signed-off-by: Kenji Kaneshige
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Kenji Kaneshige
     

03 Feb, 2010

2 commits

  • Improve handling of fragmented per-CPU vmaps. Previously we didn't
    free up per-CPU vmap blocks until all their addresses had been used
    and freed, so fragmented blocks could fill up vmalloc space even if
    they actually had no active vmap regions within them.

    Add some logic to purge these blocks on all CPUs when allocation of a
    new vm area fails, and also some logic to trim such blocks on the
    current CPU if we hit them in the allocation path (so as to avoid a
    large build-up of them).
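
    A hedged sketch of the failure-path purge (the helper name follows
    this description and may not match the final patch exactly):

    va = alloc_vmap_area(size, align, start, end, node, gfp);
    if (IS_ERR(va)) {
            purge_fragmented_blocks_allcpus(); /* reclaim empty fragmented blocks */
            /* ... then retry the allocation once ... */
    }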

    Christoph reported some vmap allocation failures when using the per CPU
    vmap APIs in XFS, which cannot be reproduced after this patch and the
    previous bug fix.

    Cc: linux-mm@kvack.org
    Cc: stable@kernel.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • RCU list walking of the per-cpu vmap cache was broken. It did not use
    RCU primitives, and the union of free_list and rcu_head was also
    obviously wrong (because free_list is indeed the list we are
    RCU-walking).
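
    For reference, the shape a correct RCU walk of the free list takes (a
    sketch, assuming the vmap_block_queue/free_list names from
    mm/vmalloc.c):

    rcu_read_lock();
    list_for_each_entry_rcu(vb, &vbq->free, free_list) {
            /* examine the block under RCU protection */
    }
    rcu_read_unlock();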

    While we are there, remove a couple of unused fields from an earlier
    iteration.

    These APIs aren't actually used anywhere, because of problems with
    the XFS conversion. Christoph has now verified that the problems are
    solved with these patches. Also, since it is an exported interface, I
    think it will be good to merge it now (and Christoph wants to get the
    XFS changes into their local tree).

    Cc: stable@kernel.org
    Cc: linux-mm@kvack.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

21 Jan, 2010

1 commit

  • In free_unmap_area_noflush(), va->flags is marked VM_LAZY_FREE first,
    and then vmap_lazy_nr is increased atomically.

    But in __purge_vmap_area_lazy(), while traversing vmap_area_list, nr
    is counted by checking whether VM_LAZY_FREE is set in va->flags.
    After counting the variable nr, the kernel reads vmap_lazy_nr
    atomically and checks a BUG_ON condition that nr is not greater than
    vmap_lazy_nr, to prevent vmap_lazy_nr from going negative.

    The problem is that, if interrupted right after marking VM_LAZY_FREE,
    the increment of vmap_lazy_nr can be delayed. Consequently, the
    BUG_ON condition can be met because nr is counted higher than
    vmap_lazy_nr.
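
    The window, sketched from the ordering described above:

    va->flags |= VM_LAZY_FREE;
    /* <-- an interrupt here delays the increment below, so a concurrent
       __purge_vmap_area_lazy() counts this va in nr while vmap_lazy_nr
       still holds the old, smaller value */
    atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);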

    This is highly probable when vmalloc/vfree are called frequently. The
    scenario has been verified by adding a delay between marking
    VM_LAZY_FREE and increasing vmap_lazy_nr in free_unmap_area_noflush().

    Even though vmap_lazy_nr is for checking a high watermark, it was
    never meant to be a strict watermark. And although the BUG_ON
    condition is there to prevent vmap_lazy_nr from going negative,
    vmap_lazy_nr is a signed variable, so it may temporarily go negative.

    Consequently, removing the BUG_ON condition is the proper fix.

    A possible BUG_ON message is shown below.

    kernel BUG at mm/vmalloc.c:517!
    invalid opcode: 0000 [#1] SMP
    EIP: 0060:[] EFLAGS: 00010297 CPU: 3
    EIP is at __purge_vmap_area_lazy+0x144/0x150
    EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
    ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Call Trace:
    [] free_unmap_vmap_area_noflush+0x69/0x70
    [] remove_vm_area+0x22/0x70
    [] __vunmap+0x45/0xe0
    [] vmalloc+0x2c/0x30
    Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
    EIP: [] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c

    [ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]

    Signed-off-by: Yongseok Koh
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yongseok Koh
     

15 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

08 Oct, 2009

2 commits

  • When a vmalloc'd area is mmap'd into userspace, some kind of
    co-ordination is necessary for this to work on platforms with cpu
    D-caches which can have aliases.

    Otherwise kernel side writes won't be seen properly in userspace
    and vice versa.

    If the kernel-side mapping and the user-side one have the same
    alignment, modulo SHMLBA, this can work as long as the VMA is
    VM_SHARED, which is true for all current users. VM_SHARED will force
    SHMLBA alignment of the user-side mmap on platforms where D-cache
    aliasing matters.
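
    The aliasing condition, as a sketch (kvaddr/uvaddr are hypothetical
    kernel- and user-side addresses of the same page):

    bool alias_safe = ((kvaddr ^ uvaddr) & (SHMLBA - 1)) == 0;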

    The bulk of this patch is just making it so that a specific
    alignment can be passed down into __get_vm_area_node(). All
    existing callers pass in '1' which preserves existing behavior.
    vmalloc_user() gives SHMLBA for the alignment.

    As a side effect this should get the video media drivers and other
    vmalloc_user() users into more working shape on such systems.

    Signed-off-by: David S. Miller
    Acked-by: Peter Zijlstra
    Cc: Jens Axboe
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    David Miller
     
  • fix the following 'make includecheck' warning:

    mm/vmalloc.c: linux/highmem.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     

23 Sep, 2009

1 commit

  • Some archs define MODULES_VADDR/MODULES_END outside the VMALLOC area.
    This is handled only on x86-64. This patch makes the handling more
    generic, so that we can use vread/vwrite to access the area.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Jiri Slaby
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

4 commits

  • Sizing of memory allocations shouldn't depend on the number of
    physical pages found in a system, as that generally includes (perhaps
    a huge amount of) non-RAM pages. The amount of memory actually usable
    as storage should instead be used as the basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • vread/vwrite access the vmalloc area without checking whether a page
    is mapped there or not. In most cases, this works well.

    In the old days, the only caller of get_vm_area() was IOREMAP, and
    there was no memory hole within a vm_struct's [addr...addr + size -
    PAGE_SIZE] range (-PAGE_SIZE is for a guard page).

    After the per-cpu-alloc patch, get_vm_area() is used to reserve a
    contiguous virtual address range that is remapped _later_. There tend
    to be holes in otherwise valid vmalloc areas on the vm_struct list,
    so skipping the holes (unmapped pages) is necessary. This patch
    updates vread/vwrite() to avoid memory holes.
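
    A hedged sketch of the per-page probe (copy_one_page() is a
    hypothetical helper; vmalloc_to_page() is the real test):

    struct page *p = vmalloc_to_page((void *)addr);
    if (p)
            copy_one_page(buf, addr);   /* mapped: safe to access */
    else
            memset(buf, 0, PAGE_SIZE);  /* hole: synthesize zeroes */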

    Routines which access the vmalloc area without knowing what each
    address is used for are:
    - /proc/kcore
    - /dev/kmem

    kcore checks for IOREMAP; /dev/kmem doesn't. After this patch,
    IOREMAP is checked and /dev/kmem will avoid reading/writing it. Fixes
    to /proc/kcore will follow in the next patch in this series.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The vmap area should be purged only after the vm_struct is removed
    from the list, because vread/vwrite etc. believe the range is valid
    while it's on the vm_struct list.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • There is no need for double error checking.

    Signed-off-by: Figo.zhang
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Figo.zhang
     

14 Aug, 2009

2 commits

  • To directly use NUMA memory spread across nodes for percpu units, the
    percpu allocator will be updated to allow sparsely mapping units in a
    chunk. As the distances between units can be very large, this makes
    allocating a single vmap area for each chunk undesirable. This patch
    implements pcpu_get_vm_areas() and pcpu_free_vm_areas(), which
    allocate and free sparse congruent vmap areas.

    pcpu_get_vm_areas() takes @offsets and @sizes arrays which define the
    distances and sizes of the vmap areas. It scans down from the top of
    the vmalloc area looking for the top-most address which can
    accommodate all the areas. The top-down scan is to avoid interacting
    with regular vmallocs, which could otherwise push these congruent
    areas up little by little, ending up wasting address space and page
    tables.

    To speed up the top-down scan, the highest possible address hint is
    maintained. Although the scan is linear from the hint, given the
    usually large holes between memory addresses of different NUMA nodes,
    the scan is highly likely to finish after finding the first hole for
    the last unit, which is scanned first.
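
    The assumed shape of the interface, from this description (argument
    types are my reading, not verified against the final header):

    struct vm_struct **vms;

    vms = pcpu_get_vm_areas(offsets, sizes, nr_vms, align);
    if (!vms)
            return -ENOMEM;            /* caller falls back or fails */
    /* ... remap percpu pages into the areas ... */
    pcpu_free_vm_areas(vms, nr_vms);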

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     
  • Separate out insert_vmalloc_vm() from __get_vm_area_node().
    insert_vmalloc_vm() initializes a vm_struct from a vmap_area and
    inserts it into vmlist. insert_vmalloc_vm() only initializes the
    fields which can be determined from @vm, @flags and @caller; the rest
    should be initialized by the caller. For __get_vm_area_node(), all
    other fields just need to be cleared, which is done by using kzalloc
    instead of kmalloc.
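
    The allocation change, sketched (the flag masking is how I'd expect
    it to look, not verified against the final patch):

    /* kzalloc leaves every field insert_vmalloc_vm() doesn't set zeroed */
    area = kzalloc_node(sizeof(*area), gfp_mask & GFP_RECLAIM_MASK, node);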

    This will be used to implement pcpu_get_vm_areas().

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     

12 Jun, 2009

3 commits

  • * 'for-linus' of git://linux-arm.org/linux-2.6:
    kmemleak: Add the corresponding MAINTAINERS entry
    kmemleak: Simple testing module for kmemleak
    kmemleak: Enable the building of the memory leak detector
    kmemleak: Remove some of the kmemleak false positives
    kmemleak: Add modules support
    kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
    kmemleak: Add the vmalloc memory allocation/freeing hooks
    kmemleak: Add the slub memory allocation/freeing hooks
    kmemleak: Add the slob memory allocation/freeing hooks
    kmemleak: Add the slab memory allocation/freeing hooks
    kmemleak: Add documentation on the memory leak detector
    kmemleak: Add the base support

    Manual conflict resolution (with the slab/earlyboot changes) in:
    drivers/char/vt.c
    init/main.c
    mm/slab.c

    Linus Torvalds
     
  • We can call vmalloc_init() after kmem_cache_init() and use kzalloc() instead of
    the bootmem allocator when initializing vmalloc data structures.

    Acked-by: Johannes Weiner
    Acked-by: Linus Torvalds
    Acked-by: Nick Piggin
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     
  • This patch adds the callbacks to kmemleak_(alloc|free) functions from
    vmalloc/vfree.
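
    Roughly where the hooks land (a sketch; the min_count argument is an
    assumption about how the kmemleak API is used here):

    /* allocation side, e.g. at the end of the __vmalloc path: */
    kmemleak_alloc(addr, real_size, 2, gfp_mask);

    /* freeing side, in the vfree/__vunmap path: */
    kmemleak_free(addr);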

    Signed-off-by: Catalin Marinas

    Catalin Marinas
     

07 May, 2009

1 commit

  • If alloc_vmap_area() fails, the allocated struct vmap_area has to be
    freed.
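
    The shape of the fix, sketched (the failure condition is illustrative):

    va = kmalloc_node(sizeof(*va), GFP_KERNEL, node);
    /* ... search for a free range ... */
    if (range_not_found) {             /* hypothetical error path */
            kfree(va);                 /* was leaked before this patch */
            return ERR_PTR(-EBUSY);
    }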

    Signed-off-by: Ralph Wuerthner
    Reviewed-by: Christoph Lameter
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralph Wuerthner
     

01 Apr, 2009

1 commit

  • vmap_block's dirty_list is unused. It was meant for optimizing
    flushing, but Nick hasn't written that code yet, so we don't need it
    until it is actually needed.

    This patch removes vmap_block's dirty_list and the code related to
    it.

    Signed-off-by: MinChan Kim
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    MinChan Kim
     

28 Feb, 2009

2 commits

  • I just got this new warning from kmemcheck:

    WARNING: kmemcheck: Caught 32-bit read from freed memory (c7806a60)
    a06a80c7ecde70c1a04080c700000000a06709c1000000000000000000000000
    f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f
    ^

    Pid: 0, comm: swapper Not tainted (2.6.29-rc4 #230)
    EIP: 0060:[] EFLAGS: 00000286 CPU: 0
    EIP is at __purge_vmap_area_lazy+0x117/0x140
    EAX: 00070f43 EBX: c7806a40 ECX: c1677080 EDX: 00027b66
    ESI: 00002001 EDI: c170df0c EBP: c170df00 ESP: c178830c
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    CR0: 80050033 CR2: c7806b14 CR3: 01775000 CR4: 00000690
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: 00004000 DR7: 00000000
    [] free_unmap_vmap_area_noflush+0x6e/0x70
    [] remove_vm_area+0x2a/0x70
    [] __vunmap+0x45/0xe0
    [] vunmap+0x1e/0x30
    [] text_poke+0x95/0x150
    [] alternatives_smp_unlock+0x49/0x60
    [] alternative_instructions+0x11b/0x124
    [] check_bugs+0xbd/0xdc
    [] start_kernel+0x2ed/0x360
    [] __init_begin+0x9e/0xa9
    [] 0xffffffff

    It happened here:

    $ addr2line -e vmlinux -i c1096df7
    mm/vmalloc.c:540

    Code:

    list_for_each_entry(va, &valist, purge_list)
    __free_vmap_area(va);

    It's this instruction:

    mov 0x20(%ebx),%edx

    Which corresponds to a dereference of va->purge_list.next:

    (gdb) p ((struct vmap_area *) 0)->purge_list.next
    Cannot access memory at address 0x20

    It seems that we should use "safe" list traversal here, as the element
    is freed inside the loop. Please verify that this is the right fix.
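
    The safe traversal caches the next pointer before the entry is freed:

    struct vmap_area *va, *n_va;

    list_for_each_entry_safe(va, n_va, &valist, purge_list)
            __free_vmap_area(va);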

    Acked-by: Nick Piggin
    Signed-off-by: Vegard Nossum
    Cc: Pekka Enberg
    Cc: Ingo Molnar
    Cc: "Paul E. McKenney"
    Cc: [2.6.28.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
     
  • The new vmap allocator can wrap the address and get confused in the
    case of large allocations or when VMALLOC_END is near the end of the
    address space.
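
    A sketch of the guard (the label is illustrative):

    if (addr + size < addr)            /* addr + size wrapped past zero */
            goto overflow;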

    Problem reported by Christoph Hellwig on a 32-bit XFS workload.

    Signed-off-by: Nick Piggin
    Reported-by: Christoph Hellwig
    Cc: [2.6.28.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

24 Feb, 2009

1 commit

  • Impact: allow larger alignment for early vmalloc area allocation

    Some early vmalloc users might want larger alignment, for example,
    for custom large page mappings. Add @align to
    vm_area_register_early(). While at it, drop the docbook comment on
    the non-existent @size.
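
    A sketch of a caller using the new argument (names and sizes are
    illustrative):

    static struct vm_struct early_vm = {
            .flags = VM_ALLOC,
            .size  = 4 << 20,                   /* custom large page mapping */
    };

    vm_area_register_early(&early_vm, 4 << 20); /* new @align argument */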

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin
    Cc: Ivan Kokshaysky

    Tejun Heo
     

20 Feb, 2009

3 commits

  • Impact: two more public map/unmap functions

    Implement map_kernel_range_noflush() and unmap_kernel_range_noflush().
    These functions respectively map and unmap address ranges in the
    kernel VM area but don't do any vcache or TLB flushing. They will be
    used by the new percpu allocator.
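
    Assumed usage: the caller takes over the flushing that the
    non-noflush variants would otherwise do.

    map_kernel_range_noflush(addr, size, prot, pages);
    flush_cache_vmap(addr, addr + size);        /* now the caller's job */

    /* ... and on teardown ... */
    flush_cache_vunmap(addr, addr + size);
    unmap_kernel_range_noflush(addr, size);
    flush_tlb_kernel_range(addr, addr + size);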

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     
  • Impact: allow multiple early vm areas

    There are places where a kernel VM area needs to be allocated before
    vmalloc is initialized. This is done by allocating a static
    vm_struct, initializing several fields and linking it to vmlist;
    vmalloc initialization later picks these up from vmlist. This is
    currently done manually, and if there is more than one such area,
    there is no defined way to arbitrate who gets which address.

    This patch implements vm_area_register_early(), which takes a
    vm_struct with flags and size initialized, assigns an address to it
    and puts it on the vmlist. This way, multiple early vm areas can
    determine which addresses they should use. The only current user -
    alpha mm init - is converted to use it.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Impact: proper vcache flush on unmap_kernel_range()

    flush_cache_vunmap() should be called before pages are unmapped. Add
    a call to it in unmap_kernel_range().
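
    The resulting ordering in unmap_kernel_range(), sketched:

    flush_cache_vunmap(addr, end);      /* flush while the mapping still exists */
    vunmap_page_range(addr, end);       /* tear down the page tables */
    flush_tlb_kernel_range(addr, end);  /* then shoot down stale TLB entries */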

    Signed-off-by: Tejun Heo

    Tejun Heo
     

19 Feb, 2009

1 commit

  • We have get_vm_area_caller() and __get_vm_area() but not
    __get_vm_area_caller().

    On powerpc, I use __get_vm_area() to separate the ranges of addresses
    given to vmalloc vs. ioremap (for various good reasons), so in order
    to implement the new caller tracking in /proc/vmallocinfo, I need a
    "_caller" variant of it.

    (akpm: needed for ongoing powerpc development, so merge it early)
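
    The assumed shape of the variant (the IOREMAP_* bounds are
    placeholders for powerpc's ioremap window):

    area = __get_vm_area_caller(size, VM_IOREMAP,
                                IOREMAP_START, IOREMAP_END,
                                __builtin_return_address(0));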

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Benjamin Herrenschmidt
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt