23 Mar, 2011

2 commits

  • KM_USER1 is never used in the vwrite() path, so the caller doesn't need
    to guarantee it is unused. The only slot the caller must guarantee is
    KM_USER0, and that is already documented in a comment.

    Signed-off-by: Namhyung Kim
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Provide a free area cache for the vmalloc virtual address allocator, based
    on the algorithm used by the user virtual memory allocator.

    This reduces the number of rbtree operations and linear traversals over
    the vmap extents in order to find a free area, by starting off at the last
    point that a free area was found.

    The free area cache is reset if areas are freed behind it, or if we are
    searching for a smaller area or alignment than last time. So allocation
    patterns are not changed (verified by corner-case and random test cases in
    userspace testing).

    This solves a regression caused by lazy vunmap TLB purging introduced in
    db64fe02 (mm: rewrite vmap layer). That patch will leave extents in the
    vmap allocator after they are vunmapped, and until a significant number
    accumulate that can be flushed in a single batch. So in a workload that
    vmalloc/vfree frequently, a chain of extents will build up from
    VMALLOC_START address, which have to be iterated over each time (giving an
    O(n) type of behaviour).

    After this patch, the search will start from where it left off, giving
    closer to an amortized O(1).

    This is verified to solve the regressions reported by Steven in GFS2 and
    by Avi in KVM.

    Hugh's update:

    : I tried out the recent mmotm, and on one machine was fortunate to hit
    : the BUG_ON(first->va_start < addr) which seems to have been stalling
    : your vmap area cache patch ever since May.

    : I can get you addresses etc, I did dump a few out; but once I stared
    : at them, it was easier just to look at the code: and I cannot see how
    : you would be so sure that first->va_start < addr, once you've done
    : that addr = ALIGN(max(...), align) above, if align is over 0x1000
    : (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).

    : I originally got around it by just changing the
    : if (first->va_start < addr) {
    : to
    : while (first->va_start < addr) {
    : without thinking about it any further; but that seemed unsatisfactory,
    : why would we want to loop here when we've got another very similar
    : loop just below it?

    : I am never going to admit how long I've spent trying to grasp your
    : "while (n)" rbtree loop just above this, the one with the peculiar
    : if (!first && tmp->va_start < addr + size)
    : in. That's unfamiliar to me, I'm guessing it's designed to save a
    : subsequent rb_next() in a few circumstances (at risk of then setting
    : a wrong cached_hole_size?); but they did appear few to me, and I didn't
    : feel I could sign off something with that in when I don't grasp it,
    : and it seems responsible for extra code and mistaken BUG_ON below it.

    : I've reverted to the familiar rbtree loop that find_vma() does (but
    : with va_end >= addr as you had, to respect the additional guard page):
    : and then (given that cached_hole_size starts out 0) I don't see the
    : need for any complications below it. If you do want to keep that loop
    : as you had it, please add a comment to explain what it's trying to do,
    : and where addr is relative to first when you emerge from it.

    : Aren't your tests "size first->va_start" forgetting the guard page we want
    : before the next area? I've changed those.

    : I have not changed your many "addr + size - 1 < addr" overflow tests,
    : but have since come to wonder, shouldn't they be "addr + size < addr"
    : tests - won't the vend checks go wrong if addr + size is 0?

    : I have added a few comments - Wolfgang Wander's 2.6.13 description of
    : 1363c3cd8603a913a27e2995dccbd70d5312d8e6 Avoiding mmap fragmentation
    : helped me a lot, perhaps a pointer to that would be good too. And I found
    : it easier to understand when I renamed cached_start slightly and moved the
    : overflow label down.

    : This patch would go after your mm-vmap-area-cache.patch in mmotm.
    : Trivially, nobody is going to get that BUG_ON with this patch, and it
    : appears to work fine on my machines; but I have not given it anything like
    : the testing you did on your original, and may have broken all the
    : performance you were aiming for. Please take a look, test it out, and
    : integrate it with yours if you're satisfied - thanks.

    [akpm@linux-foundation.org: add locking comment]
    Signed-off-by: Nick Piggin
    Signed-off-by: Hugh Dickins
    Reviewed-by: Minchan Kim
    Reported-and-tested-by: Steven Whitehouse
    Reported-and-tested-by: Avi Kivity
    Tested-by: "Barry J. Marson"
    Cc: Prarit Bhargava
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

14 Jan, 2011

6 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (59 commits)
    ACPI / PM: Fix build problems for !CONFIG_ACPI related to NVS rework
    ACPI: fix resource check message
    ACPI / Battery: Update information on info notification and resume
    ACPI: Drop device flag wake_capable
    ACPI: Always check if _PRW is present before trying to evaluate it
    ACPI / PM: Check status of power resources under mutexes
    ACPI / PM: Rename acpi_power_off_device()
    ACPI / PM: Drop acpi_power_nocheck
    ACPI / PM: Drop acpi_bus_get_power()
    Platform / x86: Make fujitsu_laptop use acpi_bus_update_power()
    ACPI / Fan: Rework the handling of power resources
    ACPI / PM: Register power resource devices as soon as they are needed
    ACPI / PM: Register acpi_power_driver early
    ACPI / PM: Add function for updating device power state consistently
    ACPI / PM: Add function for device power state initialization
    ACPI / PM: Introduce __acpi_bus_get_power()
    ACPI / PM: Introduce function for refcounting device power resources
    ACPI / PM: Add functions for manipulating lists of power resources
    ACPI / PM: Prevent acpi_power_get_inferred_state() from making changes
    ACPICA: Update version to 20101209
    ...

    Linus Torvalds
     
  • IS_ERR() already implies unlikely(), so it can be omitted here.

    Signed-off-by: Tobias Klauser
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
    module_init(). Much of the code is duplicated and can be generalized in a
    globally accessible function, __vmalloc_node_range().

    __vmalloc_node() now calls into __vmalloc_node_range() with a range of
    [VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.

    Each architecture may then use __vmalloc_node_range() directly to remove
    the duplication of code.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: "David S. Miller"
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • pcpu_get_vm_areas() only uses GFP_KERNEL allocations, so remove the
    gfp_t formal parameter and use the mask internally.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • get_vm_area_node() is unused in the kernel and can thus be removed.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Signed-off-by: Joe Perches
    Acked-by: Pekka Enberg
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

12 Jan, 2011

1 commit

  • Generic Hardware Error Source provides a way to report platform
    hardware errors (such as those from the chipset). It works in so-called
    "Firmware First" mode: hardware errors are reported to the firmware
    first, and the firmware then reports them to Linux. This way,
    non-standard hardware error registers or non-standard hardware links
    can be checked by the firmware to produce more valuable hardware error
    information for Linux.

    This patch adds POLL/IRQ/NMI notification types support.

    The memory area used to transfer hardware error information from the
    BIOS to Linux can be determined only in the NMI, IRQ or timer handler,
    but the general ioremap cannot be used in atomic context, so a special
    atomic version of ioremap is implemented for that.

    Known issue:

    - Error information cannot be printed for recoverable errors notified
    via NMI, because printk is not NMI-safe. This will be fixed by deferring
    printing to IRQ context via irq_work, or by making printk NMI-safe.

    v2:

    - adjust printk format per comments.

    Signed-off-by: Huang Ying
    Reviewed-by: Andi Kleen
    Signed-off-by: Len Brown

    Huang Ying
     

03 Dec, 2010

1 commit

  • On stock 2.6.37-rc4, running:

    # mount lilith:/export /mnt/lilith
    # find /mnt/lilith/ -type f -print0 | xargs -0 file

    crashes the machine fairly quickly under Xen. Often it results in oops
    messages, but the couple of times I tried just now, it just hung quietly
    and made Xen print some rude messages:

    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
    3000000000000000) for mfn 1d7058 (pfn 18fa7)
    (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
    (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
    1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
    (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04

    Which means the domain tried to map a pagetable page RW, which would
    allow it to map arbitrary memory, so Xen stopped it. This is because
    vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
    finished with them, and those pages got recycled as pagetable pages
    while still having these RW aliases.

    Removing those mappings immediately removes the Xen-visible aliases, and
    so it has no problem with those pages being reused as pagetable pages.
    Deferring the TLB flush doesn't upset Xen because it can flush the TLB
    itself as needed to maintain its invariants.

    When unmapping a region in the vmalloc space, clear the ptes
    immediately. There's no point in deferring this because there's no
    amortization benefit.

    The TLBs are left dirty, and they are flushed lazily to amortize the
    cost of the IPIs.

    The specific motivation for this patch is an oops-causing regression
    since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
    of vm_map_ram() introduced in 56e4ebf877b60 ("NFS: readdir with vmapped
    pages"). XFS also uses vm_map_ram() and could cause similar problems.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Nick Piggin
    Cc: Bryan Schumaker
    Cc: Trond Myklebust
    Cc: Alex Elder
    Cc: Dave Chinner
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
     

27 Oct, 2010

3 commits


23 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: update comments to reflect that percpu allocations are always zero-filled
    percpu: Optimize __get_cpu_var()
    x86, percpu: Optimize this_cpu_ptr
    percpu: clear memory allocated with the km allocator
    percpu: fix build breakage on s390 and cleanup build configuration tests
    percpu: use percpu allocator on UP too
    percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
    vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

    Fixed up trivial conflicts in include/linux/percpu.h

    Linus Torvalds
     

17 Sep, 2010

1 commit

  • During the reading of /proc/vmcore the kernel is doing
    ioremap()/iounmap() repeatedly. And the buildup of un-flushed
    vm_area_struct's is causing a great deal of overhead. (rb_next()
    is chewing up most of that time).

    The solution is to provide the function set_iounmap_nonlazy(), which
    causes a subsequent call to iounmap() to immediately purge the vma area
    (with try_purge_vmap_area_lazy()).

    With this patch we have seen the time for writing a 250MB
    compressed dump drop from 71 seconds to 44 seconds.

    Signed-off-by: Cliff Wickman
    Cc: Andrew Morton
    Cc: kexec@lists.infradead.org
    Signed-off-by: Ingo Molnar

    Cliff Wickman
     

08 Sep, 2010

1 commit


13 Aug, 2010

1 commit

  • * 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    x86: Detect whether we should use Xen SWIOTLB.
    pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
    swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
    xen/mmu: inhibit vmap aliases rather than trying to clear them out
    vmap: add flag to allow lazy unmap to be disabled at runtime
    xen: Add xen_create_contiguous_region
    xen: Rename the balloon lock
    xen: Allow unprivileged Xen domains to create iomap pages
    xen: use _PAGE_IOMAP in ioremap to do machine mappings

    Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
    driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
    include/xen/xen-ops.h

    Linus Torvalds
     

10 Aug, 2010

2 commits

  • kmalloc() may fail, if so return -ENOMEM.

    Signed-off-by: Kulikov Vasiliy
    Acked-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kulikov Vasiliy
     
  • Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more
    clear what is the purpose of the operation, which otherwise looks like a
    no-op.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    type T;
    T x;
    identifier f;
    @@

    T f (...) { }

    @@
    expression x;
    @@

    - ERR_PTR(PTR_ERR(x))
    + ERR_CAST(x)
    //

    Signed-off-by: Julia Lawall
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
     

27 Jul, 2010

1 commit


10 Jul, 2010

1 commit

  • The current x86 ioremap() doesn't properly handle physical addresses
    higher than 32 bits in X86_32 PAE mode. When such an address is passed
    to ioremap(), the upper 32 bits of the physical address are wrongly
    cleared. Due to this bug, ioremap() can map the wrong address into the
    linear address space.

    In my case, a 64-bit MMIO region was assigned to a PCI device (an ioat
    device) on my system. Because of the ioremap() bug, the wrong physical
    address (instead of the MMIO region) was mapped into the linear address
    space, so loading the ioatdma driver caused unexpected behavior (kernel
    panic, kernel hang, ...).

    Signed-off-by: Kenji Kaneshige
    Signed-off-by: H. Peter Anvin

    Kenji Kaneshige
     

03 Feb, 2010

2 commits

  • Improve handling of fragmented per-CPU vmaps. Previously, we didn't
    free up a per-CPU map until all of its addresses had been used and
    freed, so fragmented blocks could fill up vmalloc space even if they
    actually had no active vmap regions within them.

    Add logic to purge these blocks on all CPUs when allocating a new vm
    area fails, and also to trim such blocks on the current CPU when we hit
    them in the allocation path (so as to avoid a large build-up of them).

    Christoph reported some vmap allocation failures when using the per CPU
    vmap APIs in XFS, which cannot be reproduced after this patch and the
    previous bug fix.

    Cc: linux-mm@kvack.org
    Cc: stable@kernel.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • RCU list walking of the per-cpu vmap cache was broken. It did not use
    RCU primitives, and also the union of free_list and rcu_head is
    obviously wrong (because free_list is indeed the list we are RCU
    walking).

    While we are there, remove a couple of unused fields from an earlier
    iteration.

    These APIs aren't actually used anywhere, because of problems with the
    XFS conversion. Christoph has now verified that the problems are solved
    with these patches. Also it is an exported interface, so I think it
    will be good to be merged now (and Christoph wants to get the XFS
    changes into their local tree).

    Cc: stable@kernel.org
    Cc: linux-mm@kvack.org
    Tested-by: Christoph Hellwig
    Signed-off-by: Nick Piggin
    --
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

21 Jan, 2010

1 commit

  • In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
    then vmap_lazy_nr is increased atomically.

    But in __purge_vmap_area_lazy(), while traversing vmap_area_list, nr is
    counted by checking whether VM_LAZY_FREE is set in va->flags. After
    counting into nr, the kernel reads vmap_lazy_nr atomically and checks a
    BUG_ON condition - that nr is not greater than vmap_lazy_nr - to
    prevent vmap_lazy_nr from going negative.

    The problem is that, if the task is interrupted right after marking
    VM_LAZY_FREE, the increment of vmap_lazy_nr can be delayed.
    Consequently, the BUG_ON condition can be met, because nr can be
    counted higher than vmap_lazy_nr.

    This is highly probable when vmalloc/vfree are called frequently. The
    scenario has been verified by adding a delay between marking
    VM_LAZY_FREE and increasing vmap_lazy_nr in free_unmap_area_noflush().

    Even though vmap_lazy_nr is used to check a high watermark, it was
    never meant to be a strict watermark. And although the BUG_ON condition
    is there to prevent vmap_lazy_nr from going negative, vmap_lazy_nr is a
    signed variable, so it can legitimately dip to a negative value
    temporarily.

    Consequently, removing the BUG_ON condition is the proper fix.

    A possible BUG_ON message is like the below.

    kernel BUG at mm/vmalloc.c:517!
    invalid opcode: 0000 [#1] SMP
    EIP: 0060:[] EFLAGS: 00010297 CPU: 3
    EIP is at __purge_vmap_area_lazy+0x144/0x150
    EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
    ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Call Trace:
    [] free_unmap_vmap_area_noflush+0x69/0x70
    [] remove_vm_area+0x22/0x70
    [] __vunmap+0x45/0xe0
    [] vmalloc+0x2c/0x30
    Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
    EIP: [] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c

    [ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]

    Signed-off-by: Yongseok Koh
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yongseok Koh
     

16 Dec, 2009

1 commit


15 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
    m68k: rename global variable vmalloc_end to m68k_vmalloc_end
    percpu: add missing per_cpu_ptr_to_phys() definition for UP
    percpu: Fix kdump failure if booted with percpu_alloc=page
    percpu: make misc percpu symbols unique
    percpu: make percpu symbols in ia64 unique
    percpu: make percpu symbols in powerpc unique
    percpu: make percpu symbols in x86 unique
    percpu: make percpu symbols in xen unique
    percpu: make percpu symbols in cpufreq unique
    percpu: make percpu symbols in oprofile unique
    percpu: make percpu symbols in tracer unique
    percpu: make percpu symbols under kernel/ and mm/ unique
    percpu: remove some sparse warnings
    percpu: make alloc_percpu() handle array types
    vmalloc: fix use of non-existent percpu variable in put_cpu_var()
    this_cpu: Use this_cpu_xx in trace_functions_graph.c
    this_cpu: Use this_cpu_xx for ftrace
    this_cpu: Use this_cpu_xx in nmi handling
    this_cpu: Use this_cpu operations in RCU
    this_cpu: Use this_cpu ops for VM statistics
    ...

    Fix up trivial (famous last words) global per-cpu naming conflicts in
    arch/x86/kvm/svm.c
    mm/slab.c

    Linus Torvalds
     

29 Oct, 2009

1 commit


12 Oct, 2009

1 commit


09 Oct, 2009

1 commit


08 Oct, 2009

2 commits

  • When a vmalloc'd area is mmap'd into userspace, some kind of
    co-ordination is necessary for this to work on platforms with cpu
    D-caches which can have aliases.

    Otherwise kernel side writes won't be seen properly in userspace
    and vice versa.

    If the kernel-side mapping and the user-side one have the same
    alignment, modulo SHMLBA, this can work, provided the VMA has VM_SHARED
    set - which is true for all current users. VM_SHARED forces SHMLBA
    alignment of the user-side mmap on platforms where D-cache aliasing
    matters.

    The bulk of this patch is just making it so that a specific
    alignment can be passed down into __get_vm_area_node(). All
    existing callers pass in '1' which preserves existing behavior.
    vmalloc_user() gives SHMLBA for the alignment.

    As a side effect this should get the video media drivers and other
    vmalloc_user() users into more working shape on such systems.

    Signed-off-by: David S. Miller
    Acked-by: Peter Zijlstra
    Cc: Jens Axboe
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    David Miller
     
  • fix the following 'make includecheck' warning:

    mm/vmalloc.c: linux/highmem.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     

23 Sep, 2009

1 commit

  • Some archs define MODULES_VADDR/MODULES_END outside of the VMALLOC
    area. Until now this was handled only on x86-64; this patch makes the
    handling more generic, so that vread/vwrite can access the area as
    well.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Jiri Slaby
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

22 Sep, 2009

4 commits

  • Sizing of memory allocations shouldn't depend on the number of physical
    pages found in a system, as that generally includes (perhaps a huge
    amount of) non-RAM pages. The amount of memory actually usable as
    storage should instead be used as the basis here.

    Some of the calculations (i.e. those not intending to use high memory)
    should likely even use (totalram_pages - totalhigh_pages).

    Signed-off-by: Jan Beulich
    Acked-by: Rusty Russell
    Acked-by: Ingo Molnar
    Cc: Dave Airlie
    Cc: Kyle McMartin
    Cc: Jeremy Fitzhardinge
    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • vread/vwrite access the vmalloc area without checking whether there is
    a page there or not. In most cases, this works well.

    In the old days, the only caller of get_vm_area() was IOREMAP, and
    there was no memory hole within a vm_struct's
    [addr...addr + size - PAGE_SIZE] (-PAGE_SIZE is for a guard page).

    Since the per-cpu-alloc patch, get_vm_area() is used to reserve
    continuous virtual addresses that are remapped _later_, so there tend
    to be holes in valid vmalloc areas on the vm_struct list, and skipping
    such holes (unmapped pages) is necessary. This patch updates
    vread/vwrite() to avoid memory holes.

    Routines which access the vmalloc area without knowing what a given
    address is used for are
    - /proc/kcore
    - /dev/kmem

    kcore checks for IOREMAP, /dev/kmem doesn't. After this patch, IOREMAP
    is checked and /dev/kmem will avoid reading/writing it. Fixes to
    /proc/kcore will follow in the next patch in the series.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • A vmap area should be purged only after its vm_struct is removed from
    the list, because vread/vwrite etc. believe the range is valid while it
    is on the vm_struct list.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: WANG Cong
    Cc: Mike Smith
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • There is no need for double error checking.

    Signed-off-by: Figo.zhang
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Figo.zhang
     

14 Aug, 2009

2 commits

  • To directly use spread-out NUMA memories for percpu units, the percpu
    allocator will be updated to allow sparsely mapping units in a chunk.
    As the distances between units can be very large, this makes allocating
    a single vmap area for each chunk undesirable. This patch implements
    pcpu_get_vm_areas() and pcpu_free_vm_areas(), which allocate and free
    sparse congruent vmap areas.

    pcpu_get_vm_areas() takes @offsets and @sizes arrays which define the
    distances and sizes of the vmap areas. It scans down from the top of
    the vmalloc area looking for the top-most address which can accommodate
    all the areas. The top-down scan avoids interacting with regular
    vmallocs, which could otherwise push these congruent areas up little by
    little, wasting address space and page tables.

    To speed up the top-down scan, a highest-possible-address hint is
    maintained. Although the scan is linear from the hint, given the usual
    large holes between the memory addresses of NUMA nodes, the scan is
    highly likely to finish after finding the first hole for the last unit,
    which is scanned first.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     
  • Separate out insert_vmalloc_vm() from __get_vm_area_node().
    insert_vmalloc_vm() initializes a vm_struct from a vmap_area and
    inserts it into vmlist; it only initializes the fields which can be
    determined from @vm, @flags and @caller. The rest should be initialized
    by the caller. For __get_vm_area_node(), all other fields just need to
    be cleared, which is done by using kzalloc instead of kmalloc.

    This will be used to implement pcpu_get_vm_areas().

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     

12 Jun, 2009

2 commits

  • * 'for-linus' of git://linux-arm.org/linux-2.6:
    kmemleak: Add the corresponding MAINTAINERS entry
    kmemleak: Simple testing module for kmemleak
    kmemleak: Enable the building of the memory leak detector
    kmemleak: Remove some of the kmemleak false positives
    kmemleak: Add modules support
    kmemleak: Add kmemleak_alloc callback from alloc_large_system_hash
    kmemleak: Add the vmalloc memory allocation/freeing hooks
    kmemleak: Add the slub memory allocation/freeing hooks
    kmemleak: Add the slob memory allocation/freeing hooks
    kmemleak: Add the slab memory allocation/freeing hooks
    kmemleak: Add documentation on the memory leak detector
    kmemleak: Add the base support

    Manual conflict resolution (with the slab/earlyboot changes) in:
    drivers/char/vt.c
    init/main.c
    mm/slab.c

    Linus Torvalds
     
  • We can call vmalloc_init() after kmem_cache_init() and use kzalloc() instead of
    the bootmem allocator when initializing vmalloc data structures.

    Acked-by: Johannes Weiner
    Acked-by: Linus Torvalds
    Acked-by: Nick Piggin
    Cc: Ingo Molnar
    Cc: Yinghai Lu
    Signed-off-by: Pekka Enberg

    Pekka Enberg