13 Aug, 2010

1 commit

  • * 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    x86: Detect whether we should use Xen SWIOTLB.
    pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
    swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
    xen/mmu: inhibit vmap aliases rather than trying to clear them out
    vmap: add flag to allow lazy unmap to be disabled at runtime
    xen: Add xen_create_contiguous_region
    xen: Rename the balloon lock
    xen: Allow unprivileged Xen domains to create iomap pages
    xen: use _PAGE_IOMAP in ioremap to do machine mappings

    Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
    driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
    include/xen/xen-ops.h

    Linus Torvalds
     

27 Jul, 2010

1 commit


10 Jul, 2010

1 commit

  • Current x86 ioremap() doesn't properly handle physical addresses
    above 32 bits in X86_32 PAE mode. When a physical address higher
    than 32 bits is passed to ioremap(), the upper 32 bits of the
    physical address are wrongly cleared. Due to this bug, ioremap()
    can map the wrong address into the linear address space.

    In my case, a 64-bit MMIO region was assigned to a PCI device (an
    ioat device) on my system. Because of the ioremap() bug, the wrong
    physical address (instead of the MMIO region) was mapped into the
    linear address space. Because of this, loading the ioatdma driver
    caused unexpected behavior (kernel panic, kernel hang, ...).

    Signed-off-by: Kenji Kaneshige
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Kenji Kaneshige
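The truncation described above can be reproduced in plain C. This is a minimal, hypothetical sketch (buggy_narrow/kept_wide are illustrative names, not kernel functions) of how narrowing a PAE physical address to a 32-bit word clears the upper bits:

```c
#include <stdint.h>

/* Hypothetical illustration of the bug: with X86_32 PAE, a physical
 * address is 64-bit, but funneling it through a 32-bit unsigned long
 * silently clears the upper half, so ioremap() maps the wrong frame. */
typedef uint64_t phys_addr_t;

uint32_t buggy_narrow(phys_addr_t phys)
{
        return (uint32_t)phys;          /* upper 32 bits are lost */
}

phys_addr_t kept_wide(phys_addr_t phys)
{
        return phys;                    /* fixed: full width preserved */
}
```

For a device BAR at 0x100000000, the narrowed value is 0, i.e. physical address zero would be mapped instead of the MMIO region.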
     

14 Aug, 2009

1 commit

  • To directly use spread NUMA memories for percpu units, the percpu
    allocator will be updated to allow sparsely mapping units in a
    chunk. As the distances between units can be very large, this
    makes allocating a single vmap area for each chunk undesirable.
    This patch implements pcpu_get_vm_areas() and pcpu_free_vm_areas(),
    which allocate and free sparse congruent vmap areas.

    pcpu_get_vm_areas() takes @offsets and @sizes arrays which define
    the distances and sizes of the vmap areas. It scans down from the
    top of the vmalloc area looking for the top-most address which can
    accommodate all the areas. The scan is top-down to avoid
    interacting with regular vmallocs, which could otherwise push these
    congruent areas up little by little, ending up wasting address
    space and page tables.

    To speed up the top-down scan, a hint for the highest possible
    address is maintained. Although the scan is linear from the hint,
    given the usual large holes between the memory of different NUMA
    nodes, the scan is highly likely to finish after finding the first
    hole for the last unit, which is scanned first.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
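A minimal sketch of the placement arithmetic described above, assuming an empty vmalloc space (the real pcpu_get_vm_areas() additionally scans down past existing vmap areas); congruent_base is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: @offsets and @sizes describe each percpu unit
 * relative to a common base; find the top-most base so that every
 * unit [base + offset, base + offset + size) still ends at or below
 * the top of the vmalloc area. */
uintptr_t congruent_base(uintptr_t vmalloc_end,
                         const size_t *offsets, const size_t *sizes,
                         int n)
{
        size_t highest_end = 0;

        for (int i = 0; i < n; i++)
                if (offsets[i] + sizes[i] > highest_end)
                        highest_end = offsets[i] + sizes[i];

        /* top-most placement: the highest unit ends exactly at the top */
        return vmalloc_end - highest_end;
}
```

Because all units move together, only one base has to be searched for, no matter how far apart the NUMA nodes place the units.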
     

25 Feb, 2009

1 commit


24 Feb, 2009

1 commit

  • Impact: allow larger alignment for early vmalloc area allocation

    Some early vmalloc users might want larger alignment, for example, for
    custom large page mapping. Add @align to vm_area_register_early().
    While at it, drop the docbook comment on the non-existent @size.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin
    Cc: Ivan Kokshaysky

    Tejun Heo
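The new @align argument reduces to the kernel's ALIGN() round-up idiom: bump the running early-VM cursor to the next boundary before handing out the address. A hedged sketch (ALIGN_UP/place_early_area are illustrative names; @align must be a power of two):

```c
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two),
 * the same arithmetic the kernel's ALIGN() macro performs. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((uintptr_t)(a) - 1))

uintptr_t place_early_area(uintptr_t next_addr, uintptr_t align)
{
        return ALIGN_UP(next_addr, align);
}
```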
     

20 Feb, 2009

2 commits

  • Impact: two more public map/unmap functions

    Implement map_kernel_range_noflush() and unmap_kernel_range_noflush().
    These functions respectively map and unmap an address range in the
    kernel VM area without doing any vcache or TLB flushing. They will
    be used by the new percpu allocator.

    Signed-off-by: Tejun Heo
    Cc: Nick Piggin

    Tejun Heo
     
  • Impact: allow multiple early vm areas

    There are places where a kernel VM area needs to be allocated
    before vmalloc is initialized. This is done by allocating a static
    vm_struct, initializing several fields and linking it to vmlist;
    vmalloc initialization later picks these up from vmlist. This is
    currently done manually, and if there is more than one such area,
    there is no defined way to arbitrate who gets which address.

    This patch implements vm_area_register_early(), which takes a
    vm_struct with flags and size initialized, assigns an address to
    it and puts it on the vmlist. This way, multiple early vm areas
    can determine which addresses they should use. The only current
    user - alpha mm init - is converted to use it.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

19 Feb, 2009

1 commit

  • We have get_vm_area_caller() and __get_vm_area(), but not a
    __get_vm_area_caller().

    On powerpc, I use __get_vm_area() to separate the ranges of addresses
    given to vmalloc vs. ioremap (various good reasons for that) so in order
    to be able to implement the new caller tracking in /proc/vmallocinfo, I
    need a "_caller" variant of it.

    (akpm: needed for ongoing powerpc development, so merge it early)

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Benjamin Herrenschmidt
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     

07 Jan, 2009

1 commit

  • Sparse outputs the following warnings:

    mm/vmalloc.c:1436:6: warning: symbol 'vread' was not declared. Should it be static?
    mm/vmalloc.c:1474:6: warning: symbol 'vwrite' was not declared. Should it be static?

    However, these functions are used by /dev/kmem; fixed here.

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

23 Oct, 2008

1 commit


20 Oct, 2008

1 commit

  • Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
    provide a fast, scalable percpu frontend for small vmaps (requires a
    slightly different API, though).

    The biggest problem with vmap is actually vunmap. Presently this requires
    a global kernel TLB flush, which on most architectures is a broadcast IPI
    to all CPUs to flush the cache. This is all done under a global lock. As
    the number of CPUs increases, so will the number of vunmaps a scaled
    workload will want to perform, and so will the cost of a global TLB flush.
    This gives terrible quadratic scalability characteristics.

    Another problem is that the entire vmap subsystem works under a
    single lock. It is an rwlock, but it is actually taken for write
    in all the fast paths, and the read locking would likely never be
    run concurrently anyway, so it's just pointless.

    This is a rewrite of vmap subsystem to solve those problems. The existing
    vmalloc API is implemented on top of the rewritten subsystem.

    The TLB flushing problem is solved by using lazy TLB unmapping. vmap
    addresses do not have to be flushed immediately when they are vunmapped,
    because the kernel will not reuse them again (would be a use-after-free)
    until they are reallocated. So the addresses aren't allocated again until
    a subsequent TLB flush. A single TLB flush then can flush multiple
    vunmaps from each CPU.

    XEN and PAT and such do not like deferred TLB flushing because they can't
    always handle multiple aliasing virtual addresses to a physical address.
    They now call vm_unmap_aliases() in order to flush any deferred mappings.
    That call is very expensive (well, actually not a lot more expensive than
    a single vunmap under the old scheme), however it should be OK if not
    called too often.

    The virtual memory extent information is stored in an rbtree rather than a
    linked list to improve the algorithmic scalability.

    There is a per-CPU allocator for small vmaps, which amortizes or avoids
    global locking.

    To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
    must be used in place of vmap and vunmap. Vmalloc does not use these
    interfaces at the moment, so it will not be quite so scalable (although it
    will use lazy TLB flushing).

    As a quick test of performance, I ran a test that loops in the kernel,
    linearly mapping then touching then unmapping 4 pages. Different numbers
    of tests were run in parallel on an 4 core, 2 socket opteron. Results are
    in nanoseconds per map+touch+unmap.

    threads    vanilla    vmap rewrite
          1      14700            2900
          2      33600            3000
          4      49500            2800
          8      70631            2900

    So with 8 cores, the rewritten version is already 25x faster.

    In a slightly more realistic test (although with an older and less
    scalable version of the patch), I ripped the not-very-good vunmap batching
    code out of XFS, and implemented the large buffer mapping with vm_map_ram
    and vm_unmap_ram... along with a couple of other tricks, I was able to
    speed up a large directory workload by 20x on a 64 CPU system. I believe
    vmap/vunmap is actually sped up a lot more than 20x on such a system, but
    I'm running into other locks now. vmap is pretty well blown off the
    profiles.

    Before:
    1352059 total 0.1401
    798784 _write_lock 8320.6667
    Cc: Jeremy Fitzhardinge
    Cc: Krzysztof Helt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
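The amortization argument above can be modeled in a few lines: instead of one global flush per vunmap, freed ranges only accumulate until a threshold is hit, and a single flush then covers the whole batch. This is a toy model with a hypothetical threshold, not the kernel's actual accounting:

```c
#include <stddef.h>

/* Toy model of lazy TLB unmapping: vunmapped ranges are queued, and
 * one global flush is issued once LAZY_MAX unmaps have accumulated,
 * amortizing the broadcast-IPI cost across the batch. LAZY_MAX is a
 * hypothetical threshold. */
enum { LAZY_MAX = 32 };

struct lazy_state {
        int pending;    /* vunmaps queued since the last flush */
        int flushes;    /* global TLB flushes issued */
};

void lazy_vunmap(struct lazy_state *s)
{
        if (++s->pending >= LAZY_MAX) {
                s->flushes++;   /* one flush covers all pending unmaps */
                s->pending = 0;
        }
}

/* Under the eager scheme, n vunmaps cost n global flushes; lazily
 * they cost roughly n / LAZY_MAX. */
int lazy_flushes_for(int n_vunmaps)
{
        struct lazy_state s = { 0, 0 };

        for (int i = 0; i < n_vunmaps; i++)
                lazy_vunmap(&s);
        return s.flushes;
}
```

This is also why vm_unmap_aliases() exists: Xen and PAT need a way to force the queued flush immediately when aliases are not tolerable.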
     

17 Aug, 2008

1 commit

  • Try to comment away a little of the confusion between mm's vm_area_struct
    vm_flags and vmalloc's vm_struct flags: based on an idea by Ulrich Drepper.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Apr, 2008

2 commits

  • Add caller information so that /proc/vmallocinfo shows where the allocation
    request for a slice of vmalloc memory originated.

    Results in output like this:

    0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
    0xffffc20000801000-0xffffc20000806000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
    0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
    0xffffc20000c07000-0xffffc20000c0a000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
    0xffffc20000c0a000-0xffffc20000c0c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
    0xffffc20000c0c000-0xffffc20000c0f000 12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
    0xffffc20000c10000-0xffffc20000c15000 20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
    0xffffc20000c16000-0xffffc20000c18000 8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
    0xffffc20000c18000-0xffffc20000c1a000 8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
    0xffffc20000c1a000-0xffffc20000c1c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
    0xffffc20000c1c000-0xffffc20000c1e000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
    0xffffc20000c1e000-0xffffc20000c20000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
    0xffffc20000c20000-0xffffc20000c22000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
    0xffffc20000c22000-0xffffc20000c24000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
    0xffffc20000c24000-0xffffc20000c26000 8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
    0xffffc20000c26000-0xffffc20000c28000 8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
    0xffffc20000c28000-0xffffc20000c2d000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
    0xffffc20000c2d000-0xffffc20000c31000 16384 tcp_init+0xd5/0x31c pages=3 vmalloc
    0xffffc20000c31000-0xffffc20000c34000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
    0xffffc20000c34000-0xffffc20000c36000 8192 init_vdso_vars+0xde/0x1f1
    0xffffc20000c36000-0xffffc20000c38000 8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap
    0xffffc20000c38000-0xffffc20000c3a000 8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap
    0xffffc20000c3a000-0xffffc20000c3e000 16384 sys_swapon+0x509/0xa15 pages=3 vmalloc
    0xffffc20000c40000-0xffffc20000c61000 135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap
    0xffffc20000c61000-0xffffc20000c6a000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc20000c6a000-0xffffc20000c73000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc20000c73000-0xffffc20000c7c000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc20000c7c000-0xffffc20000c7f000 12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc
    0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap
    0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc
    0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
    0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages
    0xffffc20002204000-0xffffc2000220d000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc2000220d000-0xffffc20002216000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc20002216000-0xffffc2000221f000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc2000221f000-0xffffc20002228000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc20002228000-0xffffc20002231000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
    0xffffc20002231000-0xffffc20002234000 12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc
    0xffffc20002240000-0xffffc20002261000 135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap
    0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages
    0xffffffffa0000000-0xffffffffa0022000 139264 module_alloc+0x4f/0x55 pages=33 vmalloc
    0xffffffffa0022000-0xffffffffa0029000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc
    0xffffffffa002b000-0xffffffffa0034000 36864 module_alloc+0x4f/0x55 pages=8 vmalloc
    0xffffffffa0034000-0xffffffffa003d000 36864 module_alloc+0x4f/0x55 pages=8 vmalloc
    0xffffffffa003d000-0xffffffffa0049000 49152 module_alloc+0x4f/0x55 pages=11 vmalloc
    0xffffffffa0049000-0xffffffffa0050000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Christoph Lameter
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
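The caller shown in each /proc/vmallocinfo line is captured by recording the allocation site's return address on the allocation path. A minimal userspace sketch of the idea using the GCC/Clang __builtin_return_address builtin (record_alloc and caller_of_last_alloc are illustrative names, not kernel functions):

```c
#include <stddef.h>

/* The allocator records the return address of its caller, much as the
 * kernel's *_caller() variants stash __builtin_return_address(0) for
 * /proc/vmallocinfo to print as symbol+offset. noinline is required
 * so the builtin sees the real caller, not an inlined frame. */
static const void *last_caller;

__attribute__((noinline))
void record_alloc(size_t size)
{
        (void)size;     /* the size is not used in this sketch */
        last_caller = __builtin_return_address(0);
}

const void *caller_of_last_alloc(void)
{
        return last_caller;
}
```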
     
  • Implement a new proc file that displays the currently allocated
    vmalloc memory.

    It allows one to see the users of vmalloc. That is important if
    vmalloc space is scarce (i386, for example).

    And it's going to be important for the compound-page fallback to
    vmalloc. Many of the current users can be switched to use compound
    pages with fallback. This means that the number of users of
    vmalloc is reduced, and page tables are no longer necessary to
    access the memory. /proc/vmallocinfo allows one to review how that
    reduction occurs.

    If memory becomes fragmented and larger-order allocations are no
    longer possible, then /proc/vmallocinfo allows one to see which
    compound-page allocations fell back to virtual compound pages.
    That is important for new users of virtual compound pages, such as
    order-1 stack allocations, which may fall back to virtual compound
    pages in the future.

    /proc/vmallocinfo is made readable only by root to avoid possible
    information leakage.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: CONFIG_MMU=n build fix]
    Signed-off-by: Christoph Lameter
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: Nick Piggin
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

06 Feb, 2008

1 commit


22 Jul, 2007

1 commit

  • get_vm_area always returns an area with an adjacent guard page. That guard
    page is included in vm_struct.size. iounmap uses vm_struct.size to
    determine how much address space needs to have change_page_attr applied to
    it, which will BUG if applied to the guard page.

    This patch adds a helper function - get_vm_area_size() in linux/vmalloc.h -
    to return the actual size of a vm area, and uses it to make iounmap do the
    right thing. There are probably other places which should be using
    get_vm_area_size().

    Thanks to Dave Young for debugging the
    problem.

    [ Andi, it wasn't clear to me whether x86_64 needs the same fix. ]

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Dave Young
    Cc: Chuck Ebbert
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
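A minimal model of the fix (vm_struct_model, get_vm_area_size_model and PAGE_SIZE_MODEL are hypothetical names): the recorded size includes one trailing guard page, and the helper subtracts it so callers like iounmap never touch the guard page:

```c
#include <stddef.h>

#define PAGE_SIZE_MODEL 4096UL

/* vm_struct.size includes one trailing guard page, so the usable
 * mapping size is one page less. Applying change_page_attr over the
 * full size (as the old iounmap did) would touch the guard page. */
struct vm_struct_model {
        unsigned long size;     /* includes the guard page */
};

unsigned long get_vm_area_size_model(const struct vm_struct_model *area)
{
        return area->size - PAGE_SIZE_MODEL;
}
```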
     

18 Jul, 2007

1 commit

  • Allocate/release a chunk of vmalloc address space:
    alloc_vm_area reserves a chunk of address space, and makes sure all
    the pagetables are constructed for that address range - but no pages.

    free_vm_area releases the address space range.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Ian Pratt
    Signed-off-by: Christian Limpach
    Signed-off-by: Chris Wright
    Cc: "Jan Beulich"
    Cc: "Andi Kleen"

    Jeremy Fitzhardinge
     

14 Jun, 2007

1 commit

  • This makes unmap_vm_area static and a wrapper around a new
    exported unmap_kernel_range that takes an explicit range instead
    of a vm_area struct.

    This makes it more versatile for code that wants to play with kernel
    page tables outside of the standard vmalloc area.

    (One example is some rework of the PowerPC PCI IO space mapping
    code that depends on that patch and removes some code duplication
    and horrible abuse of forged struct vm_struct).

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Paul Mackerras

    Benjamin Herrenschmidt
     

09 May, 2007

1 commit

  • This patch moves the die notifier handling to common code.
    Previously, various architectures had exactly the same code for
    it. Note that the new code is compiled unconditionally; this
    should be understood as an appeal to the other architecture
    maintainers to implement support for it as well (aka sprinkling a
    notify_die or two in the proper places).

    arm had a notify_die that did something totally different; I
    renamed it to arm_notify_die as part of the patch and made it
    static to the file it's declared and used in. avr32 used to pass
    slightly less information through this interface, and I brought it
    into line with the other architectures.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
    [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Russell King
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

13 Nov, 2006

1 commit

  • - Reorder 'struct vm_struct' to speed up lookups on CPUs with
    small cache lines. The fields 'next', 'addr' and 'size' are now in
    the same cache line, to speed up lookups.

    - One minor cleanup in __get_vm_area_node().

    - Bug fixes in vmalloc_user() and vmalloc_32_user(): NULL returns
    from __vmalloc() and __find_vm_area() were not tested.

    [akpm@osdl.org: remove redundant BUG_ONs]
    Signed-off-by: Eric Dumazet
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

29 Oct, 2006

1 commit

  • If __vmalloc is called to allocate memory with GFP_ATOMIC in
    atomic context, the chain of calls results in __get_vm_area_node
    allocating memory for vm_struct with GFP_KERNEL, causing a
    'sleeping from invalid context' warning. This patch fixes it by
    passing the gfp flags along so that __get_vm_area_node allocates
    memory for vm_struct with the same flags.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Giridhar Pemmasani
     

27 Sep, 2006

1 commit


26 Sep, 2006

1 commit

  • This patch makes the following needlessly global functions static:
    - slab.c: kmem_find_general_cachep()
    - swap.c: __page_cache_release()
    - vmalloc.c: __vmalloc_node()

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

15 Jul, 2006

1 commit

  • __vunmap must not rely on area->nr_pages when picking the release
    method for area->pages. It may be too small when
    __vmalloc_area_node failed early due to lack of memory. Instead,
    use a flag in vm_struct to differentiate.

    Signed-off-by: Jan Kiszka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kiszka
     

23 Jun, 2006

1 commit


30 Oct, 2005

1 commit

  • This patch adds

    vmalloc_node(size, node) -> Allocate necessary memory on the specified node

    and

    get_vm_area_node(size, flags, node)

    and the other functions that it depends on.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives
    exactly the same warnings as far as sparse is concerned, doesn't
    change the generated code (from gcc's point of view we replaced
    unsigned int with a typedef) and documents what's going on far
    better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

05 Sep, 2005

1 commit

  • Version 6 of the ARM architecture introduces the concept of 16MB
    pages (supersections) and 36-bit (40-bit actually, but nobody uses
    this) physical addresses. 36-bit addressed memory and I/O and
    ARMv6 can only be mapped using supersections, and the requirement
    on these is that both virtual and physical addresses be 16MB
    aligned. In trying to add support for ioremap() of 36-bit I/O, we
    run into the issue that get_vm_area() allows for a maximum of 512K
    alignment via the IOREMAP_MAX_ORDER constant. To work around this,
    we can:

    - Allocate a larger VM area than needed (size + (1ul <<
    IOREMAP_MAX_ORDER)) and then align the pointer ourselves, but this
    ends up with 512K of wasted VM per ioremap().

    - Provide a new __get_vm_area_aligned() API and make
    __get_vm_area() sit on top of it. I did this and it works, but I
    don't like the idea of adding another VM API just for this one
    case.

    - My preferred solution, which is to allow the architecture to
    override the IOREMAP_MAX_ORDER constant with its own version.

    Signed-off-by: Deepak Saxena
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Deepak Saxena
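The waste in the first option above is just the distance from an arbitrary granted base to the next aligned boundary; a small sketch (ALIGN_UP and aligned_waste are illustrative names) of how much VM an in-place alignment can burn per mapping:

```c
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((uintptr_t)(a) - 1))

/* VM wasted by over-allocating and aligning by hand: the gap between
 * the granted base and the next @align boundary, which can approach
 * align - 1 bytes (almost 16MB for a supersection mapping). */
uintptr_t aligned_waste(uintptr_t base, uintptr_t align)
{
        return ALIGN_UP(base, align) - base;
}
```

Letting the architecture raise IOREMAP_MAX_ORDER makes get_vm_area() hand back an already-aligned base, so this gap never exists.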
     

21 May, 2005

1 commit

  • Caused oopses again. Also fix a potential mismatch in checking
    whether change_page_attr was needed.

    To do it without races I needed to change mm/vmalloc.c to export a
    __remove_vm_area that does not take the vmlist lock.

    Noticed by Terence Ripperda and based on a patch of his.

    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds