15 Jan, 2012

1 commit

  • Kmemleak patches

    Main features:
    - Handle percpu memory allocations (only scanning them, not actually
    reporting).
    - Memory hotplug support.

    Usability improvements:
    - Show the origin of early allocations.
    - Report previously found leaks even if kmemleak has been disabled by
    some error.

    * tag 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux:
    kmemleak: Add support for memory hotplug
    kmemleak: Handle percpu memory allocation
    kmemleak: Report previously found leaks even after an error
    kmemleak: When the early log buffer is exceeded, report the actual number
    kmemleak: Show where early_log issues come from

    Linus Torvalds
     

16 Dec, 2011

1 commit

  • per_cpu_ptr_to_phys() incorrectly rounds down its result in the
    non-kmalloc case to the page boundary, which is bogus for any
    non-page-aligned address.

    This affects the only in-tree user of this function - sysfs handler
    for per-cpu 'crash_notes' physical address. The trouble is that the
    crash_notes per-cpu variable is not page-aligned:

    crash_notes = 0xc08e8ed4
    PER-CPU OFFSET VALUES:
    CPU 0: 3711f000
    CPU 1: 37129000
    CPU 2: 37133000
    CPU 3: 3713d000

    So, the per-cpu addresses are:
    crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4
    crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4
    crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4
    crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4

    However, /sys/devices/system/cpu/cpu*/crash_notes says:
    /sys/devices/system/cpu/cpu0/crash_notes: 36b57000
    /sys/devices/system/cpu/cpu1/crash_notes: 36b4d000
    /sys/devices/system/cpu/cpu2/crash_notes: 36b43000
    /sys/devices/system/cpu/cpu3/crash_notes: 36b39000

    As you can see, all values are rounded down to a page
    boundary. Consequently, this is where kexec sets up the NOTE segments,
    and thus where the secondary kernel is looking for them. However, when
    the first kernel crashes, it saves the notes to the unaligned
    addresses, where they are not found.

    Fix it by adding offset_in_page() to the translated page address.
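
    The arithmetic behind the fix can be sketched in plain userspace C.
    This is only an illustration, not the kernel code: it assumes 4 KiB
    pages, redefines offset_in_page() locally as a stand-in for the
    kernel macro, and page_phys_plus_offset() is a hypothetical helper
    name. The sample values are the CPU 0 numbers quoted above.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Userspace stand-in for the kernel macro: offset of an address in its page. */
#define offset_in_page(p) ((unsigned long)(p) & ~PAGE_MASK)

/*
 * Hypothetical helper: given the physical base of the page backing a
 * virtual address, add back the in-page offset that a page-granular
 * translation drops.
 */
static uint64_t page_phys_plus_offset(uint64_t page_phys, unsigned long vaddr)
{
    return page_phys + offset_in_page(vaddr);
}
```

    With the CPU 0 values, page_phys_plus_offset(0x36b57000, 0xf7a07ed4)
    yields 0x36b57ed4, the address where the notes are actually saved.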

    -tj: Combined Eugene's and Petr's commit messages.

    Signed-off-by: Eugene Surovegin
    Signed-off-by: Tejun Heo
    Reported-by: Petr Tesarik
    Cc: stable@kernel.org

    Eugene Surovegin
     

03 Dec, 2011

1 commit

  • This patch adds kmemleak callbacks from the percpu allocator, reducing a
    number of false positives caused by kmemleak not scanning such memory
    blocks. The percpu chunks are never reported as leaks because of current
    kmemleak limitations with the __percpu pointer not pointing directly to
    the actual chunks.

    Reported-by: Huajun Li
    Acked-by: Christoph Lameter
    Acked-by: Tejun Heo
    Signed-off-by: Catalin Marinas

    Catalin Marinas
     

24 Nov, 2011

1 commit


23 Nov, 2011

2 commits

  • Percpu allocator recorded the cpus which map to the first and last
    units in pcpu_first/last_unit_cpu respectively and used them to
    determine the address range of a chunk - e.g. it assumed that the
    first unit has the lowest address in a chunk while the last unit has
    the highest address.

    This simply isn't true. Groups in a chunk can have arbitrary positive
    or negative offsets from the previous one and there is no guarantee
    that the first unit occupies the lowest offset while the last one the
    highest.

    Fix it by actually comparing unit offsets to determine cpus occupying
    the lowest and highest offsets. Also, rename pcpu_first/last_unit_cpu
    to pcpu_low/high_unit_cpu to avoid confusion.
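
    A minimal sketch of the idea, assuming the per-cpu unit offsets are
    available as a plain array of signed values. struct unit_range and
    find_low_high() are hypothetical names for illustration, not the
    kernel's own.

```c
#include <assert.h>

/* Hypothetical result type: the cpus owning the lowest and highest units. */
struct unit_range {
    unsigned int low_cpu;
    unsigned int high_cpu;
};

/*
 * Pick the cpus with the lowest and highest unit offsets by comparing
 * the offsets themselves, instead of assuming that the first and last
 * units sit at the ends of the chunk.
 */
static struct unit_range find_low_high(const long *unit_off, unsigned int nr_cpus)
{
    struct unit_range r = { 0, 0 };
    unsigned int cpu;

    assert(nr_cpus > 0);
    for (cpu = 1; cpu < nr_cpus; cpu++) {
        if (unit_off[cpu] < unit_off[r.low_cpu])
            r.low_cpu = cpu;
        if (unit_off[cpu] > unit_off[r.high_cpu])
            r.high_cpu = cpu;
    }
    return r;
}
```

    With offsets such as { 0x20000, 0, 0x10000, -0x10000 }, the lowest
    unit belongs to cpu 3 and the highest to cpu 0, which a first/last
    assumption would get wrong.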

    The chunk address range is used to flush cache on vmalloc area
    map/unmap and decide whether a given address is in the first chunk by
    per_cpu_ptr_to_phys() and the bug was discovered by invalid
    per_cpu_ptr_to_phys() translation for crash_notes.

    Kudos to Dave Young for tracking down the problem.

    Signed-off-by: Tejun Heo
    Reported-by: WANG Cong
    Reported-by: Dave Young
    Tested-by: Dave Young
    LKML-Reference:
    Cc: stable@kernel.org

    Tejun Heo
     
  • Currently pcpu_mem_alloc() always returns zeroed memory. Rename it so
    that users such as pcpu_get_pages_and_bitmap() know they don't need
    to reinitialize it.

    Signed-off-by: Bob Liu
    Reviewed-by: Pekka Enberg
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Bob Liu
     

25 May, 2011

1 commit


24 May, 2011

1 commit


31 Mar, 2011

1 commit


29 Mar, 2011

1 commit

  • On 32-bit systems which don't happen to implicitly define or cast
    VMALLOC_START and/or VMALLOC_END to long in their arch headers, the
    printk in the percpu code will cause a warning to be emitted:

    mm/percpu.c: In function 'pcpu_embed_first_chunk':
    mm/percpu.c:1648: warning: format '%lx' expects type 'long unsigned int',
    but argument 3 has type 'unsigned int'

    So add an explicit cast to unsigned long here.
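
    The pattern looks roughly like this in userspace C. The two macro
    values below are made-up placeholders that expand to unsigned int,
    mimicking an arch header without a long cast, and
    format_vmalloc_range() is a hypothetical name.

```c
#include <stdio.h>

/* Placeholder values (assumption): plain unsigned int, as on some 32-bit arches. */
#define VMALLOC_START 0xc8000000u
#define VMALLOC_END   0xe0000000u

/*
 * '%lx' expects unsigned long, so cast explicitly rather than relying
 * on how each architecture happens to define the constants.
 */
static int format_vmalloc_range(char *buf, size_t len)
{
    return snprintf(buf, len, "vmalloc range %lx-%lx",
                    (unsigned long)VMALLOC_START,
                    (unsigned long)VMALLOC_END);
}
```

    The cast costs nothing on arches where the macros are already
    unsigned long.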

    Signed-off-by: Mike Frysinger
    Signed-off-by: Tejun Heo

    Mike Frysinger
     

28 Mar, 2011

1 commit

  • per_cpu_ptr_to_phys() uses VMALLOC_START and VMALLOC_END to determine if an
    address is in the vmalloc() region or not. This is incorrect on NOMMU as
    there is no real vmalloc() capability (vmalloc() is emulated by kmalloc()).

    The correct way to do this is to use is_vmalloc_addr(). This encapsulates the
    vmalloc() region test in MMU mode and just returns 0 in NOMMU mode.

    On FRV in NOMMU mode, the percpu compilation fails without this patch:

    mm/percpu.c: In function 'per_cpu_ptr_to_phys':
    mm/percpu.c:1011: error: 'VMALLOC_START' undeclared (first use in this function)
    mm/percpu.c:1011: error: (Each undeclared identifier is reported only once
    mm/percpu.c:1011: error: for each function it appears in.)
    mm/percpu.c:1012: error: 'VMALLOC_END' undeclared (first use in this function)
    mm/percpu.c:1018: warning: control reaches end of non-void function

    Signed-off-by: David Howells

    David Howells
     

25 Mar, 2011

1 commit

  • Percpu allocator honors alignment request up to PAGE_SIZE and both the
    percpu addresses in the percpu address space and the translated kernel
    addresses should be aligned accordingly. The calculation of the
    former depends on the alignment of percpu output section in the kernel
    image.

    The linker script macros PERCPU_VADDR() and PERCPU() are used to
    define this output section and the latter takes @align parameter.
    Several architectures are using @align smaller than PAGE_SIZE breaking
    percpu memory alignment.

    This patch removes @align parameter from PERCPU(), renames it to
    PERCPU_SECTION() and makes it always align to PAGE_SIZE. While at it,
    add PCPU_SETUP_BUG_ON() checks such that alignment problems are
    reliably detected and remove percpu alignment comment recently added
    in workqueue.c as the condition would trigger BUG way before reaching
    there.

    For um, this patch raises the alignment of percpu area. As the area
    is in .init, there shouldn't be any noticeable difference.

    This problem was discovered by David Howells while debugging boot
    failure on mn10300.

    Signed-off-by: Tejun Heo
    Acked-by: Mike Frysinger
    Cc: uclinux-dist-devel@blackfin.uclinux.org
    Cc: David Howells
    Cc: Jeff Dike
    Cc: user-mode-linux-devel@lists.sourceforge.net

    Tejun Heo
     

14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

08 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
    as per Tejun.

    Linus Torvalds
     

22 Dec, 2010

1 commit


07 Dec, 2010

1 commit


02 Nov, 2010

1 commit

  • "gadget", "through", "command", "maintain", "maintain", "controller", "address",
    "between", "initiali[zs]e", "instead", "function", "select", "already",
    "equal", "access", "management", "hierarchy", "registration", "interest",
    "relative", "memory", "offset", "already",

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Jiri Kosina

    Uwe Kleine-König
     

25 Oct, 2010

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Update broken web addresses in arch directory.
    Update broken web addresses in the kernel.
    Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
    Revert "Fix typo: configuation => configuration" partially
    ida: document IDA_BITMAP_LONGS calculation
    ext2: fix a typo on comment in ext2/inode.c
    drivers/scsi: Remove unnecessary casts of private_data
    drivers/s390: Remove unnecessary casts of private_data
    net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
    drivers/infiniband: Remove unnecessary casts of private_data
    drivers/gpu/drm: Remove unnecessary casts of private_data
    kernel/pm_qos_params.c: Remove unnecessary casts of private_data
    fs/ecryptfs: Remove unnecessary casts of private_data
    fs/seq_file.c: Remove unnecessary casts of private_data
    arm: uengine.c: remove C99 comments
    arm: scoop.c: remove C99 comments
    Fix typo configue => configure in comments
    Fix typo: configuation => configuration
    Fix typo interrest[ing|ed] => interest[ing|ed]
    Fix various typos of valid in comments
    ...

    Fix up trivial conflicts in:
    drivers/char/ipmi/ipmi_si_intf.c
    drivers/usb/gadget/rndis.c
    net/irda/irnet/irnet_ppp.c

    Linus Torvalds
     

23 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: update comments to reflect that percpu allocations are always zero-filled
    percpu: Optimize __get_cpu_var()
    x86, percpu: Optimize this_cpu_ptr
    percpu: clear memory allocated with the km allocator
    percpu: fix build breakage on s390 and cleanup build configuration tests
    percpu: use percpu allocator on UP too
    percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
    vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

    Fixed up trivial conflicts in include/linux/percpu.h

    Linus Torvalds
     

21 Sep, 2010

1 commit

  • pcpu_first/last_unit_cpu are used to track which cpu has the first and
    last units assigned. This in turn is used to determine the span of a
    chunk for map/unmap cache flushes and whether an address belongs to
    the first chunk or not in per_cpu_ptr_to_phys().

    When the number of possible CPUs isn't a power of two, a chunk may
    contain unassigned units towards the end of a chunk. The logic to
    determine pcpu_last_unit_cpu was incorrect when there was an unused
    unit at the end of a chunk. It failed to ignore the unused unit and
    assigned the unused marker NR_CPUS to pcpu_last_unit_cpu.
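
    A simplified sketch of the corrected scan, assuming a unit-to-cpu
    map in which unused units hold the NR_CPUS marker. The NR_CPUS value
    and the name last_assigned_cpu() are illustrative assumptions, not
    taken from the patch.

```c
#define NR_CPUS 64u   /* assumption: doubles as the "unit unused" marker */

/*
 * Walk the unit map and remember the cpu of the last unit that is
 * actually assigned, skipping units marked unused. Returns NR_CPUS
 * only if no unit is assigned at all.
 */
static unsigned int last_assigned_cpu(const unsigned int *unit_map,
                                      unsigned int nr_units)
{
    unsigned int last = NR_CPUS;
    unsigned int unit;

    for (unit = 0; unit < nr_units; unit++) {
        if (unit_map[unit] == NR_CPUS)
            continue;               /* unused unit, ignore it */
        last = unit_map[unit];
    }
    return last;
}
```

    For a map of { 0, 1, 2, NR_CPUS } the result is cpu 2, whereas
    taking the final slot blindly would return the marker itself.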

    This was discovered through kdump failure which was caused by
    malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible
    CPUs by CAI Qian.

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian
    Cc: stable@kernel.org

    Tejun Heo
     

10 Sep, 2010

2 commits


08 Sep, 2010

1 commit

  • On UP, percpu allocations were redirected to kmalloc. This has the
    following problems.

    * For a certain amount of allocations (determined by
    PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu
    allocator can be used before the usual kernel memory allocator is
    brought online. On SMP, this is used to initialize the kernel
    memory allocator.

    * percpu allocator honors alignment up to PAGE_SIZE but kmalloc()
    doesn't. For example, workqueue makes use of larger alignments for
    cpu_workqueues.

    Currently, users of percpu allocators need to handle UP differently,
    which is somewhat fragile and ugly. Other than a small amount of
    memory, there isn't much to lose by enabling percpu allocator on UP.
    It can simply use kernel memory based chunk allocation which was added
    for SMP archs w/o MMUs.

    This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
    makes UP build use percpu-km. As percpu addresses and kernel
    addresses are always identity mapped and static percpu variables don't
    need any special treatment, nothing is arch dependent and mm/percpu.c
    implements generic setup_per_cpu_areas() for UP.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Acked-by: Pekka Enberg

    Tejun Heo
     

27 Aug, 2010

2 commits


11 Aug, 2010

1 commit


28 Jun, 2010

2 commits

  • This patch updates the percpu allocator such that it can serve a
    limited amount of allocations before slab comes online. This is
    primarily to allow slab to depend on a working percpu allocator.

    Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and SLOTS, determine how
    much memory space and allocation map slots are reserved. If this
    reserved area is exhausted, WARN_ON_ONCE() will trigger and allocation
    will fail till slab comes online.

    The following changes are made to implement early alloc.

    * pcpu_mem_alloc() now checks slab_is_available()

    * Chunks are allocated using pcpu_mem_alloc()

    * Init paths make sure ai->dyn_size is at least as large as
    PERCPU_DYNAMIC_EARLY_SIZE.

    * Initial alloc maps are allocated in __initdata and copied to
    kmalloc'd areas once slab is online.

    Signed-off-by: Tejun Heo
    Cc: Christoph Lameter

    Tejun Heo
     
  • In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was
    ssize_t, -1 meant auto-size, 0 forced 0 and positive meant minimum
    size. There's no use case for forcing 0 and the upcoming early alloc
    support always requires non-zero dynamic size. Make @dyn_size always
    mean minimum dyn_size.

    While at it, make pcpu_build_alloc_info() static, as it doesn't have
    any external caller, as suggested by David Rientjes.

    Signed-off-by: Tejun Heo
    Cc: David Rientjes

    Tejun Heo
     

18 Jun, 2010

1 commit

  • per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
    to the first_chunk or not by just matching the address against the
    address range of the base unit (unit0, used by cpu0). When an address
    from another cpu is passed in, it will always determine that the
    address doesn't belong to the first chunk even when it does. This
    makes the function return a bogus physical address which may lead to
    crash.
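
    The difference between the two checks can be modelled as follows.
    This sketch makes a simplifying assumption that is not stated in the
    commit: units are laid out contiguously in ascending order from the
    chunk base. struct first_chunk and both function names are
    hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the first chunk (assumes contiguous units). */
struct first_chunk {
    uintptr_t base;         /* address of unit0 */
    size_t unit_size;       /* bytes per unit */
    unsigned int nr_units;  /* one unit per cpu */
};

/* Old behaviour: only addresses inside unit0 are recognised. */
static bool addr_in_unit0(const struct first_chunk *c, uintptr_t addr)
{
    return addr >= c->base && addr < c->base + c->unit_size;
}

/* Intended behaviour: the test covers the units of all cpus. */
static bool addr_in_first_chunk(const struct first_chunk *c, uintptr_t addr)
{
    uintptr_t end = c->base + (uintptr_t)c->unit_size * c->nr_units;

    return addr >= c->base && addr < end;
}
```

    An address inside cpu1's unit fails the first test but passes the
    second, which is the case that produced the bogus translation.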

    This problem was discovered by Cliff Wickman while investigating a
    crash during kdump on a SGI UV system.

    Signed-off-by: Tejun Heo
    Reported-by: Cliff Wickman
    Tested-by: Cliff Wickman
    Cc: stable@kernel.org

    Tejun Heo
     

17 Jun, 2010

1 commit


01 May, 2010

5 commits

  • Implement an alternate percpu chunk management based on kernel memory
    for nommu SMP architectures. Instead of mapping into vmalloc area,
    chunks are allocated as contiguous kernel memory using
    alloc_pages(). As such, percpu allocator on nommu will have the
    following restrictions.

    * It can't fill chunks on-demand page-by-page. It has to allocate
    each chunk fully upfront.

    * It can't support sparse chunk for NUMA configurations. SMP w/o mmu
    is crazy enough. Let's hope no one does NUMA w/o mmu. :-P

    * If chunk size isn't a power-of-two multiple of PAGE_SIZE, the
    unaligned amount will be wasted on each chunk. So, archs which use
    this better align chunk size.
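
    The cost described in the last restriction can be estimated with a
    small calculation. The sketch below assumes that a chunk of
    nr_pages pages is served by a single power-of-two sized page
    allocation, rounded up to the next order; both function names are
    hypothetical.

```c
#include <assert.h>

/* Smallest order such that (1 << order) pages cover nr_pages. */
static unsigned int order_for_pages(unsigned long nr_pages)
{
    unsigned int order = 0;

    assert(nr_pages > 0);
    while ((1UL << order) < nr_pages)
        order++;
    return order;
}

/* Pages allocated beyond what the chunk needs, i.e. wasted per chunk. */
static unsigned long wasted_pages(unsigned long nr_pages)
{
    return (1UL << order_for_pages(nr_pages)) - nr_pages;
}
```

    Under that assumption a 12-page chunk needs an order-4 allocation
    and wastes 4 pages, while a 16-page chunk wastes none.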

    For instructions on how to use this, read the comment on top of
    mm/percpu-km.c.

    Signed-off-by: Tejun Heo
    Reviewed-by: David Howells
    Cc: Graff Yang
    Cc: Sonic Zhang

    Tejun Heo
     
  • Separate out and move chunk management (creation/destruction and
    [de]population) code into percpu-vm.c which is included by percpu.c
    and compiled together. The interface for chunk management is defined
    as follows.

    * pcpu_populate_chunk - populate the specified range of a chunk
    * pcpu_depopulate_chunk - depopulate the specified range of a chunk
    * pcpu_create_chunk - create a new chunk
    * pcpu_destroy_chunk - destroy a chunk, always preceded by full depop
    * pcpu_addr_to_page - translate address to physical address
    * pcpu_verify_alloc_info - check alloc_info is acceptable during init

    Other than wrapping vmalloc_to_page() inside pcpu_addr_to_page() and
    dummy pcpu_verify_alloc_info() implementation, this patch only moves
    code around. This separation is to allow alternate chunk management
    implementation.

    Signed-off-by: Tejun Heo
    Reviewed-by: David Howells
    Cc: Graff Yang
    Cc: Sonic Zhang

    Tejun Heo
     
  • Make the following misc preparations for percpu nommu support.

    * Remove references to vmalloc in common comments as nommu percpu won't
    use it.

    * Rename chunk->vms to chunk->data and make it void *. Its use is
    determined by chunk management implementation.

    * Relocate utility functions and add __maybe_unused to functions which
    might not be used by different chunk management implementations.

    This patch doesn't cause any functional change. This is to allow
    alternate chunk management implementation for percpu nommu support.

    Signed-off-by: Tejun Heo
    Reviewed-by: David Howells
    Cc: Graff Yang
    Cc: Sonic Zhang

    Tejun Heo
     
  • Reorganize alloc/free_pcpu_chunk() such that chunk struct alloc/free
    live in pcpu_alloc/free_chunk() and the rest in
    pcpu_create/destroy_chunk(). While at it, add missing error handling
    for chunk->map allocation failure.

    This is to allow alternate chunk management implementation for percpu
    nommu support.

    Signed-off-by: Tejun Heo
    Reviewed-by: David Howells
    Cc: Graff Yang
    Cc: Sonic Zhang

    Tejun Heo
     
  • Factor out pcpu_addr_in_first/reserved_chunk() from
    pcpu_chunk_addr_search() and use it to update per_cpu_ptr_to_phys()
    such that it handles first chunk differently from the rest.

    This patch doesn't cause any functional change and is to prepare for
    percpu nommu support.

    Signed-off-by: Tejun Heo
    Reviewed-by: David Howells
    Cc: Graff Yang
    Cc: Sonic Zhang

    Tejun Heo
     

29 Mar, 2010

1 commit

  • lockdep has custom code to check whether a pointer belongs to static
    percpu area which is somewhat broken. Implement proper
    is_kernel/module_percpu_address() and replace the custom code.

    On UP, percpu variables are regular static variables and can't be
    distinguished from them. Always return %false on UP.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Ingo Molnar

    Tejun Heo
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to core subsystems.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Acked-by: Paul E. McKenney
    Cc: Jens Axboe
    Cc: linux-mm@kvack.org
    Cc: Rusty Russell
    Cc: Dipankar Sarma
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Eric Biederman

    Tejun Heo
     

02 Feb, 2010

1 commit


12 Jan, 2010

1 commit

  • __pcpu_ptr_to_addr() can be overridden by the architecture and might not
    behave well if passed a NULL pointer. So avoid calling it until we have
    verified that its arg is not NULL.
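
    The ordering matters more than the check itself. A minimal userspace
    sketch, in which arch_ptr_to_addr() is a hypothetical stand-in for
    an architecture override that must never see NULL, and the call
    counter exists only to make the behaviour observable:

```c
#include <stddef.h>

static int translate_calls;   /* illustration only: counts translations */

/* Hypothetical arch-specific translation that misbehaves on NULL. */
static void *arch_ptr_to_addr(void *ptr)
{
    translate_calls++;
    return (char *)ptr + 0x100;
}

/* Check for NULL first, and only then hand the pointer to the arch code. */
static void *translate_if_set(void *ptr)
{
    if (!ptr)
        return NULL;
    return arch_ptr_to_addr(ptr);
}
```

    Calling translate_if_set(NULL) returns NULL without ever invoking
    the translation.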

    Cc: Rusty Russell
    Cc: Kamalesh Babulal
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

05 Jan, 2010

1 commit