29 Mar, 2014

1 commit


18 Mar, 2014

1 commit

  • 723ad1d90b56 ("percpu: store offsets instead of lengths in ->map[]")
    updated percpu area allocator to use the lowest bit, instead of sign,
    to signify whether the area is occupied and forced min align to 2;
    unfortunately, it forgot to force the allocation size to be even
    causing malfunctions for the very rare odd-sized allocations.

    Always force the allocations to be even sized.

    tj: Wrote patch description.

    Original-patch-by: Al Viro
    Signed-off-by: Tejun Heo

    Viro
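
    Why the size has to be even can be seen in a small sketch. Because the
    lowest bit of each map entry doubles as the "occupied" flag, every area
    boundary must have that bit clear, which only holds if each size is even.
    The helper name round_up_even() is hypothetical and is not taken from the
    kernel source.

```c
#include <stddef.h>

/* Hypothetical helper (not the kernel's code): round a requested size up
 * to the next even value so that offset + size stays even as well. */
static size_t round_up_even(size_t size)
{
    return (size + 1) & ~(size_t)1;
}

/* The start of the next area is the current offset plus the (even) size.
 * With an odd size the result would have bit 0 set and would be misread
 * as "occupied". */
static size_t next_area_offset(size_t offset, size_t size)
{
    return offset + round_up_even(size);
}
```

    For example, a 7-byte request at offset 4 ends at offset 12 rather than 11.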
     

07 Mar, 2014

3 commits

  • If we know that first N areas are all in use, we can obviously skip
    them when searching for a free one. And that kind of hint is very
    easy to maintain.

    Signed-off-by: Al Viro
    Signed-off-by: Tejun Heo

    Al Viro
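
    A minimal sketch of such a hint, assuming a map in which each entry is an
    even offset with the lowest bit set when the area is in use, terminated by
    a sentry entry. The struct and field names are illustrative and are not
    claimed to match the kernel's.

```c
#include <stddef.h>

/* Illustrative chunk layout, not the kernel's struct pcpu_chunk. */
struct chunk_sketch {
    const int *map;   /* offset|1 = in use, even offset = free; sentry last */
    int map_used;     /* number of areas, sentry not counted */
    int first_free;   /* hint: areas [0, first_free) are all in use */
};

/* Return the index of the first free area of at least `size` bytes, or -1.
 * The scan starts at the hint and tightens it when possible. */
static int find_free_area(struct chunk_sketch *c, int size)
{
    int seen_free = 0;

    for (int i = c->first_free; i < c->map_used; i++) {
        if (c->map[i] & 1)
            continue;               /* in use, skip */
        if (!seen_free) {
            c->first_free = i;      /* everything before i is in use */
            seen_free = 1;
        }
        /* Length is the distance to the next entry's offset. */
        if ((c->map[i + 1] & ~1) - c->map[i] >= size)
            return i;
    }
    if (!seen_free)
        c->first_free = c->map_used; /* no free area at all */
    return -1;
}
```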
     
  • Current code keeps +-length for each area in chunk->map[]. It has
    several unpleasant consequences:
    * even if we know that the first 50 areas are all in use, allocation
    still needs to walk all of them and sum their sizes just to get the
    offset of a free one.
    * freeing needs to find the array entry referring to the area
    in question; again, we have to sum the sizes until we reach the offset
    we are interested in. Note that offsets are monotonic, so a simple
    binary search would do here.

    New data representation: array of <offset, in-use flag> pairs.
    Each pair is represented by one int - we use offset|1 for
    <offset, in use> and offset for <offset, free> (we make sure that all
    offsets are even). In the end we put a sentry entry - <total size, in use>.
    The first entry is <0, flag>; it would be possible to store together the flag
    for Nth area and offset for N+1st, but that leads to much hairier code.

    In other words, where the old variant would have
    4, -8, -4, 4, -12, 100
    (4 bytes free, 8 in use, 4 in use, 4 free, 12 in use, 100 free) we store
    <0, free>, <4, in use>, <12, in use>, <16, free>, <20, in use>, <32, free>, <132, in use>
    i.e.
    0, 5, 13, 16, 21, 32, 133

    This commit switches to new data representation and takes care of a couple
    of low-hanging fruits in free_pcpu_area() - one is the switch to binary
    search, another is not doing two memmove() when one would do. Speeding
    the alloc side up (by keeping track of how many areas in the beginning are
    known to be all in use) also becomes possible - that'll be done in the next
    commit.

    Signed-off-by: Al Viro
    Signed-off-by: Tejun Heo

    Al Viro
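
    The conversion between the two representations, and the binary search
    that the offset-based map allows, can be sketched as follows. Function
    names are hypothetical. This is a user-space illustration of the idea and
    not the code in mm/percpu.c.

```c
#include <stddef.h>

/* Convert the old signed-length map (positive = free, negative = in use)
 * into the offset-based map. `out` must have room for n + 1 entries.
 * Returns the number of entries written, including the sentry. */
static size_t lengths_to_offsets(const int *lens, size_t n, int *out)
{
    int off = 0;

    for (size_t i = 0; i < n; i++) {
        int in_use = lens[i] < 0;

        out[i] = off | in_use;              /* low bit = in-use flag */
        off += in_use ? -lens[i] : lens[i];
    }
    out[n] = off | 1;                       /* sentry: total size, in use */
    return n + 1;
}

/* Binary search for the area that starts at `off`.
 * Returns its index, or -1 if no area starts there. */
static long find_area(const int *map, size_t n_areas, int off)
{
    size_t lo = 0, hi = n_areas;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int mid_off = map[mid] & ~1;        /* strip the in-use flag */

        if (mid_off == off)
            return (long)mid;
        if (mid_off < off)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

    Feeding the example lengths 4, -8, -4, 4, -12, 100 through
    lengths_to_offsets() yields 0, 5, 13, 16, 21, 32, 133.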
     
  • ... and simplify the results a bit. Makes the next step easier
    to deal with - we will be changing the data representation for
    chunk->map[] and it's easier to do if the code in question is
    not split between pcpu_alloc_area() and pcpu_split_block().

    Signed-off-by: Al Viro
    Signed-off-by: Tejun Heo

    Al Viro
     

22 Jan, 2014

2 commits

  • Merge first patch-bomb from Andrew Morton:

    - a couple of misc things

    - inotify/fsnotify work from Jan

    - ocfs2 updates (partial)

    - about half of MM

    * emailed patches from Andrew Morton : (117 commits)
    mm/migrate: remove unused function, fail_migrate_page()
    mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages
    mm/migrate: correct failure handling if !hugepage_migration_support()
    mm/migrate: add comment about permanent failure path
    mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure
    mm: compaction: reset scanner positions immediately when they meet
    mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
    mm: compaction: detect when scanners meet in isolate_freepages
    mm: compaction: reset cached scanner pfn's before reading them
    mm: compaction: encapsulate defer reset logic
    mm: compaction: trace compaction begin and end
    memcg, oom: lock mem_cgroup_print_oom_info
    sched: add tracepoints related to NUMA task migration
    mm: numa: do not automatically migrate KSM pages
    mm: numa: trace tasks that fail migration due to rate limiting
    mm: numa: limit scope of lock for NUMA migrate rate limiting
    mm: numa: make NUMA-migrate related functions static
    lib/show_mem.c: show num_poisoned_pages when oom
    mm/hwpoison: add '#' to hwpoison_inject
    mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter
    ...

    Linus Torvalds
     
  • Switch to memblock interfaces for the early memory allocator instead of
    the bootmem allocator. There is no functional change in behavior from
    the bootmem users' point of view.

    Archs already converted to NO_BOOTMEM now directly use memblock
    interfaces instead of bootmem wrappers built on top of memblock. For
    the archs which still use bootmem, these new APIs just fall back to the
    existing bootmem APIs.

    Signed-off-by: Santosh Shilimkar
    Cc: "Rafael J. Wysocki"
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Greg Kroah-Hartman
    Cc: Grygorii Strashko
    Cc: H. Peter Anvin
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Konrad Rzeszutek Wilk
    Cc: Michal Hocko
    Cc: Paul Walmsley
    Cc: Pavel Machek
    Cc: Russell King
    Cc: Tejun Heo
    Cc: Tony Lindgren
    Cc: Yinghai Lu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Santosh Shilimkar
     

21 Jan, 2014

1 commit


23 Sep, 2013

1 commit

  • If a memory allocation in pcpu_embed_first_chunk() fails, the
    allocated memory is not released correctly. In the release loop also
    the non-allocated elements are released which leads to the following
    kernel BUG on systems with very little memory:

    [ 0.000000] kernel BUG at mm/bootmem.c:307!
    [ 0.000000] illegal operation: 0001 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [ 0.000000] Modules linked in:
    [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0 #22
    [ 0.000000] task: 0000000000a20ae0 ti: 0000000000a08000 task.ti: 0000000000a08000
    [ 0.000000] Krnl PSW : 0400000180000000 0000000000abda7a (__free+0x116/0x154)
    [ 0.000000] R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 EA:3
    ...
    [ 0.000000] [] mark_bootmem_node+0xde/0xf0
    [ 0.000000] [] mark_bootmem+0xa8/0x118
    [ 0.000000] [] pcpu_embed_first_chunk+0xe7a/0xf0c
    [ 0.000000] [] setup_per_cpu_areas+0x4a/0x28c

    To fix the problem now only allocated elements are released. This then
    leads to the correct kernel panic:

    [ 0.000000] Kernel panic - not syncing: Failed to initialize percpu areas.
    ...
    [ 0.000000] Call Trace:
    [ 0.000000] ([] show_trace+0x132/0x150)
    [ 0.000000] [] show_stack+0xc4/0xd4
    [ 0.000000] [] dump_stack+0x74/0xd8
    [ 0.000000] [] panic+0xea/0x264
    [ 0.000000] [] setup_per_cpu_areas+0x5c/0x28c

    tj: Flipped if conditional so that it doesn't need "continue".

    Signed-off-by: Michael Holzheu
    Signed-off-by: Tejun Heo

    Michael Holzheu
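
    The shape of the fix, releasing only what was actually allocated, can be
    sketched in user space. The function and the fail_at parameter are
    hypothetical and exist only to simulate an allocation failure. In user
    space free(NULL) is harmless, whereas the bootmem free in the report above
    hit a BUG on elements that had never been allocated.

```c
#include <stdlib.h>

/* Allocate n buffers of `size` bytes. If allocation number `fail_at`
 * fails (simulated), release only the buffers that were allocated. */
static int alloc_all_or_nothing(void **areas, size_t n, size_t size,
                                size_t fail_at)
{
    size_t i;

    for (i = 0; i < n; i++)
        areas[i] = NULL;

    for (i = 0; i < n; i++) {
        areas[i] = (i == fail_at) ? NULL : malloc(size);
        if (!areas[i])
            goto out_free;
    }
    return 0;

out_free:
    for (i = 0; i < n; i++) {
        if (areas[i]) {          /* skip elements never allocated */
            free(areas[i]);
            areas[i] = NULL;
        }
    }
    return -1;
}
```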
     

02 Dec, 2012

1 commit


29 Oct, 2012

1 commit


06 Oct, 2012

1 commit


10 May, 2012

2 commits

  • Kmemleak tracks the percpu allocations via a specific API and the
    originally allocated areas must be removed from kmemleak (via
    kmemleak_free). The code was already doing this for SMP systems.

    Reported-by: Sami Liedes
    Cc: Tejun Heo
    Cc: Christoph Lameter
    Signed-off-by: Catalin Marinas
    Signed-off-by: Tejun Heo

    Catalin Marinas
     
  • pcpu_embed_first_chunk() allocates memory for each node, copies percpu
    data and frees unused portions of it before proceeding to the next
    group. This assumes that allocations for different nodes don't
    overlap; however, depending on memory topology, the bootmem allocator
    may end up allocating memory from a different node than the requested
    one which may overlap with the portion freed from one of the previous
    percpu areas. This leads to percpu groups for different nodes
    overlapping which is a serious bug.

    This patch separates out copy & partial free from the allocation loop
    such that all allocations are complete before partial frees happen.

    This also fixes overlapping frees which could happen on allocation
    failure path - out_free_areas path frees whole groups but the groups
    could have portions freed at that point.

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org
    Reported-by: "Pavel V. Panteleev"
    Tested-by: "Pavel V. Panteleev"
    LKML-Reference:

    Tejun Heo
     

30 Mar, 2012

1 commit


15 Jan, 2012

1 commit

  • Kmemleak patches

    Main features:
    - Handle percpu memory allocations (only scanning them, not actually
    reporting).
    - Memory hotplug support.

    Usability improvements:
    - Show the origin of early allocations.
    - Report previously found leaks even if kmemleak has been disabled by
    some error.

    * tag 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux:
    kmemleak: Add support for memory hotplug
    kmemleak: Handle percpu memory allocation
    kmemleak: Report previously found leaks even after an error
    kmemleak: When the early log buffer is exceeded, report the actual number
    kmemleak: Show where early_log issues come from

    Linus Torvalds
     

16 Dec, 2011

1 commit

  • per_cpu_ptr_to_phys() incorrectly rounds down its result for non-kmalloc
    case to the page boundary, which is bogus for any non-page-aligned
    address.

    This affects the only in-tree user of this function - sysfs handler
    for per-cpu 'crash_notes' physical address. The trouble is that the
    crash_notes per-cpu variable is not page-aligned:

    crash_notes = 0xc08e8ed4
    PER-CPU OFFSET VALUES:
    CPU 0: 3711f000
    CPU 1: 37129000
    CPU 2: 37133000
    CPU 3: 3713d000

    So, the per-cpu addresses are:
    crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4
    crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4
    crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4
    crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4

    However, /sys/devices/system/cpu/cpu*/crash_notes says:
    /sys/devices/system/cpu/cpu0/crash_notes: 36b57000
    /sys/devices/system/cpu/cpu1/crash_notes: 36b4d000
    /sys/devices/system/cpu/cpu2/crash_notes: 36b43000
    /sys/devices/system/cpu/cpu3/crash_notes: 36b39000

    As you can see, all values are rounded down to a page
    boundary. Consequently, this is where kexec sets up the NOTE segments,
    and thus where the secondary kernel is looking for them. However, when
    the first kernel crashes, it saves the notes to the unaligned
    addresses, where they are not found.

    Fix it by adding offset_in_page() to the translated page address.

    -tj: Combined Eugene's and Petr's commit messages.

    Signed-off-by: Eugene Surovegin
    Signed-off-by: Tejun Heo
    Reported-by: Petr Tesarik
    Cc: stable@kernel.org

    Eugene Surovegin
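
    The arithmetic of the fix can be checked against the CPU 0 numbers above.
    This sketch assumes 4 KiB pages, which matches the listed addresses, and
    uses hypothetical names with 32-bit addresses for illustration only.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Offset of an address within its page. */
static uint32_t offset_in_page_sketch(uint32_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* page_phys is the page-aligned physical address backing vaddr's page.
 * Adding the in-page offset restores the unaligned physical address. */
static uint32_t to_phys_sketch(uint32_t page_phys, uint32_t vaddr)
{
    return page_phys + offset_in_page_sketch(vaddr);
}
```

    With page_phys = 0x36b57000 and vaddr = 0xf7a07ed4 the result is
    0x36b57ed4, the address where the first kernel saves the notes.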
     

03 Dec, 2011

1 commit

  • This patch adds kmemleak callbacks from the percpu allocator, reducing a
    number of false positives caused by kmemleak not scanning such memory
    blocks. The percpu chunks are never reported as leaks because of current
    kmemleak limitations with the __percpu pointer not pointing directly to
    the actual chunks.

    Reported-by: Huajun Li
    Acked-by: Christoph Lameter
    Acked-by: Tejun Heo
    Signed-off-by: Catalin Marinas

    Catalin Marinas
     

24 Nov, 2011

1 commit


23 Nov, 2011

2 commits

  • Percpu allocator recorded the cpus which map to the first and last
    units in pcpu_first/last_unit_cpu respectively and used them to
    determine the address range of a chunk - e.g. it assumed that the
    first unit has the lowest address in a chunk while the last unit has
    the highest address.

    This simply isn't true. Groups in a chunk can have arbitrary positive
    or negative offsets from the previous one and there is no guarantee
    that the first unit occupies the lowest offset while the last one the
    highest.

    Fix it by actually comparing unit offsets to determine cpus occupying
    the lowest and highest offsets. Also, rename pcpu_first/last_unit_cpu
    to pcpu_low/high_unit_cpu to avoid confusion.

    The chunk address range is used to flush cache on vmalloc area
    map/unmap and decide whether a given address is in the first chunk by
    per_cpu_ptr_to_phys() and the bug was discovered by invalid
    per_cpu_ptr_to_phys() translation for crash_notes.

    Kudos to Dave Young for tracking down the problem.

    Signed-off-by: Tejun Heo
    Reported-by: WANG Cong
    Reported-by: Dave Young
    Tested-by: Dave Young
    LKML-Reference:
    Cc: stable@kernel.org

    Tejun Heo
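
    Picking the CPUs by comparing unit offsets, rather than by unit index,
    might look like the sketch below. The array and function names are
    hypothetical, and offsets are modelled as signed values because they can
    be negative relative to the base.

```c
#include <assert.h>
#include <stddef.h>

/* Find the CPUs whose units sit at the lowest and highest offsets.
 * unit_off[cpu] is the (possibly negative) offset of that CPU's unit. */
static void find_low_high_cpu(const long *unit_off, size_t nr_cpus,
                              size_t *low_cpu, size_t *high_cpu)
{
    assert(nr_cpus > 0);

    *low_cpu = *high_cpu = 0;
    for (size_t cpu = 1; cpu < nr_cpus; cpu++) {
        if (unit_off[cpu] < unit_off[*low_cpu])
            *low_cpu = cpu;
        if (unit_off[cpu] > unit_off[*high_cpu])
            *high_cpu = cpu;
    }
}
```

    With offsets 0x8000, 0x0, -0x4000 and 0x10000, CPU 2 is the low unit and
    CPU 3 the high unit, even though CPU 0 owns the first unit.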
     
  • Currently pcpu_mem_alloc() always returns zeroed memory, so rename it
    to make that explicit and let users such as pcpu_get_pages_and_bitmap()
    know that they do not need to reinitialize it.

    Signed-off-by: Bob Liu
    Reviewed-by: Pekka Enberg
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Bob Liu
     

25 May, 2011

1 commit


24 May, 2011

1 commit


31 Mar, 2011

1 commit


29 Mar, 2011

1 commit

  • On 32-bit systems which don't happen to implicitly define or cast
    VMALLOC_START and/or VMALLOC_END to long in their arch headers, the
    printk in the percpu code will cause a warning to be emitted:

    mm/percpu.c: In function 'pcpu_embed_first_chunk':
    mm/percpu.c:1648: warning: format '%lx' expects type 'long unsigned int',
    but argument 3 has type 'unsigned int'

    So add an explicit cast to unsigned long here.

    Signed-off-by: Mike Frysinger
    Signed-off-by: Tejun Heo

    Mike Frysinger
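
    The same class of warning and its fix can be reproduced in user space
    with snprintf(). The function name and parameters are illustrative, with
    unsigned int standing in for an arch that does not define the constants
    as long.

```c
#include <stdio.h>

/* '%lx' expects unsigned long, so values of type unsigned int are cast
 * explicitly. Without the casts the compiler emits a format warning. */
static int format_range(char *buf, size_t len,
                        unsigned int start, unsigned int end)
{
    return snprintf(buf, len, "%lx-%lx",
                    (unsigned long)start, (unsigned long)end);
}
```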
     

28 Mar, 2011

1 commit

  • per_cpu_ptr_to_phys() uses VMALLOC_START and VMALLOC_END to determine if an
    address is in the vmalloc() region or not. This is incorrect on NOMMU as
    there is no real vmalloc() capability (vmalloc() is emulated by kmalloc()).

    The correct way to do this is to use is_vmalloc_addr(). This encapsulates the
    vmalloc() region test in MMU mode and just returns 0 in NOMMU mode.

    On FRV in NOMMU mode, the percpu compilation fails without this patch:

    mm/percpu.c: In function 'per_cpu_ptr_to_phys':
    mm/percpu.c:1011: error: 'VMALLOC_START' undeclared (first use in this function)
    mm/percpu.c:1011: error: (Each undeclared identifier is reported only once
    mm/percpu.c:1011: error: for each function it appears in.)
    mm/percpu.c:1012: error: 'VMALLOC_END' undeclared (first use in this function)
    mm/percpu.c:1018: warning: control reaches end of non-void function

    Signed-off-by: David Howells

    David Howells
     

25 Mar, 2011

1 commit

  • Percpu allocator honors alignment requests up to PAGE_SIZE and both the
    percpu addresses in the percpu address space and the translated kernel
    addresses should be aligned accordingly. The calculation of the
    former depends on the alignment of percpu output section in the kernel
    image.

    The linker script macros PERCPU_VADDR() and PERCPU() are used to
    define this output section and the latter takes @align parameter.
    Several architectures are using @align smaller than PAGE_SIZE breaking
    percpu memory alignment.

    This patch removes @align parameter from PERCPU(), renames it to
    PERCPU_SECTION() and makes it always align to PAGE_SIZE. While at it,
    add PCPU_SETUP_BUG_ON() checks such that alignment problems are
    reliably detected and remove percpu alignment comment recently added
    in workqueue.c as the condition would trigger BUG way before reaching
    there.

    For um, this patch raises the alignment of percpu area. As the area
    is in .init, there shouldn't be any noticeable difference.

    This problem was discovered by David Howells while debugging boot
    failure on mn10300.

    Signed-off-by: Tejun Heo
    Acked-by: Mike Frysinger
    Cc: uclinux-dist-devel@blackfin.uclinux.org
    Cc: David Howells
    Cc: Jeff Dike
    Cc: user-mode-linux-devel@lists.sourceforge.net

    Tejun Heo
     

14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

08 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
    as per Tejun.

    Linus Torvalds
     

22 Dec, 2010

1 commit


07 Dec, 2010

1 commit


02 Nov, 2010

1 commit

  • "gadget", "through", "command", "maintain", "maintain", "controller", "address",
    "between", "initiali[zs]e", "instead", "function", "select", "already",
    "equal", "access", "management", "hierarchy", "registration", "interest",
    "relative", "memory", "offset", "already",

    Signed-off-by: Uwe Kleine-König
    Signed-off-by: Jiri Kosina

    Uwe Kleine-König
     

25 Oct, 2010

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Update broken web addresses in arch directory.
    Update broken web addresses in the kernel.
    Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
    Revert "Fix typo: configuation => configuration" partially
    ida: document IDA_BITMAP_LONGS calculation
    ext2: fix a typo on comment in ext2/inode.c
    drivers/scsi: Remove unnecessary casts of private_data
    drivers/s390: Remove unnecessary casts of private_data
    net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
    drivers/infiniband: Remove unnecessary casts of private_data
    drivers/gpu/drm: Remove unnecessary casts of private_data
    kernel/pm_qos_params.c: Remove unnecessary casts of private_data
    fs/ecryptfs: Remove unnecessary casts of private_data
    fs/seq_file.c: Remove unnecessary casts of private_data
    arm: uengine.c: remove C99 comments
    arm: scoop.c: remove C99 comments
    Fix typo configue => configure in comments
    Fix typo: configuation => configuration
    Fix typo interrest[ing|ed] => interest[ing|ed]
    Fix various typos of valid in comments
    ...

    Fix up trivial conflicts in:
    drivers/char/ipmi/ipmi_si_intf.c
    drivers/usb/gadget/rndis.c
    net/irda/irnet/irnet_ppp.c

    Linus Torvalds
     

23 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: update comments to reflect that percpu allocations are always zero-filled
    percpu: Optimize __get_cpu_var()
    x86, percpu: Optimize this_cpu_ptr
    percpu: clear memory allocated with the km allocator
    percpu: fix build breakage on s390 and cleanup build configuration tests
    percpu: use percpu allocator on UP too
    percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
    vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

    Fixed up trivial conflicts in include/linux/percpu.h

    Linus Torvalds
     

21 Sep, 2010

1 commit

  • pcpu_first/last_unit_cpu are used to track which cpu has the first and
    last units assigned. This in turn is used to determine the span of a
    chunk for map/unmap cache flushes and whether an address belongs to
    the first chunk or not in per_cpu_ptr_to_phys().

    When the number of possible CPUs isn't power of two, a chunk may
    contain unassigned units towards the end of a chunk. The logic to
    determine pcpu_last_unit_cpu was incorrect when there was an unused
    unit at the end of a chunk. It failed to ignore the unused unit and
    assigned the unused marker NR_CPUS to pcpu_last_unit_cpu.

    This was discovered through kdump failure which was caused by
    malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible
    CPUs by CAI Qian.

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian
    Cc: stable@kernel.org

    Tejun Heo
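
    A sketch of the corrected logic, assuming a unit-to-CPU table in which
    unused units hold a marker value (NR_CPUS in the description above, passed
    here as a parameter). Names are hypothetical.

```c
#include <stddef.h>

/* Return the CPU assigned to the last *used* unit, skipping any unused
 * units at the end of the chunk. Returns unused_marker if none is used. */
static int last_unit_cpu(const int *unit_to_cpu, size_t nr_units,
                         int unused_marker)
{
    for (size_t u = nr_units; u-- > 0; ) {
        if (unit_to_cpu[u] != unused_marker)
            return unit_to_cpu[u];
    }
    return unused_marker;
}
```

    Taking unit_to_cpu[nr_units - 1] directly would return the marker whenever
    the chunk ends in an unused unit, which is the failure described above.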
     

10 Sep, 2010

2 commits


08 Sep, 2010

1 commit

  • On UP, percpu allocations were redirected to kmalloc. This has the
    following problems.

    * For a certain amount of allocations (determined by
    PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu
    allocator can be used before the usual kernel memory allocator is
    brought online. On SMP, this is used to initialize the kernel
    memory allocator.

    * percpu allocator honors alignment up to PAGE_SIZE but kmalloc()
    doesn't. For example, workqueue makes use of larger alignments for
    cpu_workqueues.

    Currently, users of percpu allocators need to handle UP differently,
    which is somewhat fragile and ugly. Other than a small amount of
    memory, there isn't much to lose by enabling percpu allocator on UP.
    It can simply use kernel memory based chunk allocation which was added
    for SMP archs w/o MMUs.

    This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
    makes UP build use percpu-km. As percpu addresses and kernel
    addresses are always identity mapped and static percpu variables don't
    need any special treatment, nothing is arch dependent and mm/percpu.c
    implements generic setup_per_cpu_areas() for UP.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Acked-by: Pekka Enberg

    Tejun Heo
     

27 Aug, 2010

2 commits