16 Mar, 2010

1 commit

  • perf_arch_fetch_caller_regs() is exported for the overridden x86
    version, but not for the generic weak version.

    As a general rule, weak functions should not have their symbol
    exported in the same file in which they are defined.

    So let's export it in trace_event_perf.c, as it is used by trace
    events only.
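
    A minimal sketch of the resulting layout (signatures abbreviated,
    file placement as described above):

    /* kernel/perf_event.c: the generic weak definition, no export here */
    void __weak perf_arch_fetch_caller_regs(struct pt_regs *regs,
                                            unsigned long ip, int skip)
    {
    }

    /* kernel/trace/trace_event_perf.c: the user of the symbol exports
     * it, so modules get whichever version (weak generic or arch
     * override) was linked in */
    EXPORT_SYMBOL_GPL(perf_arch_fetch_caller_regs);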

    This fixes:

    ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
    ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!

    -v2: And also only build it if trace events are enabled.
    -v3: Fix changelog mistake

    Reported-by: Stephen Rothwell
    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Xiao Guangrong
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

12 Mar, 2010

1 commit


11 Mar, 2010

4 commits

  • This patch is an optimization in perf_event_task_sched_in() to avoid
    scheduling the events twice in a row.

    Without it, the perf_disable()/perf_enable() pair is invoked twice,
    so pinned events keep counting while the flexible events are being
    scheduled, and we go through hw_perf_enable() twice.

    By encapsulating the whole sequence in perf_disable()/perf_enable(),
    we ensure hw_perf_enable() is invoked only once, thanks to the
    refcount protection.
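
    A minimal sketch of the change's shape (context lookup elided,
    helper names simplified):

    void perf_event_task_sched_in(struct task_struct *task)
    {
            struct perf_event_context *ctx = task->perf_event_ctxp;

            perf_disable();                     /* refcount 0 -> 1: PMU off  */
            ctx_sched_in(ctx, EVENT_PINNED);    /* inner pairs only bump the */
            ctx_sched_in(ctx, EVENT_FLEXIBLE);  /* count, no hardware access */
            perf_enable();                      /* refcount 1 -> 0: PMU on,
                                                 * hw_perf_enable() runs once */
    }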

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    eranian@google.com
     
  • Export perf_trace_regs and perf_arch_fetch_caller_regs, since modules
    will use them.

    Signed-off-by: Xiao Guangrong
    [ use EXPORT_PER_CPU_SYMBOL_GPL() ]
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiao Guangrong
     
  • From: Ananth N Mavinakayanahalli

    When freeing the instruction slot, the arithmetic to calculate
    the index of the slot in the page needs to account for the total
    size of the instruction on the various architectures.

    Calculate the index correctly when freeing the out-of-line
    execution slot.
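
    A hedged sketch of the arithmetic in question (names illustrative,
    not the exact kernel code):

    static int slot_index(kprobe_opcode_t *slot, kprobe_opcode_t *page_base,
                          size_t insn_size /* in kprobe_opcode_t units */)
    {
            /* Pointer subtraction yields kprobe_opcode_t units, so the
             * divisor must be the per-arch instruction size in the same
             * units, not a byte count. */
            return (slot - page_base) / insn_size;
    }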

    Reported-by: Sachin Sant
    Reported-by: Heiko Carstens
    Signed-off-by: Ananth N Mavinakayanahalli
    Signed-off-by: Masami Hiramatsu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     
  • Anton Blanchard found that he could reliably make the kernel hit a
    BUG_ON in the slab allocator by taking a cpu offline and then online
    while a system-wide perf record session was running.

    The reason is that when the cpu comes up, we completely reinitialize
    the ctx field of the struct perf_cpu_context for the cpu. If there is
    a system-wide perf record session running, then there will be a struct
    perf_event that has a reference to the context, so its refcount will
    be 2. (The perf_event has been removed from the context's group_entry
    and event_entry lists by perf_event_exit_cpu(), but that doesn't
    remove the perf_event's reference to the context and doesn't decrement
    the context's refcount.)

    When the cpu comes up, perf_event_init_cpu() gets called, and it calls
    __perf_event_init_context() on the cpu's context. That resets the
    refcount to 1. Then when the perf record session finishes and the
    perf_event is closed, the refcount gets decremented to 0 and the
    context gets kfreed after an RCU grace period. Since the context
    wasn't kmalloced -- it's part of a per-cpu variable -- bad things
    happen.

    In fact we don't need to completely reinitialize the context when the
    cpu comes up. It's sufficient to initialize the context once at boot,
    but we need to do it for all possible cpus.

    This moves the context initialization to happen at boot time. With
    this, we don't trash the refcount and the context never gets kfreed,
    and we don't hit the BUG_ON.
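
    A minimal sketch of the approach, assuming helper names close to the
    description above:

    void __init perf_event_init(void)
    {
            int cpu;

            /* once, at boot, for every possible cpu: the hotplug path no
             * longer re-runs __perf_event_init_context() and so no longer
             * resets the refcount */
            for_each_possible_cpu(cpu)
                    perf_event_init_cpu_context(cpu);  /* hypothetical helper */
    }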

    Reported-by: Anton Blanchard
    Signed-off-by: Paul Mackerras
    Tested-by: Anton Blanchard
    Acked-by: Peter Zijlstra
    Cc:
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     

10 Mar, 2010

9 commits

  • Drop the obsolete "profile" naming used by perf for trace events.
    Perf can now do more than simple events counting, so generalize
    the API naming.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron

    Frederic Weisbecker
     
  • We are taking a wrong regs snapshot when a trace event triggers.
    Either we use get_irq_regs(), which gives us the interrupted
    registers if we are in an interrupt, or we use task_pt_regs(),
    which gives us the state before we entered the kernel, assuming
    we are lucky enough not to be a kernel thread, in which case
    task_pt_regs() returns the initial set of regs from when the
    kernel thread was started.

    What we want is different. We need a hot snapshot of the regs,
    so that we can get the instruction pointer to record in the
    sample, the frame pointer for the callchain, and some other
    things.

    Let's use the new perf_fetch_caller_regs() for that.
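
    A sketch of the call site this implies (handler and the skip
    argument's exact convention are assumptions):

    static void foo_event_handler(void *record, int size)
    {
            struct pt_regs regs;

            /* a hot snapshot taken right here, instead of guessing via
             * get_irq_regs() or task_pt_regs() */
            perf_fetch_caller_regs(&regs, 1);  /* 1: skip this frame (assumed) */

            /* ...hand &regs to the perf sample machinery... */
    }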

    Comparison with perf record -e lock: -R -a -f -g
    Before:

    perf [kernel] [k] __do_softirq
    |
    --- __do_softirq
    |
    |--55.16%-- __open
    |
    --44.84%-- __write_nocancel

    After:

    perf [kernel] [k] perf_tp_event
    |
    --- perf_tp_event
    |
    |--41.07%-- lock_acquire
    | |
    | |--39.36%-- _raw_spin_lock
    | | |
    | | |--7.81%-- hrtimer_interrupt
    | | | smp_apic_timer_interrupt
    | | | apic_timer_interrupt

    The old case was producing unreliable callchains. Now having
    right frame and instruction pointers, we have the trace we
    want.

    Also, syscall and kprobe events already have the right regs, so
    let's use those instead of wasting a retrieval.

    v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Archs

    Frederic Weisbecker
     
  • Events that trigger overflows by interrupting a context can
    use get_irq_regs() or task_pt_regs() to retrieve the state
    when the event triggered. But this is not the case for some
    other classes of events, like trace events, as tracepoints are
    executed in the same context as the code that triggered
    the event.

    It means we need a different API to capture the regs there;
    namely, we need a hot snapshot to get the most important
    information for perf: the instruction pointer to get the
    event origin, the frame pointer for the callchain, the code
    segment for user_mode() tests (we always use __KERNEL_CS as
    trace events always occur from the kernel) and the eflags
    for further purposes.
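
    A hedged sketch of what the snapshot must fill in on x86 (the real
    implementation captures the frame pointer at the call site; only the
    shape of the data is shown here):

    void perf_arch_fetch_caller_regs(struct pt_regs *regs,
                                     unsigned long ip, int skip)
    {
            memset(regs, 0, sizeof(*regs));
            regs->ip    = ip;                   /* event origin */
            regs->bp    = (unsigned long)
                          __builtin_frame_address(0); /* callchain start;
                                                 * 'skip' handling elided */
            regs->cs    = __KERNEL_CS;          /* user_mode() tests */
            regs->flags = 0;                    /* eflags, further purposes */
    }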

    v2: rename perf_save_regs to perf_fetch_caller_regs as per
    Masami's suggestion.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Archs

    Frederic Weisbecker
     
  • There are rcu-locked read-side areas in the path where we submit
    a trace event, and the rcu_read_(un)lock() calls there trigger lock
    events, which create recursive events.

    One pair in do_perf_sw_event:

    __lock_acquire
    |
    |--96.11%-- lock_acquire
    | |
    | |--27.21%-- do_perf_sw_event
    | | perf_tp_event
    | | |
    | | |--49.62%-- ftrace_profile_lock_release
    | | | lock_release
    | | | |
    | | | |--33.85%-- _raw_spin_unlock

    Another pair in perf_output_begin/end:

    __lock_acquire
    |--23.40%-- perf_output_begin
    | | __perf_event_overflow
    | | perf_swevent_overflow
    | | perf_swevent_add
    | | perf_swevent_ctx_event
    | | do_perf_sw_event
    | | perf_tp_event
    | | |
    | | |--55.37%-- ftrace_profile_lock_acquire
    | | | lock_acquire
    | | | |
    | | | |--37.31%-- _raw_spin_lock

    The problem is not so much the trace recursion itself, as we already
    have recursion protection (though it's always wasteful to recurse).
    But the trace events are outside the lockdep recursion protection, so
    each lockdep event triggers a lock trace, which will in turn trigger
    two other lockdep events. Here the recursive lock trace event won't
    be taken because of the trace recursion, so the recursion stops there,
    but lockdep will still analyse these new events.

    To sum up, for each lockdep event we have:

    lock_*()
    |
    trace lock_acquire
    |
    ----- rcu_read_lock()
    | |
    | lock_acquire()
    | |
    | trace_lock_acquire() (stopped)
    | |
    | lockdep analyze
    |
    ----- rcu_read_unlock()
    |
    lock_release
    |
    trace_lock_release() (stopped)
    |
    lockdep analyze

    And you can repeat the above twice, as we have two rcu read-side
    sections when we submit an event.

    This is fixed in this patch by moving the lock trace event under
    the lockdep recursion protection.
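
    A sketch of the fix's shape (the real lock_acquire() takes more
    arguments; only the guard placement matters here):

    void lock_acquire(struct lockdep_map *lock, unsigned long ip)
    {
            unsigned long flags;

            raw_local_irq_save(flags);
            current->lockdep_recursion = 1;
            trace_lock_acquire(lock, ip);   /* tracepoint now fires inside
                                             * the guard, so its
                                             * rcu_read_lock() no longer
                                             * feeds back into lockdep */
            __lock_acquire(lock, ip);
            current->lockdep_recursion = 0;
            raw_local_irq_restore(flags);
    }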

    Signed-off-by: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Hitoshi Mitake
    Cc: Li Zefan
    Cc: Lai Jiangshan
    Cc: Masami Hiramatsu
    Cc: Jens Axboe

    Frederic Weisbecker
     
  • Try to avoid useless rotation and PMU disables.

    [ Could be improved by keeping a nr_runnable count to better account
    for the < PERF_STAT_INACTIVE counters ]

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently we always call hw_perf_disable(), even if it's already
    disabled, which seems superfluous, especially since it cannot be made
    NMI safe (see further patches).
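
    A sketch of the refcounted guard this leads to (close to the
    description; treat details as assumptions): the hardware is only
    touched on the 0->1 and 1->0 transitions.

    static DEFINE_PER_CPU(int, perf_disable_count);

    void perf_disable(void)
    {
            if (!__get_cpu_var(perf_disable_count)++)
                    hw_perf_disable();
    }

    void perf_enable(void)
    {
            if (!--__get_cpu_var(perf_disable_count))
                    hw_perf_enable();
    }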

    Signed-off-by: Peter Zijlstra
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Remove the hw_perf_event_*() hotplug hooks in favour of per PMU hotplug
    notifiers. This has the advantage of reducing the static weak interface
    as well as exposing all hotplug actions to the PMU.

    Use this to fix x86 hotplug usage where we did things in ONLINE which
    should have been done in UP_PREPARE or STARTING.
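
    A hedged sketch of such a per-PMU notifier (callback bodies elided,
    names illustrative):

    static int __cpuinit
    x86_pmu_notifier(struct notifier_block *self, unsigned long action,
                     void *hcpu)
    {
            unsigned int cpu = (long)hcpu;

            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_UP_PREPARE:
                    /* allocate per-cpu state: may sleep, may fail */
                    break;
            case CPU_STARTING:
                    /* program the hardware, running on the new cpu itself */
                    break;
            case CPU_DEAD:
                    /* free per-cpu state */
                    break;
            }
            return NOTIFY_OK;
    }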

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mundt
    Cc: paulus@samba.org
    Cc: eranian@google.com
    Cc: robert.richter@amd.com
    Cc: fweisbec@gmail.com
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This makes it easier to extend perf_sample_data and fixes a bug on arm
    and sparc, which failed to set ->raw to NULL, which can cause crashes
    when combined with PERF_SAMPLE_RAW.

    It also optimizes PowerPC and tracepoints, because initializing the
    struct with a designated initializer forces the compiler to zero out
    the whole structure.
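
    A sketch of the helper this suggests (the changelog only requires
    that ->raw be cleared; the exact fields are an assumption):

    static inline void perf_sample_data_init(struct perf_sample_data *data,
                                             u64 addr)
    {
            data->addr = addr;
            data->raw  = NULL;   /* the field arm and sparc forgot to set */
    }

    /* usage (inside an event handler), replacing a designated
     * initializer that zeroed the whole struct:
     *
     *      struct perf_sample_data data;
     *      perf_sample_data_init(&data, 0);
     */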

    Signed-off-by: Peter Zijlstra
    Acked-by: Jean Pihet
    Reviewed-by: Frederic Weisbecker
    Acked-by: David S. Miller
    Cc: Jamie Iles
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Conflicts:
    tools/perf/util/probe-event.c

    Merge reason: Pick up -rc1 and resolve the conflict as well.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

08 Mar, 2010

5 commits

  • A little more whack-a-mole annotating the dynamic sysfs attributes. I
    had everything built into my earlier test kernel, and so I missed
    these.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • These are the non-static sysfs attributes that exist on
    my test machine. Fix them to use sysfs_attr_init or
    sysfs_bin_attr_init as appropriate. Simply making a sysfs
    attribute present is enough to see this, so the work is a
    little tedious but otherwise not too bad.
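
    A sketch of the pattern being applied (device and names
    hypothetical): every dynamically allocated attribute gets its
    lockdep class key set before registration.

    static ssize_t foo_show(struct device *dev,
                            struct device_attribute *attr, char *buf)
    {
            return sprintf(buf, "42\n");
    }

    static int add_foo_attr(struct device *dev)
    {
            struct device_attribute *attr;

            attr = kzalloc(sizeof(*attr), GFP_KERNEL);
            if (!attr)
                    return -ENOMEM;

            sysfs_attr_init(&attr->attr);   /* the annotation this series
                                             * adds for dynamic attributes */
            attr->attr.name = "foo";
            attr->attr.mode = 0444;
            attr->show = foo_show;

            return device_create_file(dev, attr);
    }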

    Signed-off-by: Eric W. Biederman
    Acked-by: WANG Cong
    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing
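
    A sketch of the constified form (names hypothetical):

    static ssize_t foo_attr_show(struct kobject *kobj,
                                 struct attribute *attr, char *buf);
    static ssize_t foo_attr_store(struct kobject *kobj,
                                  struct attribute *attr,
                                  const char *buf, size_t count);

    static const struct sysfs_ops foo_sysfs_ops = {   /* now const */
            .show  = foo_attr_show,
            .store = foo_attr_store,
    };

    static struct kobj_type foo_ktype = {
            .sysfs_ops = &foo_sysfs_ops,   /* users take a const pointer */
    };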

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • Constify struct kset_uevent_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • Passing the attribute to the low level IO functions allows all kinds
    of cleanups, by sharing low level IO code without requiring
    a separate function for every piece of data.

    Drivers can also extend the attributes with their own data fields
    and use them in the low level functions.

    This is similar to sysdev_attributes and normal attributes.

    This is a tree-wide sweep, converting everything in one go.

    No functional changes in this patch other than passing the new
    argument everywhere.

    Tested on x86, the non x86 parts are uncompiled.
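
    A hedged sketch of the pattern this enables, taking a class attribute
    as one example (the wrapper struct and names are hypothetical):

    struct foo_attribute {
            struct class_attribute attr;
            int index;                      /* driver-private field */
    };

    static ssize_t foo_show(struct class *class,
                            struct class_attribute *attr, char *buf)
    {
            struct foo_attribute *fa =
                    container_of(attr, struct foo_attribute, attr);

            /* one shared show routine serves many attributes via fa->index */
            return sprintf(buf, "%d\n", fa->index);
    }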

    Signed-off-by: Andi Kleen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     

07 Mar, 2010

16 commits

  • The current ELF dumper implementation can produce broken corefiles if
    program headers exceed 65535. This number is determined by the number of
    vmas the process has. In particular, some extreme programs may use
    more than 65535 vmas. (If you google max_map_count, you can find some
    users facing this problem.) Such programs can never generate
    correct coredumps.

    This patch implements "extended numbering", which uses the sh_info field
    of the first section header instead of the e_phnum field in order to
    represent up to 4294967295 vmas.

    This is supported by
    AMD64-ABI(http://www.x86-64.org/documentation.html) and
    Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
    Of course, we are preparing patches for gdb and binutils.
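
    A sketch of the writer side (struct handling abbreviated): when the
    real count no longer fits in the 16-bit e_phnum, store the PN_XNUM
    escape value there and put the true count in section header 0's
    sh_info.

    static void fill_phnum(struct elfhdr *elf, struct elf_shdr *shdr0,
                           unsigned int segs)
    {
            if (segs >= PN_XNUM) {          /* PN_XNUM == 0xffff */
                    elf->e_phnum = PN_XNUM;
                    shdr0->sh_info = segs;  /* the real program header count */
            } else {
                    elf->e_phnum = segs;
            }
    }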

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the
    corresponding macros to hide multi-line logic in functions. This patch
    removes the #ifdefs and replaces ELF_CORE_EXTRA_* with corresponding
    functions. For architectures not implementing ELF_CORE_EXTRA_*, we use
    weak functions in order to reduce the range of modification.

    This cleanup is for my next patches, but I think this cleanup itself is
    worth doing regardless of my final purpose.
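
    A sketch of the weak-default pattern (return type assumed):

    /* generic default: arches without extra core-dump phdrs inherit this */
    int __weak elf_core_extra_phdrs(void)
    {
            return 0;
    }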

    Signed-off-by: Daisuke HATAYAMA
    Cc: "Luck, Tony"
    Cc: Jeff Dike
    Cc: David Howells
    Cc: Greg Ungerer
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Alexander Viro
    Cc: Andi Kleen
    Cc: Alan Cox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke HATAYAMA
     
  • kernel/printk.c:72: warning: `saved_console_loglevel' defined but not used

    Signed-off-by: Gustavo F. Padovan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gustavo F. Padovan
     
  • tasklist_lock does protect the task and its pid, so it can't go away.
    The problem is that find_pid_ns() itself is unsafe without the rcu
    lock: it can race with copy_process()->free_pid(any_pid).

    Protecting copy_process()->free_pid(any_pid) with tasklist_lock would make
    it possible to call find_task_by_pid_ns() under tasklist safely, but we
    don't do so because we are trying to get rid of the read_lock sites of
    tasklist_lock.
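
    The safe lookup shape, for reference (wrapper name hypothetical):

    static struct task_struct *task_by_pid_ns(pid_t nr,
                                              struct pid_namespace *ns)
    {
            struct task_struct *task;

            rcu_read_lock();
            task = find_task_by_pid_ns(nr, ns);
            if (task)
                    get_task_struct(task);  /* pin it before dropping rcu */
            rcu_read_unlock();

            return task;
    }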

    Signed-off-by: Tetsuo Handa
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • I've had some complaints about panic_timeout being wildly inaccurate on
    shared processor PowerPC partitions (a 3 minute panic_timeout taking 30
    minutes).

    The problem is that we loop on mdelay(1), and with a 1ms-in-10ms
    hypervisor timeslice each of these will take 10ms (i.e. 10x) longer. I
    expect other platforms with shared processor hypervisors will see the
    same issue.

    This patch keeps the old behaviour if we have a panic_blink (only keyboard
    LEDs right now) and does 1 second mdelays if we don't.
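
    A sketch of the resulting loop shape (the panic_blink interface is
    simplified here):

    static void panic_countdown(void)  /* stand-in for the loop in panic() */
    {
            int i;

            for (i = 0; i < panic_timeout; i++) {
                    if (panic_blink) {
                            int j;

                            for (j = 0; j < 1000; j++) {  /* keep LEDs alive */
                                    touch_nmi_watchdog();
                                    panic_blink(j);
                                    mdelay(1);
                            }
                    } else {
                            touch_nmi_watchdog();
                            mdelay(1000);  /* one long delay per second: a
                                            * stretched timeslice hurts once,
                                            * not a thousand times */
                    }
            }
    }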

    Signed-off-by: Anton Blanchard
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • Make sure the compiler won't do weird things with limits. E.g. fetching them
    twice may return 2 different values after writable limits are implemented.

    I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
    add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.
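
    The two patterns, side by side (task and resource chosen arbitrarily):

    static unsigned long fetch_nofile_cur(struct task_struct *task)
    {
            /* helper form, for the current task: */
            if (task == current)
                    return rlimit(RLIMIT_NOFILE);

            /* ACCESS_ONCE form, guaranteeing a single read: */
            return ACCESS_ONCE(task->signal->rlim[RLIMIT_NOFILE].rlim_cur);
    }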

    Signed-off-by: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • Fetch rlimit (both hard and soft) values only once and work on them. It
    removes many accesses through sig structure and makes the code cleaner.

    Mostly a preparation for writable resource limits support.

    Signed-off-by: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: john stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • kernel/exit.c:1183:26: warning: symbol 'status' shadows an earlier one
    kernel/exit.c:1173:21: originally declared here

    Signed-off-by: Thiago Farina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thiago Farina
     
  • Fix the following 'make includecheck' warning:
    kernel/params.c: linux/string.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Cc: André Goddard Rosa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     
  • "ret" needs to be signed or the error handling for splice_to_pipe() won't
    work correctly.
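
    A sketch of why the type matters:

    static ssize_t foo_splice(struct pipe_inode_info *pipe,
                              struct splice_pipe_desc *spd)
    {
            ssize_t ret = splice_to_pipe(pipe, spd);  /* may be a negative
                                                       * errno */

            if (ret < 0)    /* with an unsigned ret this test can never be
                             * true: the errno wraps to a huge byte count */
                    pr_debug("splice failed: %zd\n", ret);

            return ret;
    }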

    Signed-off-by: Dan Carpenter
    Cc: Tom Zanussi
    Cc: Jens Axboe
    Cc: Lai Jiangshan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • additional_cpus is only supported on IA64 now; X86_64 should not be
    included.

    Signed-off-by: Chen Gong
    Cc: Ingo Molnar
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gong
     
  • There are quite a few GFP_KERNEL memory allocations made during
    suspend/hibernation and resume that may cause the system to hang, because
    the I/O operations they depend on cannot be completed due to the
    underlying devices being suspended.

    Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in
    gfp_allowed_mask before suspend/hibernation and restoring the original
    values of these bits in gfp_allowed_mask during the subsequent resume.
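
    A sketch of the save/mask/restore pattern (helper names hypothetical):

    static gfp_t saved_gfp_mask;

    void pm_restrict_gfp_mask(void)         /* before suspend/hibernation */
    {
            saved_gfp_mask = gfp_allowed_mask;
            gfp_allowed_mask &= ~(__GFP_IO | __GFP_FS);
    }

    void pm_restore_gfp_mask(void)          /* during the subsequent resume */
    {
            gfp_allowed_mask = saved_gfp_mask;
    }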

    [akpm@linux-foundation.org: fix CONFIG_PM=n linkage]
    Signed-off-by: Rafael J. Wysocki
    Reported-by: Maxim Levitsky
    Cc: Sebastian Ott
    Cc: Benjamin Herrenschmidt
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • The old anon_vma code can lead to scalability issues with heavily forking
    workloads. Specifically, each anon_vma will be shared between the parent
    process and all its child processes.

    In a workload with 1000 child processes and a VMA with 1000 anonymous
    pages per process that get COWed, this leads to a system with a million
    anonymous pages in the same anon_vma, each of which is mapped in just one
    of the 1000 processes. However, the current rmap code needs to walk them
    all, leading to O(N) scanning complexity for each page.

    This can result in systems where one CPU is walking the page tables of
    1000 processes in page_referenced_one, while all other CPUs are stuck on
    the anon_vma lock. This leads to catastrophic failure for a benchmark
    like AIM7, where the total number of processes can reach into the tens
    of thousands. Real workloads are still a factor of 10 less process
    intensive than AIM7, but they are catching up.

    This patch changes the way anon_vmas and VMAs are linked, which allows us
    to associate multiple anon_vmas with a VMA. At fork time, each child
    process gets its own anon_vmas, in which its COWed pages will be
    instantiated. The parents' anon_vma is also linked to the VMA, because
    non-COWed pages could be present in any of the children.

    This reduces rmap scanning complexity to O(1) for the pages of the 1000
    child processes, with O(N) complexity for at most 1/N pages in the system.
    This reduces the average scanning cost in heavily forking workloads from
    O(N) to 2.

    The only real complexity in this patch stems from the fact that linking a
    VMA to anon_vmas now involves memory allocations. This means vma_adjust
    can fail, if it needs to attach a VMA to anon_vma structures. This in
    turn means error handling needs to be added to the calling functions.

    A second source of complexity is that, because there can be multiple
    anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
    "the" anon_vma lock. To prevent the rmap code from walking up an
    incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
    flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
    to make sure it is impossible to compile a kernel that needs both symbolic
    values for the same bitflag.
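
    A sketch of the linkage object described above (field names as best
    reconstructed from the description):

    struct anon_vma_chain {
            struct vm_area_struct *vma;
            struct anon_vma *anon_vma;
            struct list_head same_vma;      /* chains hanging off one VMA */
            struct list_head same_anon_vma; /* chains hanging off one
                                             * anon_vma */
    };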

    Some test results:

    Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
    box with 16GB RAM and not quite enough IO), the system ends up running
    >99% in system time, with every CPU on the same anon_vma lock in the
    pageout code.

    With these changes, AIM7 hits the cross-over point around 29.7k users.
    This happens with ~99% IO wait time, there never seems to be any spike in
    system time. The anon_vma lock contention appears to be resolved.

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Cc: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Considering the nature of per-mm stats, the counter is an object
    shared among threads and can be a cache-miss point in the page
    fault path.

    This patch adds a per-thread cache for mm_counter. RSS values will be
    counted into a struct in task_struct and synchronized with the mm's
    counters at certain events.

    In this patch, the event is the number of calls to handle_mm_fault:
    the per-thread values are added to the mm every 64 calls.
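
    A sketch of the sync scheme (names illustrative; the 64-fault
    threshold is from the description above):

    #define TASK_RSS_EVENTS_THRESH  64

    static void check_sync_rss_stat(struct task_struct *task)
    {
            if (unlikely(task != current))
                    return;
            if (unlikely(task->rss_stat.events++ > TASK_RSS_EVENTS_THRESH))
                    sync_mm_rss(task, task->mm);   /* fold the thread-local
                                                    * deltas into the mm */
    }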

    A rough estimate with a small benchmark on parallel threads (2 threads)
    shows:
    [before]
    4.5 cache-miss/faults
    [after]
    4.0 cache-miss/faults
    In any case, the most contended object becomes mmap_sem as the number
    of threads grows.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, the per-mm statistics counters are defined by macros in
    sched.h.

    This patch modifies them to be
    - defined in mm.h as inline functions
    - backed by an array instead of macro-generated names.

    This patch is for reducing the size of a future patch that modifies
    the implementation of the per-mm counters.
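
    A sketch of the array form replacing macro-generated names (member
    list and field layout abbreviated, treat as assumptions):

    enum {
            MM_FILEPAGES,
            MM_ANONPAGES,
            NR_MM_COUNTERS
    };

    static inline unsigned long get_mm_counter(struct mm_struct *mm,
                                               int member)
    {
            return (unsigned long)
                    atomic_long_read(&mm->rss_stat.count[member]);
    }

    static inline void inc_mm_counter(struct mm_struct *mm, int member)
    {
            atomic_long_inc(&mm->rss_stat.count[member]);
    }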

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Rename for_each_bit to for_each_set_bit in the kernel source tree, to
    permit for_each_clear_bit() should that ever be added.

    The patch includes a macro to map the old for_each_bit() onto the new
    for_each_set_bit(). This is a (very) temporary thing to ease the migration.
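
    A sketch of the rename plus the temporary shim (mask walking shown
    for flavour):

    /* (very) temporary compatibility macro: */
    #define for_each_bit(bit, addr, size) \
            for_each_set_bit((bit), (addr), (size))

    static void dump_set_bits(const unsigned long *mask)
    {
            unsigned int bit;

            for_each_set_bit(bit, mask, BITS_PER_LONG)
                    printk(KERN_DEBUG "bit %u is set\n", bit);
    }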

    [akpm@linux-foundation.org: add temporary for_each_bit()]
    Suggested-by: Alexey Dobriyan
    Suggested-by: Andrew Morton
    Signed-off-by: Akinobu Mita
    Cc: "David S. Miller"
    Cc: Russell King
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

06 Mar, 2010

2 commits

  • …nel/git/tip/linux-2.6-tip

    * 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Issue at least one memory barrier in stop_machine_text_poke()
    perf probe: Correct probe syntax on command line help
    perf probe: Add lazy line matching support
    perf probe: Show more lines after last line
    perf probe: Check function address range strictly in line finder
    perf probe: Use libdw callback routines
    perf probe: Use elfutils-libdw for analyzing debuginfo
    perf probe: Rename probe finder functions
    perf probe: Fix bugs in line range finder
    perf probe: Update perf probe document
    perf probe: Do not show --line option without dwarf support
    kprobes: Add documents of jump optimization
    kprobes/x86: Support kprobes jump optimization on x86
    x86: Add text_poke_smp for SMP cross modifying code
    kprobes/x86: Cleanup save/restore registers
    kprobes/x86: Boost probes when reentering
    kprobes: Jump optimization sysctl interface
    kprobes: Introduce kprobes jump optimization
    kprobes: Introduce generic insn_slot framework
    kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    padata: Allocate the cpumask for the padata instance
    crypto: authenc - Move saved IV in front of the ablkcipher request
    crypto: hash - Fix handling of unaligned buffers
    crypto: authenc - Use correct ahash complete functions
    crypto: md5 - Set statesize

    Linus Torvalds
     

05 Mar, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    init: Open /dev/console from rootfs
    mqueue: fix typo "failues" -> "failures"
    mqueue: only set error codes if they are really necessary
    mqueue: simplify do_open() error handling
    mqueue: apply mathematics distributivity on mq_bytes calculation
    mqueue: remove unneeded info->messages initialization
    mqueue: fix mq_open() file descriptor leak on user-space processes
    fix race in d_splice_alias()
    set S_DEAD on unlink() and non-directory rename() victims
    vfs: add NOFOLLOW flag to umount(2)
    get rid of ->mnt_parent in tomoyo/realpath
    hppfs can use existing proc_mnt, no need for do_kern_mount() in there
    Mirror MS_KERNMOUNT in ->mnt_flags
    get rid of useless vfsmount_lock use in put_mnt_ns()
    Take vfsmount_lock to fs/internal.h
    get rid of insanity with namespace roots in tomoyo
    take check for new events in namespace (guts of mounts_poll()) to namespace.c
    Don't mess with generic_permission() under ->d_lock in hpfs
    sanitize const/signedness for udf
    nilfs: sanitize const/signedness in dealing with ->d_name.name
    ...

    Fix up fairly trivial (famous last words...) conflicts in
    drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c

    Linus Torvalds
     

04 Mar, 2010

1 commit