26 May, 2011

1 commit

  • Due to commit dc326fca2b64 (x86, cpu: Clean up and unify the NOP selection infrastructure), we get the following warning:

    arch/x86/kernel/ftrace.c: In function ‘ftrace_make_nop’:
    arch/x86/kernel/ftrace.c:308:6: warning: assignment discards qualifiers from pointer target type
    arch/x86/kernel/ftrace.c: In function ‘ftrace_make_call’:
    arch/x86/kernel/ftrace.c:318:6: warning: assignment discards qualifiers from pointer target type

    ftrace_nop_replace() now returns const unsigned char *, so change the
    associated functions and variables to a compatible const-qualified type
    to keep the compiler calm.
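
    In essence, a hedged sketch (variable names are illustrative, not the
    exact ftrace.c hunk):

        unsigned char *new;
        new = ftrace_nop_replace();     /* warning: discards qualifiers */

        const unsigned char *nop;
        nop = ftrace_nop_replace();     /* clean: const carried through */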

    Signed-off-by: Rakib Mullick
    Link: http://lkml.kernel.org/r/1305221620.7986.4.camel@localhost.localdomain

    [ updated for change of const void *src in probe_kernel_write() ]

    Signed-off-by: Steven Rostedt

    Rakib Mullick
     

19 Apr, 2011

1 commit

  • Clean up and unify the NOP selection infrastructure:

    - Make the atomic 5-byte NOP a part of the selection system.
    - Pick NOPs once during early boot and then be done with it.
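
    As a hedged sketch of the pick-once idea (the byte sequences are the
    standard x86 5-byte NOPs; the helper and pointer names below are
    illustrative, while the real code fills the ideal_nops table in
    arch/x86/kernel/alternative.c):

        static const unsigned char p6_nop5[]      = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
        static const unsigned char generic_nop5[] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };

        const unsigned char *ideal_nop5;  /* read by ftrace, jump labels, ... */

        static void __init pick_nops(void)
        {
                ideal_nop5 = boot_cpu_has(X86_FEATURE_NOPL) ? p6_nop5
                                                            : generic_nop5;
        }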

    Signed-off-by: H. Peter Anvin
    Cc: Tejun Heo
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Jason Baron
    Link: http://lkml.kernel.org/r/1303166160-10315-3-git-send-email-hpa@linux.intel.com

    H. Peter Anvin
     

10 Mar, 2011

1 commit

  • Currently the index to the ret_stack is updated and the real return address
    is saved in the ret_stack. Then we call the trace function. The trace
    function could decide that it doesn't want to trace this function
    (ex. set_graph_function does not match) and it will return 0 which means
    not to trace this call.

    The normal function graph tracer has this code:

        if (!(trace->depth || ftrace_graph_addr(trace->func)) ||
            ftrace_graph_ignore_irqs())
                return 0;

    What this states is: if the trace depth (which is curr_ret_stack) is
    zero (we are at the top of the nested functions), then test whether we
    want to trace this function. If it is not to be traced, return 0 and
    the rest of the function graph tracer logic will skip it.

    The problem arises when an interrupt comes in after we updated the
    curr_ret_stack. The next function that gets called will have a
    trace->depth of 1, which fools this trace code into thinking that we
    are in a nested function and should be traced. This causes interrupts
    to be traced when they should not be.

    The solution is to trace the function first and then update the ret_stack.
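
    A condensed sketch of the reorder (illustrative shape, not the exact
    patch):

        static void graph_entry_sketch(struct ftrace_graph_ent *trace)
        {
                /* Before (racy): the depth was bumped before the tracer
                 * could decline, so an interrupt arriving in the window
                 * saw a trace->depth of 1 and was traced as nested. */

                /* After: consult the tracer first, then touch the
                 * ret_stack. */
                if (!ftrace_graph_entry(trace))
                        return;         /* not traced, nothing claimed */
                trace->depth = ++current->curr_ret_stack;
        }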

    Reported-by: zhiping zhong
    Reported-by: wu zhangjin
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c,
    process.c} as per Tejun.

    Linus Torvalds
     

30 Dec, 2010

1 commit

  • Go through x86 code and replace __get_cpu_var and get_cpu_var
    instances that refer to a scalar and are not used for address
    determinations.
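
    A small sketch of the conversion pattern (demo_counter is a
    hypothetical per-cpu variable):

        DEFINE_PER_CPU(int, demo_counter);

        /* Before: explicit per-CPU address calculation. */
        int v = __get_cpu_var(demo_counter);

        /* After: one segment-prefixed access, no address math, and safe
         * without a surrounding preempt_disable()/preempt_enable(). */
        int w = this_cpu_read(demo_counter);
        this_cpu_inc(demo_counter);     /* read-modify-write variant */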

    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Acked-by: Tejun Heo
    Acked-by: "H. Peter Anvin"
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Tejun Heo
     

18 Nov, 2010

1 commit

  • This patch is a logical extension of the protection provided by
    CONFIG_DEBUG_RODATA to LKMs. The protection is provided by
    splitting module_core and module_init into three logical parts
    each and setting appropriate page access permissions for each
    individual section:

    1. Code: RO+X
    2. RO data: RO+NX
    3. RW data: RW+NX

    In order to achieve proper protection, layout_sections() has been
    modified to align each of the three parts mentioned above onto page
    boundaries. Next, the corresponding page access permissions are set
    right before successful exit from load_module(). Further, free_module()
    and sys_init_module() have been modified to set module_core and
    module_init back to RW+NX right before calling module_free().

    By default, the original section layout and access flags are
    preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y,
    the patch will page-align each group of sections to ensure that
    each page contains only one type of content and will enforce
    RO/NX for each group of pages.
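
    A hedged sketch of the permission flipping, assuming the three groups
    were already page-aligned by layout_sections() (set_memory_ro/nx are
    the x86 pageattr helpers; the wrapper name is hypothetical):

        static void protect_module_pages(unsigned long base,
                                         unsigned long text_size,  /* code          */
                                         unsigned long ro_size,    /* code + rodata */
                                         unsigned long total_size) /* everything    */
        {
                /* 1. Code: RO+X (set_memory_ro leaves it executable) */
                set_memory_ro(base, text_size >> PAGE_SHIFT);
                /* 2+3. Everything after the code becomes non-executable */
                set_memory_nx(base + text_size,
                              (total_size - text_size) >> PAGE_SHIFT);
                /* 2. RO data additionally stays read-only */
                set_memory_ro(base + text_size,
                              (ro_size - text_size) >> PAGE_SHIFT);
        }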

    -v1: Initial proof-of-concept patch.
    -v2: The patch has been re-written to reduce the number of #ifdefs
    and to make it architecture-agnostic. Code formatting has also
    been corrected.
    -v3: Opportunistic RO/NX protection is now unconditional. Section
    page-alignment is enabled when CONFIG_DEBUG_RODATA=y.
    -v4: Removed most macros and improved coding style.
    -v5: Changed page-alignment and RO/NX section size calculation
    -v6: Fixed comments. Restricted RO/NX enforcement to x86 only
    -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added
    calls to set_all_modules_text_rw() and set_all_modules_text_ro()
    in ftrace
    -v8: updated for compatibility with linux 2.6.33-rc5
    -v9: coding style fixes
    -v10: more coding style fixes
    -v11: minor adjustments for -tip
    -v12: minor adjustments for v2.6.35-rc2-tip
    -v13: minor adjustments for v2.6.37-rc1-tip

    Signed-off-by: Siarhei Liakh
    Signed-off-by: Xuxian Jiang
    Acked-by: Arjan van de Ven
    Reviewed-by: James Morris
    Signed-off-by: H. Peter Anvin
    Cc: Andi Kleen
    Cc: Rusty Russell
    Cc: Stephen Rothwell
    Cc: Dave Jones
    Cc: Kees Cook
    Cc: Linus Torvalds
    LKML-Reference:
    [ minor cleanliness edits, -v14: build failure fix ]
    Signed-off-by: Ingo Molnar

    matthieu castet
     

25 Feb, 2010

1 commit

  • The code in stop_machine that modifies the kernel text has a bit of
    logic to handle the case of NMIs. stop_machine does not prevent NMIs
    from executing, and if an NMI were to trigger on another CPU while the
    modifying CPU is changing the very text that NMI is executing, a GPF
    could result.

    To prevent the GPF, the NMI calls ftrace_nmi_enter() which may
    modify the code first, then any other NMIs will just change the
    text to the same content which will do no harm. The code that
    stop_machine called must wait for NMIs to finish while it changes
    each location in the kernel. That code may also change the text
    to what the NMI changed it to. The key is that the text will never
    change content while another CPU is executing it.

    To make the above work, the call to ftrace_nmi_enter() must also do a
    smp_mb() as well as an atomic_inc(). But for applications like perf
    that require a high number of NMIs for profiling, this can have a
    dramatic effect on the system. Not only does it do a full memory
    barrier on both nmi_enter() and nmi_exit(), it also modifies a global
    variable with an atomic operation. This kills performance on large SMP
    machines.

    Since the memory barriers are only needed when ftrace is in the
    process of modifying the text (which is seldom), this patch
    adds a "modifying_code" variable that gets set before stop machine
    is executed and cleared afterwards.

    The NMIs will check this variable and store it in a per CPU
    "save_modifying_code" variable that it will use to check if it
    needs to do the memory barriers and atomic dec on NMI exit.
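
    Condensed from the description above (a sketch, not the exact patch):

        static int modifying_code;      /* set around stop_machine */
        static DEFINE_PER_CPU(int, save_modifying_code);

        void ftrace_nmi_enter(void)
        {
                __get_cpu_var(save_modifying_code) = modifying_code;
                if (!__get_cpu_var(save_modifying_code))
                        return;         /* fast path: no barrier, no atomic */
                atomic_inc(&nmi_running);
                smp_mb();
        }

        void ftrace_nmi_exit(void)
        {
                if (!__get_cpu_var(save_modifying_code))
                        return;
                smp_mb();
                atomic_dec(&nmi_running);
        }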

    Acked-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

17 Feb, 2010

1 commit

  • Most implementations of arch_syscall_addr() are the same, so create a
    default version in common code and move the one piece that differs (the
    syscall table) to asm/syscall.h. New arch ports don't have to waste
    time copying & pasting this simple function.

    The s390/sparc versions need to be different, so document why.
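
    The common default is essentially a one-liner; an arch only has to
    expose its sys_call_table through asm/syscall.h (sketched from the
    description above):

        unsigned long __init arch_syscall_addr(int nr)
        {
                return (unsigned long)sys_call_table[nr];
        }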

    Signed-off-by: Mike Frysinger
    Acked-by: David S. Miller
    Acked-by: Paul Mundt
    Acked-by: Heiko Carstens
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Mike Frysinger
     

09 Dec, 2009

1 commit

  • * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
    x86, mm: Correct the implementation of is_untracked_pat_range()
    x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
    x86, mtrr: Fix sorting of mtrr after subtracting
    x86: Move find_smp_config() earlier and avoid bootmem usage
    x86, platform: Change is_untracked_pat_range() to bool; cleanup init
    x86: Change is_ISA_range() into an inline function
    x86, mm: is_untracked_pat_range() takes a normal semiclosed range
    x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
    x86: UV SGI: Don't track GRU space in PAT
    x86: SGI UV: Fix BAU initialization
    x86, numa: Use near(er) online node instead of roundrobin for NUMA
    x86, numa, bootmem: Only free bootmem on NUMA failure path
    x86: Change crash kernel to reserve via reserve_early()
    x86: Eliminate redundant/contradicting cache line size config options
    x86: When cleaning MTRRs, do not fold WP into UC
    x86: remove "extern" from function prototypes in
    x86, mm: Report state of NX protections during boot
    x86, mm: Clean up and simplify NX enablement
    x86, pageattr: Make set_memory_(x|nx) aware of NX support
    x86, sleep: Always save the value of EFER
    ...

    Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range
    to 'struct x86_platform_ops') in
    arch/x86/include/asm/x86_init.h
    arch/x86/kernel/x86_init.c

    Linus Torvalds
     

14 Oct, 2009

1 commit

  • Most of the syscall metadata processing is done in arch code, but
    these operations are mostly generic across archs. Especially now that
    we have a common variable name that expresses the number of syscalls
    supported by an arch (NR_syscalls), the only remaining bit that needs
    to reside in arch code is the syscall-number-to-address translation.

    v2: Compare syscall symbols only after the "sys" prefix, so that we
    avoid spurious mismatches with archs that have syscall wrappers, in
    which case syscall symbols have "SyS"-prefixed aliases.
    (Reported by: Heiko Carstens)
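
    The v2 matching tweak boils down to comparing past the three-character
    prefix (a sketch; the helper name is illustrative):

        /* "sys_fork" vs "SyS_fork": both prefixes are three chars long */
        static int match_syscall_symbol(const char *sym, const char *name)
        {
                return !strcmp(sym + 3, name + 3);
        }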

    Signed-off-by: Frederic Weisbecker
    Acked-by: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Lai Jiangshan
    Cc: Martin Schwidefsky
    Cc: Paul Mundt

    Frederic Weisbecker
     

27 Aug, 2009

1 commit

  • Convert the syscall event tracing code to use NR_syscalls instead of
    FTRACE_SYSCALL_MAX. NR_syscalls is standard across most arches, and
    this reduces code confusion/complexity.

    Signed-off-by: Jason Baron
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Lai Jiangshan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Li Zefan
    Cc: Josh Stone
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Hendrik Brueckner
    Cc: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Jason Baron
     

12 Aug, 2009

3 commits

  • The current state of the syscall tracepoints generates only one event
    id for all syscall events.

    This patch associates an id with each syscall trace event, so that we
    can identify each syscall trace event using the 'perf' tool.

    Signed-off-by: Jason Baron
    Cc: Lai Jiangshan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Signed-off-by: Frederic Weisbecker

    Jason Baron
     
  • Call arch_init_ftrace_syscalls at boot, so we can determine early the
    set of syscalls for the syscall trace events.

    Signed-off-by: Jason Baron
    Cc: Lai Jiangshan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Signed-off-by: Frederic Weisbecker

    Jason Baron
     
  • Add a new function to support translating a syscall name to its number
    at runtime. This allows the syscall event tracer to map syscall names
    to numbers.
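
    A hedged sketch of the runtime lookup: a linear scan over the syscall
    metadata table built from the symbols (names are illustrative;
    FTRACE_SYSCALL_MAX was the syscall count used at the time):

        int syscall_name_to_nr_sketch(const char *name)
        {
                int i;

                for (i = 0; i < FTRACE_SYSCALL_MAX; i++) {
                        if (syscalls_metadata[i] &&
                            !strcmp(syscalls_metadata[i]->name, name))
                                return i;
                }
                return -1;
        }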

    Signed-off-by: Jason Baron
    Cc: Lai Jiangshan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Li Zefan
    Cc: Masami Hiramatsu
    Signed-off-by: Frederic Weisbecker

    Jason Baron
     

19 Jun, 2009

1 commit

  • In case gcc does something funny with the stack frames, or the return
    from function code, we would like to detect that.

    An arch may implement passing of a variable that is unique to the
    function and can be saved on entering a function and can be tested
    when exiting the function. Usually the frame pointer can be used for
    this purpose.

    This patch also implements this for x86, where it passes in the stack
    frame of the parent function and tests that frame on exit.
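
    The gist, as a sketch (field and variable names are illustrative):

        /* on entry: stash the parent frame pointer next to the return
         * address */
        current->ret_stack[index].fp = frame_pointer;

        /* on exit: the same frame must come back, or gcc rearranged the
         * frame behind our back */
        if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
                ftrace_graph_stop();
                WARN(1, "Bad frame pointer: expected %lx, received %lx\n",
                     current->ret_stack[index].fp, frame_pointer);
        }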

    There was a case in x86_32 with optimize for size (-Os) where, for a
    few functions, gcc would align the stack frame and place a copy of the
    return address into it. The function graph tracer modified the copy and
    not the actual return address. On return from the function, it did not
    go to the tracer hook, but returned to the parent. This broke the
    function graph tracer, because the return of the parent (where gcc did
    not do this funky manipulation) returned to the location that the child
    function was supposed to. This caused strange kernel crashes.

    This test detected the problem and pointed out where the issue was.

    This modifies the parameters of one of the functions that the arch
    specific code calls, so it includes changes to arch code to accommodate
    the new prototype.

    Note: I noticed that the parisc arch implements its own
    push_return_trace. This is now a generic function, and
    ftrace_push_return_trace() should be used instead. This patch does not
    touch that code.

    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Frederic Weisbecker
    Cc: Helge Deller
    Cc: Kyle McMartin
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 May, 2009

1 commit

  • After upgrading from gcc 4.2.2 to 4.4.0, the function graph tracer broke.
    Investigating, I found that in the asm that replaces the return value,
    gcc was using the same register for the old value as it was for the
    new value.

    mov (addr), old
    mov new, (addr)

    But if old and new are the same register, we clobber new with old!
    I first thought this was a bug in gcc 4.4.0 and reported it:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40132

    Andrew Pinski responded (quickly), saying that it was correct gcc behavior
    and the code needed to denote old as an "early clobber".

    Instead of "=r"(old), we need "=&r"(old).
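
    A self-contained sketch of the constraint (swap_word is a hypothetical
    helper, not the tracer's actual asm):

        /* Swap the word at *addr for new_val, returning the old value.
         * Without the '&', gcc may pick the same register for old and
         * new_val, and the first mov would clobber new_val. */
        static unsigned long swap_word(unsigned long *addr,
                                       unsigned long new_val)
        {
                unsigned long old;

                asm volatile("mov (%[addr]), %[old]\n\t"
                             "mov %[new], (%[addr])\n\t"
                             : [old] "=&r" (old)   /* '&' = early clobber */
                             : [new] "r" (new_val), [addr] "r" (addr)
                             : "memory");
                return old;
        }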

    [Impact: keep function graph tracer from breaking with gcc 4.4.0 ]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

09 Apr, 2009

1 commit

  • Impact: fix build warnings and possible compat misbehavior on IA64

    Building a kernel on ia64 might trigger these ugly build warnings:

    CC arch/ia64/ia32/sys_ia32.o
    In file included from arch/ia64/ia32/sys_ia32.c:55:
    arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined
    In file included from include/linux/elf.h:7,
    from include/linux/module.h:14,
    from include/linux/ftrace.h:8,
    from include/linux/syscalls.h:68,
    from arch/ia64/ia32/sys_ia32.c:18:
    arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition
    [...]

    sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h
    to import the syscalls tracing prototypes.

    But including ftrace.h can pull in too many things for a low-level
    file, especially on ia64, where the ia32 private headers conflict with
    higher level headers.

    Now we isolate the syscall tracing headers in their own lightweight file.
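
    Roughly, the new header carries just the tracing prototypes, so that
    linux/syscalls.h no longer drags in linux/ftrace.h (a sketch of the
    shape, not the verbatim file):

        #ifndef _TRACE_SYSCALL_H
        #define _TRACE_SYSCALL_H

        struct pt_regs;

        extern void start_ftrace_syscalls(void);
        extern void stop_ftrace_syscalls(void);
        extern void ftrace_syscall_enter(struct pt_regs *regs);
        extern void ftrace_syscall_exit(struct pt_regs *regs);

        #endif /* _TRACE_SYSCALL_H */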

    Reported-by: Tony Luck
    Tested-by: Tony Luck
    Signed-off-by: Frederic Weisbecker
    Acked-by: Tony Luck
    Signed-off-by: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Jason Baron
    Cc: "Frank Ch. Eigler"
    Cc: Mathieu Desnoyers
    Cc: KOSAKI Motohiro
    Cc: Lai Jiangshan
    Cc: Jiaying Zhang
    Cc: Michael Rubin
    Cc: Martin Bligh
    Cc: Michael Davidson
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

19 Mar, 2009

1 commit

  • When I reviewed the sensitive code in ftrace_nmi_enter(), I found that
    the atomic variable nmi_running does protect NMI vs. do_ftrace_mod_code(),
    but it cannot protect an NMI that has already entered against an NMI
    that is just entering ftrace_nmi_enter():

        cpu#1                  | cpu#2                  | cpu#3
        ftrace_nmi_enter()     | do_ftrace_mod_code()   |
          not modify           |                        |
        -----------------------|------------------------|---------------------
        executing              | set mod_code_write = 1 |
        executing          --> |                        | ftrace_nmi_enter()
        executing              |                        |   do modify
        executing          <-- |                        |
        -----------------------|------------------------|---------------------
        ftrace_nmi_exit()      |                        |

    cpu#3 may modify the code that is still being executed on cpu#1, which
    has undefined results and can take a GPF; this patch prevents that from
    occurring.
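
    The fix, condensed (flag encoding is illustrative): fold the writer's
    flag into the same atomic word, so an entering NMI atomically announces
    itself and learns whether a modification is in flight, and if so
    performs the (idempotent) write itself:

        #define MOD_CODE_WRITE_FLAG (1 << 31)
        static atomic_t nmi_running = ATOMIC_INIT(0);

        void ftrace_nmi_enter(void)
        {
                if (atomic_inc_return(&nmi_running) & MOD_CODE_WRITE_FLAG) {
                        smp_rmb();
                        ftrace_mod_code(); /* every NMI writes the same bytes */
                }
                /* order text execution after the check/modify */
                smp_mb();
        }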

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     

05 Mar, 2009

1 commit

  • Impact: decrease hang risks with the graph tracer on slow systems

    Since the function graph tracer can spend too much time on timer
    interrupts, it's better now to use the more lightweight local clock.
    The function graph traces are per-cpu anyway, so a per-cpu clock is
    reliable enough.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

21 Feb, 2009

2 commits

  • Impact: fix to prevent NMI lockup

    If the page fault handler produces a WARN_ON in the modifying of text,
    and the system is set up to have a high frequency of NMIs, we can lock
    up the system on a failure to modify code.

    The code modification path allows every NMI to perform the modification
    itself if the code it is about to run is being changed. This prevents a
    modifier on one CPU from modifying code running in NMI context on
    another CPU. The modifying is done through stop_machine, so only NMIs
    must be considered.

    But if the write causes the page fault handler to produce a warning,
    the print can slow it down enough that as soon as it is done
    it will take another NMI before going back to the process context.
    The new NMI will perform the write again causing another print and
    this will hang the box.

    This patch turns off the writing as soon as a failure is detected
    and does not wait for it to be turned off by the process context.
    This will keep NMIs from getting stuck in this back and forth
    of print outs.
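
    Sketched from the description above (variable names follow the
    changelog, not necessarily the exact patch):

        static void ftrace_mod_code(void)
        {
                mod_code_status = probe_kernel_write(mod_code_ip,
                                                     mod_code_newcode,
                                                     MCOUNT_INSN_SIZE);

                /* On failure, kill any new writers immediately so NMIs
                 * stop re-doing the write and re-triggering the print. */
                if (mod_code_status)
                        mod_code_write = 0;
        }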

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: keep kernel text read only

    Because dynamic ftrace converts the calls to mcount into and out of
    nops at run time, we needed to always keep the kernel text writable.

    But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
    the kernel code to writable before ftrace modifies the text, and converts
    it back to read only afterward.

    The kernel text is converted to read/write, stop_machine is called to
    modify the code, then the kernel text is converted back to read only.

    The original version used SYSTEM_STATE to determine when it was OK
    or not to change the code to rw or ro. Andrew Morton pointed out that
    using SYSTEM_STATE is a bad idea since there is no guarantee to what
    its state will actually be.

    Instead, I moved the check into the set_kernel_text_* functions
    themselves, and use a local variable to determine when it is
    OK to change the kernel text RW permissions.

    [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
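
    The resulting flow around stop_machine, sketched:

        /* called before stop_machine() runs the modification */
        int ftrace_arch_code_modify_prepare(void)
        {
                set_kernel_text_rw();
                return 0;
        }

        /* called after the modification is complete */
        int ftrace_arch_code_modify_post_process(void)
        {
                set_kernel_text_ro();
                return 0;
        }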

    Reviewed-by: Andrew Morton
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

09 Feb, 2009

1 commit

  • When the function graph tracer picks a return address, it ensures this
    address is really a kernel text address by calling
    __kernel_text_address().

    Actually, this path has never been taken. Its role was more likely to
    debug the tracer at the beginning of its development, but the check is
    wasteful since it is performed for every traced function.

    The fault check is already sufficient.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

08 Feb, 2009

2 commits

  • Impact: clean up

    Now that a generic in_nmi is available, this patch removes the
    special code in the ring_buffer and implements the in_nmi generic
    version instead.

    With this change, I was also able to rename the "arch_ftrace_nmi_enter"
    back to "ftrace_nmi_enter" and remove the code from the ring buffer.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The function graph tracer piggy-backed onto dynamic ftrace to use its
    in_nmi custom code for dynamic tracing. The problem was (as Andrew
    Morton pointed out) that it really only wanted to bail out if the
    context of the current CPU was in NMI context. But the dynamic ftrace
    in_nmi custom code was true if _any_ CPU happened to be in NMI context.

    Now that we have a generic in_nmi interface, this patch changes the
    function graph code to use it instead of the dynamic ftrace custom
    code.
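
    The check itself becomes a per-CPU one-liner (a sketch; in_nmi() reads
    the local preempt_count rather than a global atomic):

        if (unlikely(in_nmi()))
                return;         /* bail only if *this* CPU is in NMI context */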

    Reported-by: Andrew Morton
    Signed-off-by: Steven Rostedt

    Steven Rostedt