21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has outgrown its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in all, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names (in an ABI-compatible fashion).

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks go to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility are not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )
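
The order of the substitutions in the first sed pass matters: the old record-type prefix PERF_EVENT_ has to move out of the way to PERF_RECORD_ before PERF_COUNTER is turned into PERF_EVENT, otherwise the freshly renamed identifiers would be renamed a second time. A minimal Python sketch of that pass, assuming plain sequential string replacement is a fair stand-in for the chained `sed -e` expressions (the sample identifiers in the usage note are only illustrative inputs):

```python
# Rules from the first sed pass, in the order the commit message lists them.
RULES = [
    ("PERF_EVENT_", "PERF_RECORD_"),
    ("PERF_COUNTER", "PERF_EVENT"),
    ("perf_counter", "perf_event"),
    ("nb_counters", "nb_events"),
    ("swcounter", "swevent"),
    ("tpcounter_event", "tp_event"),
]


def rename(text, rules=RULES):
    """Apply every (old, new) rule in order, like chained sed -e 's/old/new/g'."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


def rename_wrong_order(text):
    """Same rules with the first two swapped, to show why the order matters."""
    swapped = [RULES[1], RULES[0]] + RULES[2:]
    return rename(text, swapped)
```

With the listed order, `rename("PERF_COUNTER_IOC_ENABLE")` gives `PERF_EVENT_IOC_ENABLE`. With the first two rules swapped, the same input ends up as `PERF_RECORD_IOC_ENABLE`, because the second rule also matches the output of the first.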

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

12 Sep, 2009

1 commit

  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
    ring-buffer: only enable ring_buffer_swap_cpu when needed
    ring-buffer: check for swapped buffers in start of committing
    tracing: report error in trace if we fail to swap latency buffer
    tracing: add trace_array_printk for internal tracers to use
    tracing: pass around ring buffer instead of tracer
    tracing: make tracing_reset safe for external use
    tracing: use timestamp to determine start of latency traces
    tracing: Remove mentioning of legacy latency_trace file from documentation
    tracing/filters: Defer pred allocation, fix memory leak
    tracing: remove users of tracing_reset
    tracing: disable buffers and synchronize_sched before resetting
    tracing: disable update max tracer while reading trace
    tracing: print out start and stop in latency traces
    ring-buffer: disable all cpu buffers when one finds a problem
    ring-buffer: do not count discarded events
    ring-buffer: remove ring_buffer_event_discard
    ring-buffer: fix ring_buffer_read crossing pages
    ring-buffer: remove unnecessary cpu_relax
    ring-buffer: do not swap buffers during a commit
    ring-buffer: do not reset while in a commit
    ...

    Linus Torvalds
     

11 Sep, 2009

2 commits


26 Aug, 2009

1 commit

  • s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g
    s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g

    The syscall enter/exit tracing is no longer specific to just ftrace, so
    they now have names that reflect their tie to tracepoints instead.

    Signed-off-by: Josh Stone
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Li Zefan
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Mathieu Desnoyers
    Cc: Jiaying Zhang
    Cc: Martin Bligh
    Cc: Lai Jiangshan
    Cc: Paul Mundt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Josh Stone
     

07 Jul, 2009

3 commits


22 Jun, 2009

1 commit


16 Jun, 2009

1 commit


12 Jun, 2009

5 commits


10 Apr, 2009

1 commit

  • Impact: performance regression fix for s390

    The adaptive spinning mutexes will not always do what one would expect on
    virtualized architectures like s390. Especially the cpu_relax() loop in
    mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
    away by the hypervisor.

    We would end up in a cpu_relax() loop when there is no chance that the
    state of the mutex changes until the target cpu has been scheduled again by
    the hypervisor.

    For that reason we should change the default behaviour to no-spin on s390.

    We do have an instruction which allows us to yield the current cpu in favour of
    a different target cpu. Also we have an instruction which allows us to figure
    out if the target cpu is physically backed.

    However we need to do some performance tests until we can come up with
    a solution that will do the right thing on s390.

    Signed-off-by: Heiko Carstens
    Acked-by: Peter Zijlstra
    Cc: Martin Schwidefsky
    Cc: Christian Borntraeger
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Heiko Carstens
     

01 Apr, 2009

1 commit

  • CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and
    s390. This patch implements it for the rest of the architectures by
    filling the pages with poison byte patterns after free_pages() and
    verifying the poison patterns before alloc_pages().

    This generic implementation cannot detect invalid page accesses
    immediately. An invalid read may cause a bad dereference of poisoned
    memory, and an invalid write is only detected when the poison pattern is
    verified, possibly after a long delay.
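
The poison-and-verify idea can be sketched in a few lines of Python. This is only a toy model of the technique, not the kernel code: the page is a `bytearray`, and the page size and the poison byte 0xAA are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096   # assumed page size for the sketch
POISON = 0xAA      # assumed poison byte pattern


def poison_page(page):
    """Fill a freed page with the poison pattern (done at free time)."""
    page[:] = bytes([POISON]) * len(page)


def check_poison(page):
    """Return the offsets whose bytes no longer match the poison pattern.

    An empty list means nobody wrote to the page while it was free.
    """
    return [offset for offset, byte in enumerate(page) if byte != POISON]
```

A stray write to a freed page, for example `page[100] = 0`, goes unnoticed at the time it happens and only shows up when `check_poison(page)` runs before the page is handed out again. That is the delayed detection the message describes.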

    Signed-off-by: Akinobu Mita
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

26 Mar, 2009

3 commits

  • The zcore code switches to real addressing mode when creating a kernel dump.
    This is not possible if it is built as a kernel module. With this patch
    zcore (zfcpdump) can no longer be built as a kernel module.

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
     
  • Everybody enables it so there is no point for an extra config option.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • With CONFIG_NET not set appldata build breaks on s390.

    arch/s390/appldata/built-in.o: In function appldata_get_net_sum_data:
    appldata_net_sum.c:(.text+0x2684): undefined reference to dev_get_stats
    appldata_net_sum.c:(.text+0x2688): undefined reference to init_net
    appldata_net_sum.c:(.text+0x268c): undefined reference to init_net
    appldata_net_sum.c:(.text+0x2694): undefined reference to dev_base_lock

    The following patch fixes the issue for me.

    Signed-off-by: Sachin Sant
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Sachin Sant
     

14 Jan, 2009

1 commit


06 Jan, 2009

1 commit


03 Jan, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...

    Fixed up trivial conflict in kernel/time/tick-sched.c manually

    Linus Torvalds
     

25 Dec, 2008

6 commits


13 Dec, 2008

1 commit

  • Impact: cleanup

    Each SMP arch defines these themselves. Move them to a central
    location.

    Twists:
    1) Some archs (m32r, parisc, s390) set possible_map to all 1s, so we add a
    CONFIG_INIT_ALL_POSSIBLE for this rather than break them.

    2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
    Those archs simply have phys_cpu_present_map replaced everywhere.

    3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
    so I just manipulate them both in sync.

    4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
    declarations.

    Signed-off-by: Rusty Russell
    Reviewed-by: Grant Grundler
    Tested-by: Tony Luck
    Acked-by: Ingo Molnar
    Cc: Mike Travis
    Cc: ink@jurassic.park.msu.ru
    Cc: rmk@arm.linux.org.uk
    Cc: starvik@axis.com
    Cc: tony.luck@intel.com
    Cc: takata@linux-m32r.org
    Cc: ralf@linux-mips.org
    Cc: grundler@parisc-linux.org
    Cc: paulus@samba.org
    Cc: schwidefsky@de.ibm.com
    Cc: lethal@linux-sh.org
    Cc: wli@holomorphy.com
    Cc: davem@davemloft.net
    Cc: jdike@addtoit.com
    Cc: mingo@redhat.com

    Rusty Russell
     

28 Oct, 2008

2 commits

  • We got a stack overflow with a small stack configuration on a 32 bit
    system. It looks like 4kb just isn't enough and is too dangerous.
    So let's get rid of 4kb stacks on 32 bit.

    But one thing I completely dislike about the call trace below is that
    just for debugging or tracing purposes sprintf gets called (cio_start_key):

    /* process condition code */
    sprintf(dbf_txt, "ccode:%d", ccode);
    CIO_TRACE_EVENT(4, dbf_txt);

    But maybe it's just me who thinks that this could be done better.

    Kernel stack overflow.
    Modules linked in: dm_multipath sunrpc bonding qeth_l2 dm_mod qeth ccwgroup vmur
    CPU: 1 Not tainted 2.6.27-30.x.20081015-s390default #1
    Process httpd (pid: 3807, task: 20ae2df8, ksp: 1666fb78)
    Krnl PSW : 040c0000 8027098a (number+0xe/0x348)
    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0
    Krnl GPRS: 00d43318 0027097c 1666f277 9666f270
    00000000 00000000 0000000a ffffffff
    9666f270 1666f228 1666f277 1666f098
    00000002 80270982 80271016 1666f098
    Krnl Code: 8027097e: f0340dd0a7f1 srp 3536(4,%r0),2033(%r10),4
    80270984: 0f00 clcl %r0,%r0
    80270986: a7840001 brc 8,80270988
    >8027098a: 18ef lr %r14,%r15
    8027098c: a7faff68 ahi %r15,-152
    80270990: 18bf lr %r11,%r15
    80270992: 18a2 lr %r10,%r2
    80270994: 1893 lr %r9,%r3

    Modified calltrace with annotated stackframe size of each function:

    stackframe size
    |
    0 304 vsnprintf+850 [0x271016]
    1 72 sprintf+74 [0x271522]
    2 56 cio_start_key+262 [0x2d4c16]
    3 56 ccw_device_start_key+222 [0x2dfe92]
    4 56 ccw_device_start+40 [0x2dff28]
    5 48 raw3215_start_io+104 [0x30b0f8]
    6 56 raw3215_write+494 [0x30ba0a]
    7 40 con3215_write+68 [0x30bafc]
    8 40 __call_console_drivers+146 [0x12b0fa]
    9 32 _call_console_drivers+102 [0x12b192]
    10 64 release_console_sem+268 [0x12b614]
    11 168 vprintk+462 [0x12bca6]
    12 72 printk+68 [0x12bfd0]
    13 256 __print_symbol+50 [0x15a882]
    14 56 __show_trace+162 [0x103d06]
    15 32 show_trace+224 [0x103e70]
    16 48 show_stack+152 [0x103f20]
    17 56 dump_stack+126 [0x104612]
    18 96 __alloc_pages_internal+592 [0x175004]
    19 80 cache_alloc_refill+776 [0x196f3c]
    20 40 __kmalloc+258 [0x1972ae]
    21 40 __alloc_skb+94 [0x328086]
    22 32 pskb_copy+50 [0x328252]
    23 32 skb_realloc_headroom+110 [0x328a72]
    24 104 qeth_l2_hard_start_xmit+378 [0x7803bfde]
    25 56 dev_hard_start_xmit+450 [0x32ef6e]
    26 56 __qdisc_run+390 [0x3425d6]
    27 48 dev_queue_xmit+410 [0x331e06]
    28 40 ip_finish_output+308 [0x354ac8]
    29 56 ip_output+218 [0x355b6e]
    30 24 ip_local_out+56 [0x354584]
    31 120 ip_queue_xmit+300 [0x355cec]
    32 96 tcp_transmit_skb+812 [0x367da8]
    33 40 tcp_push_one+158 [0x369fda]
    34 112 tcp_sendmsg+852 [0x35d5a0]
    35 240 sock_sendmsg+164 [0x32035c]
    36 56 kernel_sendmsg+86 [0x32064a]
    37 88 sock_no_sendpage+98 [0x322b22]
    38 104 tcp_sendpage+70 [0x35cc1e]
    39 48 sock_sendpage+74 [0x31eb66]
    40 64 pipe_to_sendpage+102 [0x1c4b2e]
    41 64 __splice_from_pipe+120 [0x1c5340]
    42 72 splice_from_pipe+90 [0x1c57e6]
    43 56 generic_splice_sendpage+38 [0x1c5832]
    44 48 do_splice_from+104 [0x1c4c38]
    45 48 direct_splice_actor+52 [0x1c4c88]
    46 80 splice_direct_to_actor+180 [0x1c4f80]
    47 72 do_splice_direct+70 [0x1c5112]
    48 64 do_sendfile+360 [0x19de18]
    49 72 sys_sendfile64+126 [0x19df32]
    50 336 sysc_do_restart+18 [0x111a1a]
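
Adding up the annotated frame sizes shows how tight things were. The short Python sketch below just sums the 51 values from the call trace; the 152-byte figure for the next frame is read off the `ahi %r15,-152` instruction in the code dump and is an inference, not something the message states.

```python
# Stack frame sizes in bytes, frames 0..50 of the call trace, in listed order.
FRAMES = [
    304, 72, 56, 56, 56, 48, 56, 40, 40, 32,
    64, 168, 72, 256, 56, 32, 48, 56, 96, 80,
    40, 40, 32, 32, 104, 56, 56, 48, 40, 56,
    24, 120, 96, 40, 112, 240, 56, 88, 104, 48,
    64, 64, 72, 56, 48, 48, 80, 72, 64, 72,
    336,
]

STACK_SIZE = 4096   # the 4kb stack configuration being removed
NEXT_FRAME = 152    # assumed: frame that number() was about to allocate


def stack_usage(frames):
    """Total bytes consumed by the listed stack frames."""
    return sum(frames)


used = stack_usage(FRAMES)
print(used, used + NEXT_FRAME > STACK_SIZE)
```

The listed frames alone already add up to 4096 bytes, so any further frame cannot fit in a 4kb stack.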

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • allyesconfig and allmodconfig built kernels have a tape IPL record.
    A vmreader record makes much more sense, since hardly anybody will
    ever IPL a kernel from tape. So change the default.
    As a side effect I can test these kernels without fiddling around with
    the kernel config ;)

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

20 Oct, 2008

1 commit

  • This patch implements a new freezer subsystem in the control groups
    framework. It provides a way to stop and resume execution of all tasks in
    a cgroup by writing in the cgroup filesystem.

    The freezer subsystem in the container filesystem defines a file named
    freezer.state. Writing "FROZEN" to the state file will freeze all tasks
    in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
    the cgroup. Reading will return the current state.

    * Examples of usage :

    # mkdir /containers
    # mount -t cgroup -ofreezer freezer /containers
    # mkdir /containers/0
    # echo $some_pid > /containers/0/tasks

    to get status of the freezer subsystem :

    # cat /containers/0/freezer.state
    RUNNING

    to freeze all tasks in the container :

    # echo FROZEN > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    FREEZING
    # cat /containers/0/freezer.state
    FROZEN

    to unfreeze all tasks in the container :

    # echo RUNNING > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    RUNNING

    This is the basic mechanism which should do the right thing for user space
    tasks in a simple scenario.

    It's important to note that freezing can be incomplete. In that case we
    return EBUSY. This means that some tasks in the cgroup are busy doing
    something that prevents us from completely freezing the cgroup at this
    time. After EBUSY, the cgroup will remain partially frozen -- reflected
    by freezer.state reporting "FREEZING" when read. The state will remain
    "FREEZING" until one of these things happens:

    1) Userspace cancels the freezing operation by writing "RUNNING" to
    the freezer.state file
    2) Userspace retries the freezing operation by writing "FROZEN" to
    the freezer.state file (writing "FREEZING" is not legal
    and returns EIO)
    3) The tasks that blocked the cgroup from entering the "FROZEN"
    state disappear from the cgroup's set of tasks.
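
The state transitions described above can be modelled as a small state machine. The Python sketch below is a toy model, not the kernel implementation: the class name, the per-task "freezable" flag, the re-evaluation of the state on read, and the choice to return EIO for every unrecognised string are all assumptions made for illustration.

```python
import errno


class FreezerModel:
    """Toy model of a cgroup's freezer.state file."""

    def __init__(self, tasks):
        # tasks maps a task name to True if it can be frozen right now.
        self.tasks = dict(tasks)
        self.state = "RUNNING"

    def _all_freezable(self):
        return all(self.tasks.values())

    def write(self, value):
        """Return 0 on success or a negative errno, like a file write would."""
        if value == "RUNNING":
            self.state = "RUNNING"      # cancels a pending freeze as well
            return 0
        if value == "FROZEN":
            if self._all_freezable():
                self.state = "FROZEN"
                return 0
            self.state = "FREEZING"     # partially frozen
            return -errno.EBUSY
        return -errno.EIO               # e.g. writing "FREEZING" is not legal

    def read(self):
        # A FREEZING cgroup becomes FROZEN once nothing blocks it any more.
        if self.state == "FREEZING" and self._all_freezable():
            self.state = "FROZEN"
        return self.state

    def remove_task(self, name):
        """A task leaves the cgroup's set of tasks."""
        self.tasks.pop(name, None)
```

For example, with one task that cannot be frozen, writing "FROZEN" returns `-errno.EBUSY` and reads report "FREEZING". Once that task is removed, the next read reports "FROZEN".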

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: export thaw_process]
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Matt Helsley
    Acked-by: Serge E. Hallyn
    Tested-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     

15 Oct, 2008

1 commit


11 Oct, 2008

1 commit

  • * System call parameter and result access functions
    * Add tracehook calls
    * Split syscall_trace into two functions do_syscall_trace_enter and
    do_syscall_trace_exit

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

01 Aug, 2008

1 commit


25 Jul, 2008

1 commit


17 Jul, 2008

1 commit

  • Compiling a kernel with allmodconfig or allyesconfig results in tons
    of gcc warnings, because the default maximum stack size above which
    gcc emits a warning is just 256 bytes.
    Increase this to 2048, so these warnings don't distract from the real
    warnings that we need to watch for.

    Signed-off-by: Heiko Carstens
    Cc: Martin Schwidefsky

    Heiko Carstens
     

14 Jul, 2008

2 commits