19 Feb, 2009

1 commit

  • Compilation of kprobes.c with CONFIG_PM unset is broken due to some incorrect
    config dependencies. Fix that.

    Signed-off-by: Rafael J. Wysocki
    Reported-by: Ingo Molnar
    Tested-by: Masami Hiramatsu
    Cc: Len Brown
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

15 Jan, 2009

1 commit


11 Jan, 2009

1 commit

  • If you do

    smp_call_function_single(expression-with-side-effects, ...)

    then expression-with-side-effects never gets evaluated on UP builds.

    As always, implementing it as a C function is the correct thing to do.

    While we're there, uninline it for size and possible header dependency
    reasons.

    And create a new kernel/up.c, as a place in which to put
    uniprocessor-specific code and storage. It should mirror kernel/smp.c.
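    A minimal user-space sketch of the hazard described above. The macro
    and function names here are hypothetical; the macro is not the
    kernel's actual UP definition. It only shows the general problem: a
    macro that never expands one of its arguments silently skips that
    argument's side effects, while a real C function always evaluates
    every argument.

```c
#include <stddef.h>

/*
 * Hypothetical UP-style macro: the 'cpu' argument is dropped during
 * expansion, so any side effects in it never happen.
 */
#define call_single_macro(cpu, func, info) ((func)(info), 0)

/*
 * The same operation as a real C function: C evaluates every argument
 * expression before the call, so side effects in 'cpu' always happen.
 */
static int call_single_fn(int cpu, void (*func)(void *), void *info)
{
	(void)cpu;	/* only one CPU exists on UP, so the value is unused */
	func(info);
	return 0;
}

static int cpus_picked;	/* side-effect counter for pick_cpu() */
static int calls_made;	/* how many times the callback ran */

static int pick_cpu(void)
{
	cpus_picked++;
	return 0;
}

static void callback(void *info)
{
	(void)info;
	calls_made++;
}
```

    Passing pick_cpu() to call_single_fn() increments cpus_picked, while
    the macro form leaves it untouched even though the callback still runs.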

    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Andrew Morton
     

08 Jan, 2009

1 commit

  • Right now, most of the kernel boot is strictly synchronous, such that
    various hardware delays are done sequentially.

    In order to make the kernel boot faster, this patch introduces
    infrastructure to allow doing some of the initialization steps
    asynchronously, which will hide significant portions of the hardware delays
    in practice.

    In order to not change device order and other similar observables, this
    patch does NOT do full parallel initialization.

    Rather, it operates more in the way an out-of-order CPU does; the work may
    be done out of order and asynchronously, but the observable effects
    (instruction retiring for the CPU) are still done in the original sequence.
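    The "out-of-order work, in-order retirement" idea can be modelled in
    a few lines. This is a hypothetical single-threaded sketch, not the
    patch's API: each step's slow part may finish in any order, but its
    observable effect is published only once every earlier step has been
    published.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSTEPS 4	/* hypothetical number of init steps */

static bool work_done[NSTEPS];	/* slow (hardware) part finished */
static int visible[NSTEPS];	/* order in which effects were published */
static size_t nvisible;		/* how many steps have been published */

/*
 * Publish the observable effect of every finished step whose
 * predecessors have all been published, strictly in sequence order.
 */
static void retire_in_order(void)
{
	while (nvisible < NSTEPS && work_done[nvisible]) {
		visible[nvisible] = (int)nvisible;	/* e.g. register a device */
		nvisible++;
	}
}

/* Called when the slow part of step 'seq' completes, in any order. */
static void step_finished(size_t seq)
{
	work_done[seq] = true;
	retire_in_order();
}
```

    If step 2 finishes first, nothing becomes visible until steps 0 and 1
    are done, so device order and similar observables are unchanged.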

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     

31 Dec, 2008

1 commit

  • * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
    stacktrace: provide save_stack_trace_tsk() weak alias
    rcu: provide RCU options on non-preempt architectures too
    printk: fix discarding message when recursion_bug
    futex: clean up futex_(un)lock_pi fault handling
    "Tree RCU": scalable classic RCU implementation
    futex: rename field in futex_q to clarify single waiter semantics
    x86/swiotlb: add default swiotlb_arch_range_needs_mapping
    x86/swiotlb: add default physbus conversion
    x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
    x86: add swiotlb allocation functions
    swiotlb: consolidate swiotlb info message printing
    swiotlb: support bouncing of HighMem pages
    swiotlb: factor out copy to/from device
    swiotlb: add arch hook to force mapping
    swiotlb: allow architectures to override phys<->bus<->phys conversions
    swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
    rcu: fix rcutorture behavior during reboot
    resources: skip sanity check of busy resources
    swiotlb: move some definitions to header
    swiotlb: allow architectures to override swiotlb pool allocation
    ...

    Fix up trivial conflicts in
    arch/x86/kernel/Makefile
    arch/x86/mm/init_32.c
    include/linux/hardirq.h
    as per Ingo's suggestions.

    Linus Torvalds
     

29 Dec, 2008

1 commit

  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    sched: fix warning in fs/proc/base.c
    schedstat: consolidate per-task cpu runtime stats
    sched: use RCU variant of list traversal in for_each_leaf_rt_rq()
    sched, cpuacct: export percpu cpuacct cgroup stats
    sched, cpuacct: refactoring cpuusage_read / cpuusage_write
    sched: optimize update_curr()
    sched: fix wakeup preemption clock
    sched: add missing arch_update_cpu_topology() call
    sched: let arch_update_cpu_topology indicate if topology changed
    sched: idle_balance() does not call load_balance_newidle()
    sched: fix sd_parent_degenerate on non-numa smp machine
    sched: add uid information to sched_debug for CONFIG_USER_SCHED
    sched: move double_unlock_balance() higher
    sched: update comment for move_task_off_dead_cpu
    sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares
    sched/rt: removed unneeded defintion
    sched: add hierarchical accounting to cpu accounting controller
    sched: include group statistics in /proc/sched_debug
    sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER
    sched: clean up SCHED_CPUMASK_ALLOC
    ...

    Linus Torvalds
     

19 Dec, 2008

1 commit

  • This patch fixes a long-standing performance bug in classic RCU that
    results in massive internal-to-RCU lock contention on systems with
    more than a few hundred CPUs. Although this patch creates a separate
    flavor of RCU for ease of review and patch maintenance, it is intended
    to replace classic RCU.

    This patch still handles stress better than does mainline, so I am still
    calling it ready for inclusion. This patch is against the -tip tree.
    Nevertheless, experience on an actual 1000+ CPU machine would still be
    most welcome.

    Most of the changes noted below were found while creating an rcutiny
    (which should permit ejecting the current rcuclassic) and while doing
    detailed line-by-line documentation.

    Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

    o Fixes from remainder of line-by-line code walkthrough,
    including comment spelling, initialization, undesirable
    narrowing due to type conversion, removing redundant memory
    barriers, removing redundant local-variable initialization,
    and removing redundant local variables.

    I do not believe that any of these fixes address the CPU-hotplug
    issues that Andi Kleen was seeing, but please do give it a whirl
    in case the machine is smarter than I am.

    A writeup from the walkthrough may be found at the following
    URL, in case you are suffering from terminal insomnia or
    masochism:

    http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

    o Made rcutree tracing use seq_file, as suggested some time
    ago by Lai Jiangshan.

    o Added a .csv variant of the rcudata debugfs trace file, to allow
    people having thousands of CPUs to drop the data into
    a spreadsheet. Tested with oocalc and gnumeric. Updated
    documentation to suit.

    Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

    o Fix a theoretical race between grace-period initialization and
    force_quiescent_state() that could occur if more than three
    jiffies were required to carry out the grace-period
    initialization. Which it might, if you had enough CPUs.

    o Apply Ingo's printk-standardization patch.

    o Substitute local variables for repeated accesses to global
    variables.

    o Fix comment misspellings and redundant (but harmless) increments
    of ->n_rcu_pending (this latter after having explicitly added it).

    o Apply checkpatch fixes.

    Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

    o Fixed a number of problems noted by Gautham Shenoy, including
    the cpu-stall-detection bug that he was having difficulty
    convincing me was real. ;-)

    o Changed cpu-stall detection to wait for ten seconds rather than
    three in order to reduce false positives, as suggested by Ingo
    Molnar.

    o Produced a design document (http://lwn.net/Articles/305782/).
    The act of writing this document uncovered a number of both
    theoretical and "here and now" bugs as noted below.

    o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
    condition, fix kerneldoc comments, and add memory barriers
    in dynticks interface functions.

    o Add more data to tracing.

    o Remove unused "rcu_barrier" field from rcu_data structure.

    o Count calls to rcu_pending() from scheduling-clock interrupt
    to use as a surrogate timebase should jiffies stop counting.

    o Fix a theoretical race between force_quiescent_state() and
    grace-period initialization. Yes, initialization does have to
    go on for some jiffies for this race to occur, but given enough
    CPUs...

    Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

    o Fix a number of checkpatch.pl complaints.

    o Apply review comments from Ingo Molnar and Lai Jiangshan
    on the stall-detection code.

    o Fix several bugs in !CONFIG_SMP builds.

    o Fix a misspelled config-parameter name so that RCU now announces
    at boot time if stall detection is configured.

    o Run tests on numerous combinations of configuration parameters,
    which after the fixes above, now build and run correctly.

    Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

    o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
    changeset some time ago, and finally got around to retesting
    this option).

    o Fix some tracing bugs in rcupreempt that caused incorrect
    totals to be printed.

    o I now test with a more brutal random-selection online/offline
    script (attached). Probably more brutal than it needs to be
    on the people reading it as well, but so it goes.

    o A number of optimizations and usability improvements:

    o Make rcu_pending() ignore the grace-period timeout when
    there is no grace period in progress.

    o Make force_quiescent_state() avoid going for a global
    lock in the case where there is no grace period in
    progress.

    o Rearrange struct fields to improve struct layout.

    o Make call_rcu() initiate a grace period if RCU was
    idle, rather than waiting for the next scheduling
    clock interrupt.

    o Invoke rcu_irq_enter() and rcu_irq_exit() only when
    idle, as suggested by Andi Kleen. I still don't
    completely trust this change, and might back it out.

    o Make CONFIG_RCU_TRACE be the single config variable
    manipulated for all forms of RCU, instead of the prior
    confusion.

    o Document tracing files and formats for both rcupreempt
    and rcutree.

    Updates from v4 for those missing v5 given its bad subject line:

    o Separated dynticks interface so that NMIs and irqs call separate
    functions, greatly simplifying it. In particular, this code
    no longer requires a proof of correctness. ;-)

    o Separated dynticks state out into its own per-CPU structure,
    avoiding the duplicated accounting.

    o The case where a dynticks-idle CPU runs an irq handler that
    invokes call_rcu() is now correctly handled, forcing that CPU
    out of dynticks-idle mode.

    o Review comments have been applied (thank you all!!!).
    For but one example, fixed the dynticks-ordering issue that
    Manfred pointed out, saving me much debugging. ;-)

    o Adjusted rcuclassic and rcupreempt to handle dynticks changes.

    Attached is an updated patch to Classic RCU that applies a hierarchy,
    greatly reducing the contention on the top-level lock for large machines.
    This passes 10-hour concurrent rcutorture and online-offline testing on
    128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
    bugs in the presence of dynticks (it is exciting to work on a system where
    "sleep 1" hangs until interrupted...), which were fixed in the
    2.6.27 kernel. It is getting more reliable than mainline by some
    measures, so the next version will be against -tip for inclusion.
    See also Manfred Spraul's recent patches (or his earlier work from
    2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
    We will converge onto a common patch in the fullness of time, but are
    currently exploring different regions of the design space. That said,
    I have already gratefully stolen quite a few of Manfred's ideas.

    This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
    of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
    64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
    there is no hierarchy. By default, the RCU initialization code will
    adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
    architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
    this balancing, allowing the hierarchy to be exactly aligned to the
    underlying hardware. Up to two levels of hierarchy are permitted
    (in addition to the root node), allowing up to 32,768 CPUs on 32-bit
    systems and up to 262,144 CPUs on 64-bit systems. I just know that I
    am going to regret saying this, but this seems more than sufficient
    for the foreseeable future. (Some architectures might wish to set
    CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
    If this becomes a real problem, additional levels can be added, but I
    doubt that it will make a significant difference on real hardware.)
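    The CPU limits follow from simple arithmetic. With the root plus two
    further levels there are three levels of rcu_node structures, and each
    node has at most CONFIG_RCU_FANOUT children (child nodes, or CPUs for
    a leaf), so the ceiling is FANOUT^3: 64^3 = 262,144, 4^3 = 64, and
    32^3 = 32,768. The helper below is hypothetical and only reproduces
    that calculation.

```c
/*
 * Hypothetical helper: CPUs coverable by an RCU tree with 'levels'
 * levels in total (root included), where each rcu_node has at most
 * 'fanout' children (child nodes, or CPUs for a leaf node).
 */
static unsigned long rcu_tree_capacity(unsigned long fanout, unsigned int levels)
{
	unsigned long cpus = 1;

	while (levels--)
		cpus *= fanout;
	return cpus;
}
```

    A single level (no hierarchy) covers exactly FANOUT CPUs, matching the
    case where CONFIG_NR_CPUS does not exceed CONFIG_RCU_FANOUT.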

    In the common case, a given CPU will manipulate its private rcu_data
    structure and the rcu_node structure that it shares with its immediate
    neighbors. This can reduce both lock and memory contention by multiple
    orders of magnitude, which should eliminate the need for the strange
    manipulations that are reported to be required when running Linux on
    very large systems.
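    To see why contention drops, consider which CPUs share a leaf
    rcu_node (and hence its lock). The sketch below assumes the simplest
    layout, where leaf i covers CPUs [i * fanout, (i + 1) * fanout) with
    no rebalancing, as CONFIG_RCU_FANOUT_EXACT might produce; the helper
    names are hypothetical.

```c
/*
 * Hypothetical layout: leaf rcu_node i covers CPUs
 * [i * fanout, (i + 1) * fanout), i.e. no rebalancing.
 */
static unsigned int leaf_of(unsigned int cpu, unsigned int fanout)
{
	return cpu / fanout;
}

/* Nonzero if CPUs a and b contend on the same leaf rcu_node lock. */
static int same_leaf(unsigned int a, unsigned int b, unsigned int fanout)
{
	return leaf_of(a, fanout) == leaf_of(b, fanout);
}
```

    With a fanout of 64, CPU 63 shares its leaf lock with CPUs 0-62 only,
    instead of with every CPU in the system as with a single global lock.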

    Some shortcomings:

    o More bugs will probably surface as a result of an ongoing
    line-by-line code inspection.

    Patches will be provided as required.

    o There are probably hangs, rcutorture failures, &c. Seems
    quite stable on a 128-CPU machine, but that is kind of small
    compared to 4096 CPUs. However, seems to do better than
    mainline.

    Patches will be provided as required.

    o The memory footprint of this version is several KB larger
    than rcuclassic.

    A separate UP-only rcutiny patch will be provided, which will
    reduce the memory footprint significantly, even compared
    to the old rcuclassic. One such patch passes light testing,
    and has a memory footprint smaller even than rcuclassic.
    Initial reaction from various embedded guys was "it is not
    worth it", so I am putting it aside.

    Credits:

    o Manfred Spraul for ideas, review comments, and bugs spotted,
    as well as some good friendly competition. ;-)

    o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
    Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
    for reviews and comments.

    o Thomas Gleixner for much-needed help with some timer issues
    (see patches below).

    o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
    Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
    Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
    alive despite my heavy abuse^Wtesting.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

19 Nov, 2008

1 commit


18 Nov, 2008

2 commits

  • Conflicts:
    fs/cifs/misc.c

    Merge to resolve above, per the patch below.

    Signed-off-by: James Morris

    diff --cc fs/cifs/misc.c
    index ec36410,addd1dc..0000000
    --- a/fs/cifs/misc.c
    +++ b/fs/cifs/misc.c
    @@@ -347,13 -338,13 +338,13 @@@ header_assemble(struct smb_hdr *buffer
    /* BB Add support for establishing new tCon and SMB Session */
    /* with userid/password pairs found on the smb session */
    /* for other target tcp/ip addresses BB */
    - if (current->fsuid != treeCon->ses->linux_uid) {
    + if (current_fsuid() != treeCon->ses->linux_uid) {
    cFYI(1, ("Multiuser mode and UID "
    "did not match tcon uid"));
    - read_lock(&GlobalSMBSeslock);
    - list_for_each(temp_item, &GlobalSMBSessionList) {
    - ses = list_entry(temp_item, struct cifsSesInfo, cifsSessionList);
    + read_lock(&cifs_tcp_ses_lock);
    + list_for_each(temp_item, &treeCon->ses->server->smb_ses_list) {
    + ses = list_entry(temp_item, struct cifsSesInfo, smb_ses_list);
    - if (ses->linux_uid == current->fsuid) {
    + if (ses->linux_uid == current_fsuid()) {
    if (ses->server == treeCon->ses->server) {
    cFYI(1, ("found matching uid substitute right smb_uid"));
    buffer->Uid = ses->Suid;

    James Morris
     
  • For some unknown reason, Steven Rostedt added code to disable SPE
    instruction generation for e500-based PPC cores in commit
    6ec562328fda585be2d7f472cfac99d3b44d362a.

    We are removing it because:

    1. It generates e500 kernels that don't work
    2. it's not the correct set of flags to do this
    3. we handle this in the arch/powerpc/Makefile already
    4. even after talking to Steven, it's unknown why he did this

    Signed-off-by: Kumar Gala
    Tested-and-Acked-by: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Kumar Gala
     

14 Nov, 2008

1 commit


11 Nov, 2008

1 commit


03 Nov, 2008

1 commit


22 Oct, 2008

1 commit


21 Oct, 2008

2 commits

  • …l/git/tip/linux-2.6-tip

    * 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
    tracing/fastboot: improve help text
    tracing/stacktrace: improve help text
    tracing/fastboot: fix initcalls disposition in bootgraph.pl
    tracing/fastboot: fix bootgraph.pl initcall name regexp
    tracing/fastboot: fix issues and improve output of bootgraph.pl
    tracepoints: synchronize unregister static inline
    tracepoints: tracepoint_synchronize_unregister()
    ftrace: make ftrace_test_p6nop disassembler-friendly
    markers: fix synchronize marker unregister static inline
    tracing/fastboot: add better resolution to initcall debug/tracing
    trace: add build-time check to avoid overrunning hex buffer
    ftrace: fix hex output mode of ftrace
    tracing/fastboot: fix initcalls disposition in bootgraph.pl
    tracing/fastboot: fix printk format typo in boot tracer
    ftrace: return an error when setting a nonexistent tracer
    ftrace: make some tracers reentrant
    ring-buffer: make reentrant
    ring-buffer: move page indexes into page headers
    tracing/fastboot: only trace non-module initcalls
    ftrace: move pc counter in irqtrace
    ...

    Manually fix conflicts:
    - init/main.c: initcall tracing
    - kernel/module.c: verbose level vs tracepoints
    - scripts/bootgraph.pl: fallout from cherry-picking commits.

    Linus Torvalds
     
  • Due to confusion between the ftrace infrastructure and the gcc profiling
    tracer "ftrace", this patch renames the config options from FTRACE to
    FUNCTION_TRACER. The other two names that are offspring of FTRACE,
    DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD, will stay the same.

    This patch was generated mostly by script, and partially by hand.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

20 Oct, 2008

2 commits

  • This patch implements a new freezer subsystem in the control groups
    framework. It provides a way to stop and resume execution of all tasks in
    a cgroup by writing in the cgroup filesystem.

    The freezer subsystem in the container filesystem defines a file named
    freezer.state. Writing "FROZEN" to the state file will freeze all tasks
    in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in
    the cgroup. Reading will return the current state.

    * Examples of usage:

    # mkdir /containers
    # mount -t cgroup -ofreezer freezer /containers
    # mkdir /containers/0
    # echo $some_pid > /containers/0/tasks

    To get the status of the freezer subsystem:

    # cat /containers/0/freezer.state
    RUNNING

    To freeze all tasks in the container:

    # echo FROZEN > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    FREEZING
    # cat /containers/0/freezer.state
    FROZEN

    To unfreeze all tasks in the container:

    # echo RUNNING > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    RUNNING

    This is the basic mechanism which should do the right thing for user-space
    tasks in a simple scenario.

    It's important to note that freezing can be incomplete. In that case we
    return EBUSY. This means that some tasks in the cgroup are busy doing
    something that prevents us from completely freezing the cgroup at this
    time. After EBUSY, the cgroup will remain partially frozen -- reflected
    by freezer.state reporting "FREEZING" when read. The state will remain
    "FREEZING" until one of these things happens:

    1) Userspace cancels the freezing operation by writing "RUNNING" to
    the freezer.state file
    2) Userspace retries the freezing operation by writing "FROZEN" to
    the freezer.state file (writing "FREEZING" is not legal
    and returns EIO)
    3) The tasks that blocked the cgroup from entering the "FROZEN"
    state disappear from the cgroup's set of tasks.
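    The rules above form a small state machine. The sketch below is a
    hypothetical user-space model, not the kernel's freezer code: the
    FS_* names, freezer_write() and the all_tasks_freezable flag are
    illustrative, and errors are returned as negative errno values in
    kernel style.

```c
#include <errno.h>
#include <string.h>

enum freezer_state { FS_RUNNING, FS_FREEZING, FS_FROZEN };

/*
 * Model of a write to freezer.state.  'all_tasks_freezable' stands in
 * for whether every task in the cgroup could be frozen right now.
 * Returns 0 on success or a negative errno.
 */
static int freezer_write(enum freezer_state *st, const char *buf,
			 int all_tasks_freezable)
{
	if (strcmp(buf, "RUNNING") == 0) {
		*st = FS_RUNNING;	/* cancel a partial freeze, or thaw */
		return 0;
	}
	if (strcmp(buf, "FROZEN") == 0) {
		if (all_tasks_freezable) {
			*st = FS_FROZEN;
			return 0;
		}
		*st = FS_FREEZING;	/* partially frozen; retry later */
		return -EBUSY;
	}
	/* "FREEZING" is not legal (in this model, neither is anything else) */
	return -EIO;
}
```

    A failed freeze leaves the model in FS_FREEZING; from there, writing
    "RUNNING" cancels and a later "FROZEN" retries, matching cases 1) and
    2) above.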

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: export thaw_process]
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Matt Helsley
    Acked-by: Serge E. Hallyn
    Tested-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     
  • Now that the TIF_FREEZE flag is available in all architectures, extract
    the refrigerator() and freeze_task() from kernel/power/process.c and make
    them available to all.

    The refrigerator() can now be used in a control group subsystem
    implementing a control group freezer.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Matt Helsley
    Acked-by: Serge E. Hallyn
    Tested-by: Matt Helsley
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     

14 Oct, 2008

1 commit

  • Implementation of kernel tracepoints. Inspired by the Linux Kernel
    Markers. Allows complete typing verification by declaring both tracing
    statement inline functions and probe registration/unregistration static
    inline functions within the same macro "DEFINE_TRACE". No format string
    is required. See the tracepoint Documentation and Samples patches for
    usage examples.

    Taken from the documentation patch:

    "A tracepoint placed in code provides a hook to call a function (probe)
    that you can provide at runtime. A tracepoint can be "on" (a probe is
    connected to it) or "off" (no probe is attached). When a tracepoint is
    "off" it has no effect, except for adding a tiny time penalty (checking
    a condition for a branch) and a space penalty (adding a few bytes for the
    function call at the end of the instrumented function and adding a data
    structure in a separate section). When a tracepoint is "on", the
    function you provide is called each time the tracepoint is executed, in
    the execution context of the caller. When the function provided ends its
    execution, it returns to the caller (continuing from the tracepoint
    site).

    You can put tracepoints at important locations in the code. They are
    lightweight hooks that can pass an arbitrary number of parameters, whose
    prototypes are described in a tracepoint declaration placed in a header
    file."
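    A stripped-down, single-threaded sketch of the on/off behavior quoted
    above. All names are hypothetical and this is not the DEFINE_TRACE
    macro: it keeps a fixed array of probes and leaves out the RCU-based
    update scheme and the preempt_disable()/preempt_enable() read side
    entirely.

```c
#include <stddef.h>

#define MAX_PROBES 4	/* hypothetical fixed capacity */

typedef void (*probe_fn)(int value);	/* probe for a one-int tracepoint */

struct tracepoint_sketch {
	int state;			/* nonzero once a probe is connected */
	size_t nprobes;
	probe_fn probes[MAX_PROBES];
};

/* Connect a probe; returns 0, or -1 if no slot is free. */
static int tp_register(struct tracepoint_sketch *tp, probe_fn fn)
{
	if (tp->nprobes == MAX_PROBES)
		return -1;
	tp->probes[tp->nprobes++] = fn;
	tp->state = 1;
	return 0;
}

/*
 * The instrumentation site.  When "off" this is one memory read, a test
 * and a branch; when "on" every probe runs in the caller's context and
 * control then returns to the caller.
 */
static void trace_sketch(const struct tracepoint_sketch *tp, int value)
{
	size_t i;

	if (!tp->state)
		return;
	for (i = 0; i < tp->nprobes; i++)
		tp->probes[i](value);
}

static int last_seen = -1;	/* recorded by the example probe */

static void record_probe(int value)
{
	last_seen = value;
}
```

    Before tp_register() the call to trace_sketch() does nothing; after it,
    record_probe() sees each value passed at the instrumentation site.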

    Addition and removal of tracepoints is synchronized by RCU using the
    scheduler (and preempt_disable) as guarantees to find a quiescent state
    (this is really RCU "classic"). The update side uses rcu_barrier_sched()
    with call_rcu_sched() and the read/execute side uses
    "preempt_disable()/preempt_enable()".

    We make sure the previous array containing probes, which has been
    scheduled for deletion by the rcu callback, is indeed freed before we
    proceed to the next update. It therefore limits the rate of modification
    of a single tracepoint to one update per RCU period. The objective here
    is to permit fast batch add/removal of probes on _different_
    tracepoints.

    Changelog:
    - Use #name ":" #proto as string to identify the tracepoint in the
    tracepoint table. This ensures that no type mismatch happens due to
    connection of a probe with the wrong type to a tracepoint declared with
    the same name in a different header.
    - Add tracepoint_entry_free_old.
    - Change __TO_TRACE to get rid of the 'i' iterator.

    Masami Hiramatsu:
    Tested on x86-64.

    Performance impact of a tracepoint : same as markers, except that it
    adds about 70 bytes of instructions in an unlikely branch of each
    instrumented function (the for loop, the stack setup and the function
    call). It currently adds a memory read, a test and a conditional branch
    at the instrumentation site (in the hot path). Immediate values will
    eventually change this into a load immediate, test and branch, which
    removes the memory read and makes the i-cache impact smaller
    (replacing the memory read with a load immediate saves 3-4 bytes per
    site on x86_32, depending on mov prefixes, or 7-8 bytes on x86_64); it
    also saves the d-cache hit.

    About the performance impact of tracepoints (which is comparable to
    markers), even without immediate values optimizations, tests done by
    Hideo Aoki on ia64 show no regression. His test case was using hackbench
    on a kernel where scheduler instrumentation (about 5 events in
    scheduler code) was added.

    Quoting Hideo Aoki about Markers:

    I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
    tree, which includes several markers for LTTng, using an ia64 server.

    While the immediate trace mark feature isn't implemented on ia64, there
    is no major performance regression. So, I think that we don't have any
    issues to propose merging marker point patches into Linus's tree from
    the viewpoint of performance impact.

    I prepared two kernels to evaluate. The first one was compiled without
    CONFIG_MARKERS. The second one was compiled with CONFIG_MARKERS enabled.

    I downloaded the original hackbench from the following URL:
    http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

    I ran hackbench 5 times in each condition and calculated the average and
    difference between the kernels.

    The parameter of hackbench: every 50 from 50 to 800
    The number of CPUs of the server: 2, 4, and 8

    Below are the results. As you can see, no major performance regression
    was found in any case. Even as the number of processes increases, the
    difference between the marker-enabled and marker-disabled kernels
    doesn't increase. Likewise, as the number of CPUs increases, the
    difference doesn't increase either.

    Curiously, the marker-enabled kernel is faster than the marker-disabled
    kernel in more than half of the cases; I guess this comes from a
    difference in memory access patterns.

    * 2 CPUs

    Processes | Without markers [s] | With markers [s] | Diff [s] | Diff [%]
    ----------+---------------------+------------------+----------+---------
           50 |               4.811 |            4.872 |   +0.061 |    +1.27
          100 |               9.854 |           10.309 |   +0.454 |    +4.61
          150 |              15.602 |           15.040 |   -0.562 |     -3.6
          200 |              20.489 |           20.380 |   -0.109 |    -0.53
          250 |              25.798 |           25.652 |   -0.146 |    -0.56
          300 |              31.260 |           30.797 |   -0.463 |    -1.48
          350 |              36.121 |           35.770 |   -0.351 |    -0.97
          400 |              42.288 |           42.102 |   -0.186 |    -0.44
          450 |              47.778 |           47.253 |   -0.526 |     -1.1
          500 |              51.953 |           52.278 |   +0.325 |    +0.63
          550 |              58.401 |           57.700 |   -0.701 |     -1.2
          600 |              63.334 |           63.222 |   -0.112 |    -0.18
          650 |              68.816 |           68.511 |   -0.306 |    -0.44
          700 |              74.667 |           74.088 |   -0.579 |    -0.78
          750 |              78.612 |           79.582 |   +0.970 |    +1.23
          800 |              85.431 |           85.263 |   -0.168 |     -0.2
    ----------+---------------------+------------------+----------+---------

    * 4 CPUs

    Processes | Without markers [s] | With markers [s] | Diff [s] | Diff [%]
    ----------+---------------------+------------------+----------+---------
           50 |               2.586 |            2.584 |   -0.003 |     -0.1
          100 |               5.254 |            5.283 |   +0.030 |    +0.56
          150 |               8.012 |            8.074 |   +0.061 |    +0.76
          200 |              11.172 |           11.000 |   -0.172 |    -1.54
          250 |              13.917 |           14.036 |   +0.119 |    +0.86
          300 |              16.905 |           16.543 |   -0.362 |    -2.14
          350 |              19.901 |           20.036 |   +0.135 |    +0.68
          400 |              22.908 |           23.094 |   +0.186 |    +0.81
          450 |              26.273 |           26.101 |   -0.172 |    -0.66
          500 |              29.554 |           29.092 |   -0.461 |    -1.56
          550 |              32.377 |           32.274 |   -0.103 |    -0.32
          600 |              35.855 |           35.322 |   -0.533 |    -1.49
          650 |              39.192 |           38.388 |   -0.804 |    -2.05
          700 |              41.744 |           41.719 |   -0.025 |    -0.06
          750 |              45.016 |           44.496 |   -0.520 |    -1.16
          800 |              48.212 |           47.603 |   -0.609 |    -1.26
    ----------+---------------------+------------------+----------+---------

    * 8 CPUs

    Processes | Without markers [s] | With markers [s] | Diff [s] | Diff [%]
    ----------+---------------------+------------------+----------+---------
           50 |               2.094 |            2.072 |   -0.022 |    -1.07
          100 |               4.162 |            4.273 |   +0.111 |    +2.66
          150 |               6.485 |            6.540 |   +0.055 |    +0.84
          200 |               8.556 |            8.478 |   -0.078 |    -0.91
          250 |              10.458 |           10.258 |   -0.200 |    -1.91
          300 |              12.425 |           12.750 |   +0.325 |    +2.62
          350 |              14.807 |           14.839 |   +0.032 |    +0.22
          400 |              16.801 |           16.959 |   +0.158 |    +0.94
          450 |              19.478 |           19.009 |   -0.470 |    -2.41
          500 |              21.296 |           21.504 |   +0.208 |    +0.98
          550 |              23.842 |           23.979 |   +0.137 |    +0.57
          600 |              26.309 |           26.111 |   -0.198 |    -0.75
          650 |              28.705 |           28.446 |   -0.259 |     -0.9
          700 |              31.233 |           31.394 |   +0.161 |    +0.52
          750 |              34.064 |           33.720 |   -0.344 |    -1.01
          800 |              36.320 |           36.114 |   -0.206 |    -0.57
    ----------+---------------------+------------------+----------+---------
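    The two diff columns appear to be derived from the timing columns as
    diff = with - without and diff% = 100 * diff / without, so a negative
    value means the marker-enabled kernel was faster. A few rows differ in
    the last digit when recomputed (for example, 2.584 - 2.586 = -0.002 in
    the first 4-CPU row, reported as -0.003), presumably because the
    published averages were themselves rounded. The helpers below are a
    hypothetical reconstruction of that arithmetic.

```c
/* diff [s] = with - without */
static double diff_seconds(double without, double with)
{
	return with - without;
}

/*
 * diff [%] = 100 * (with - without) / without; a negative result means
 * the marker-enabled kernel was faster.
 */
static double diff_percent(double without, double with)
{
	return 100.0 * (with - without) / without;
}
```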

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: 'Peter Zijlstra'
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
     

29 Jul, 2008

1 commit


26 Jul, 2008

1 commit

  • Build kernel/profile.o only if CONFIG_PROFILING is enabled.

    This makes CONFIG_PROFILING=n kernels smaller.

    As a bonus, some profile_tick() calls and one branch from schedule() are
    now eliminated with CONFIG_PROFILING=n (but I doubt these are
    measurable effects).

    This patch changes the effects of CONFIG_PROFILING=n, but I don't think
    having more than two choices would be the better choice.

    This patch also adds the name of the first parameter to the prototypes
    of profile_{hits,tick}() since I had to add them for the dummy
    functions anyway.
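    The dummy-function approach is the usual header pattern: real
    prototypes when the option is set, empty static inline stubs
    otherwise, so callers need no #ifdef and the compiler drops the stub
    calls. On the build side this presumably pairs with a kbuild line of
    the form obj-$(CONFIG_PROFILING) += profile.o. The function names
    below are hypothetical and are not claimed to match the kernel's
    profile_{hits,tick}() signatures.

```c
#include <stddef.h>

/*
 * Hypothetical header fragment: real prototypes when CONFIG_PROFILING
 * is set, empty static inline stubs otherwise.
 */
#ifdef CONFIG_PROFILING
void demo_profile_tick(int type);
void demo_profile_hits(int type, void *ip, unsigned int nr_hits);
#else
static inline void demo_profile_tick(int type)
{
	(void)type;	/* named parameter, unused in the stub */
}

static inline void demo_profile_hits(int type, void *ip, unsigned int nr_hits)
{
	(void)type;
	(void)ip;
	(void)nr_hits;
}
#endif
```

    Built without CONFIG_PROFILING, calls such as demo_profile_tick(1)
    compile and link without any profile.o object.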

    Signed-off-by: Adrian Bunk
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

19 Jul, 2008

1 commit


18 Jul, 2008

1 commit


17 Jul, 2008

1 commit


16 Jul, 2008

1 commit

  • Conflicts:

    arch/powerpc/Kconfig
    arch/s390/kernel/time.c
    arch/x86/kernel/apic_32.c
    arch/x86/kernel/cpu/perfctr-watchdog.c
    arch/x86/kernel/i8259_64.c
    arch/x86/kernel/ldt.c
    arch/x86/kernel/nmi_64.c
    arch/x86/kernel/smpboot.c
    arch/x86/xen/smp.c
    include/asm-x86/hw_irq_32.h
    include/asm-x86/hw_irq_64.h
    include/asm-x86/mach-default/irq_vectors.h
    include/asm-x86/mach-voyager/irq_vectors.h
    include/asm-x86/smp.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

14 Jul, 2008

1 commit


11 Jul, 2008

1 commit

  • After the sched_clock code has been removed from sched.c we can now trace
    the scheduler. The scheduler has a lot of functions that would be worth
    tracing.

    Signed-off-by: Steven Rostedt
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

30 Jun, 2008

1 commit


26 Jun, 2008

1 commit

  • This adds kernel/smp.c which contains helpers for IPI function calls. In
    addition to supporting the existing smp_call_function() in a more efficient
    manner, it also adds a more scalable variant called smp_call_function_single()
    for calling a given function on a single CPU only.

    The core of this is based on the x86-64 patch from Nick Piggin, with lots of
    changes since then. "Alan D. Brunelle" has
    contributed lots of fixes and suggestions as well. Also thanks to
    Paul E. McKenney for reviewing RCU usage
    and getting rid of the data allocation fallback deadlock.

    Acked-by: Ingo Molnar
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Jens Axboe

    Jens Axboe
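To make the queueing idea concrete, here is a single-threaded user-space model of calling a function on one CPU: each CPU owns a FIFO of pending call descriptors, the caller appends to the target's queue, and the handler that would run from the IPI drains it. It assumes, as kernel/smp.c appears to do, that an IPI is only needed when the target queue was empty. All names ending in _model are hypothetical, and locking and the wait-for-completion path are left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count */

/* Hypothetical per-call descriptor, queued on exactly one CPU. */
struct call_single_model {
	void (*func)(void *info);
	void *info;
	struct call_single_model *next;
};

/* One FIFO per CPU. Real code guards such queues with a lock and
 * disables interrupts; this single-threaded model needs neither. */
static struct {
	struct call_single_model *head, *tail;
} call_queue_model[NR_CPUS_MODEL];

/* Append @csd to @cpu's queue. Returns 1 if the queue was empty, meaning
 * the caller must send an IPI; 0 if one is already pending. */
static int queue_call_single_model(int cpu, struct call_single_model *csd,
				   void (*func)(void *), void *info)
{
	int need_ipi;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	csd->func = func;
	csd->info = info;
	csd->next = NULL;
	need_ipi = call_queue_model[cpu].head == NULL;
	if (need_ipi)
		call_queue_model[cpu].head = csd;
	else
		call_queue_model[cpu].tail->next = csd;
	call_queue_model[cpu].tail = csd;
	return need_ipi;
}

/* What the target CPU would run from its IPI handler: detach the whole
 * queue, then run every queued function in order. */
static void call_single_interrupt_model(int cpu)
{
	struct call_single_model *csd = call_queue_model[cpu].head;

	call_queue_model[cpu].head = call_queue_model[cpu].tail = NULL;
	while (csd) {
		struct call_single_model *next = csd->next;

		csd->func(csd->info);
		csd = next;
	}
}

/* Example callback for the usage below. */
static void increment_model(void *info)
{
	(*(int *)info)++;
}
```

Queuing two calls back to back yields only one "send an IPI" result, and both run when the handler drains the queue.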
     

06 Jun, 2008

2 commits

  • kernel/cpu.c seems a more logical place for those maps since they do not really
    have much to do with the scheduler these days.

    kernel/cpu.c is now built for the UP kernel too, but this does not affect
    the size of the kernel sections.

    $ size vmlinux

    before
       text    data     bss     dec     hex filename
    3313797  307060  310352 3931209  3bfc49 vmlinux

    after
       text    data     bss     dec     hex filename
    3313797  307060  310352 3931209  3bfc49 vmlinux

    Signed-off-by: Max Krasnyansky
    Cc: pj@sgi.com
    Cc: menage@google.com
    Cc: rostedt@goodmis.org
    Cc: mingo@elte.hu
    Acked-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner

    Max Krasnyansky
     
  • The current code uses a linear algorithm, which causes scaling issues
    on larger SMP machines. This patch replaces that algorithm with a
    2-dimensional bitmap to reduce latencies in the wake-up path.

    Signed-off-by: Gregory Haskins
    Acked-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Gregory Haskins
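The two-dimensional idea can be sketched in user space: one dimension is a bitmap of which priority levels are currently occupied, the other is a per-level CPU mask. A wake-up search then walks only the occupied levels below the task's priority instead of visiting every CPU. This is a minimal model, assuming at most 64 CPUs and 64 levels (so each dimension fits in a uint64_t) and that a higher number means a higher priority; it is not the kernel's own code, all names are hypothetical, and __builtin_ctzll is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LEVELS_MODEL 64 /* hypothetical number of priority levels */
#define NR_CPUS_MODEL   64 /* hypothetical CPU count */

struct cpupri_model {
	uint64_t level_map;                      /* bit L: some CPU runs at level L */
	uint64_t cpus_at_level[NR_LEVELS_MODEL]; /* bit C: CPU C runs at level L */
	int cpu_level[NR_CPUS_MODEL];            /* current level per CPU, -1 if unset */
};

static void cpupri_model_init(struct cpupri_model *cp)
{
	cp->level_map = 0;
	for (int l = 0; l < NR_LEVELS_MODEL; l++)
		cp->cpus_at_level[l] = 0;
	for (int c = 0; c < NR_CPUS_MODEL; c++)
		cp->cpu_level[c] = -1;
}

/* Move @cpu to @level, keeping both dimensions of the bitmap in sync. */
static void cpupri_model_set(struct cpupri_model *cp, int cpu, int level)
{
	int old;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	assert(level >= 0 && level < NR_LEVELS_MODEL);
	old = cp->cpu_level[cpu];
	if (old >= 0) {
		cp->cpus_at_level[old] &= ~(UINT64_C(1) << cpu);
		if (!cp->cpus_at_level[old])
			cp->level_map &= ~(UINT64_C(1) << old);
	}
	cp->cpus_at_level[level] |= UINT64_C(1) << cpu;
	cp->level_map |= UINT64_C(1) << level;
	cp->cpu_level[cpu] = level;
}

/* Return the allowed CPUs at the lowest occupied level below @task_level,
 * or 0 if there are none. The loop is bounded by occupied levels, not CPUs. */
static uint64_t cpupri_model_find(const struct cpupri_model *cp,
				  int task_level, uint64_t allowed)
{
	uint64_t levels = cp->level_map;

	while (levels) {
		int l = __builtin_ctzll(levels); /* lowest occupied level */

		if (l >= task_level)
			break;
		if (cp->cpus_at_level[l] & allowed)
			return cp->cpus_at_level[l] & allowed;
		levels &= levels - 1; /* clear that level and keep scanning */
	}
	return 0;
}
```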
     

24 May, 2008

4 commits

  • This patch removes the Makefile turd and uses the nice CFLAGS_REMOVE macro
    in the kernel directory.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     
  • This patch removes the "notrace" annotation from lockdep and adds the debugging
    files in the kernel directory to those that should not be compiled with
    "-pg" mcount tracing.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
     
  • This patch adds the latency tracer infrastructure. This patch
    does not add anything that will select and turn it on, but will
    be used by later patches.

    If it were to be compiled, it would add the following files
    to the debugfs:

    The root tracing directory:

    /debugfs/tracing/

    This patch also adds the following files:

    available_tracers
        List of available tracers. Currently no tracers are
        available; reading this file only shows "none", which
        is used to unregister all tracers.

    current_tracer
        The tracer that is currently active. Empty on start up.
        To switch to a tracer, simply echo one of the tracers
        listed in available_tracers:

        example: (used with later patches)

        echo function > /debugfs/tracing/current_tracer

        To disable the tracer:

        echo disable > /debugfs/tracing/current_tracer

    tracing_enabled
        Echoing "1" into this file starts the ftrace function
        tracing (if sysctl kernel.ftrace_enabled=1); echoing "0"
        turns it off.

    latency_trace
        This file is read-only and holds the result of the trace.

    trace
        This file outputs an easier-to-read version of the trace.

    iter_ctrl
        Controls the way the output of traces looks.
        So far there are two controls:
        echoing in "symonly" will only show the kallsyms variables
        without the addresses (if kallsyms was configured);
        echoing in "verbose" will change the output to show
        a lot more data, but it is not very easy for humans
        to understand.
        echoing in "nosymonly" turns off symonly.
        echoing in "noverbose" turns off verbose.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Steven Rostedt
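The iter_ctrl convention above (an option word sets a flag, and the same word prefixed with "no" clears it) can be modelled as a small table-driven parser. This is a sketch only: the flag names and iter_ctrl_write_model() are hypothetical, and it assumes the caller has already stripped the trailing newline that echo appends.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits named after the iter_ctrl options. */
enum {
	ITER_SYM_ONLY = 1 << 0,
	ITER_VERBOSE  = 1 << 1,
};

static const struct {
	const char *name;
	unsigned int bit;
} iter_opts_model[] = {
	{ "symonly", ITER_SYM_ONLY },
	{ "verbose", ITER_VERBOSE },
};

/* Apply one option word to @flags: "opt" sets the bit, "noopt" clears it.
 * Returns 0 on success and -1 for an unknown option (flags unchanged). */
static int iter_ctrl_write_model(unsigned int *flags, const char *word)
{
	int neg = strncmp(word, "no", 2) == 0;
	const char *name = neg ? word + 2 : word;

	for (size_t i = 0; i < sizeof(iter_opts_model) / sizeof(iter_opts_model[0]); i++) {
		if (strcmp(name, iter_opts_model[i].name) != 0)
			continue;
		if (neg)
			*flags &= ~iter_opts_model[i].bit;
		else
			*flags |= iter_opts_model[i].bit;
		return 0;
	}
	return -1;
}
```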
     
  • If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
    set to a non-zero value, the ftrace routine will be called every time
    we enter a kernel function that is not marked with the "notrace"
    attribute.

    The ftrace routine will then call a registered function if a function
    happens to be registered.

    [ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
    so don't blame Arnaldo for all of this ;-) ]

    Update:
    It is now possible to register more than one ftrace function.
    If only one ftrace function is registered, that will be the
    function that ftrace calls directly. If more than one function
    is registered, then ftrace will call a function that will loop
    through the functions to call.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Arnaldo Carvalho de Melo
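The dispatch scheme from the update note can be sketched with plain function pointers: with no callbacks the hook points at a stub, with one it calls that callback directly, and with several it calls a list walker that invokes each in turn. The _model names are hypothetical and only loosely echo ftrace's own; unregistration and the locking needed to swap the pointer safely on a running system are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ftrace_func_model_t)(unsigned long ip, unsigned long parent_ip);

/* Hypothetical registration record, chained into a singly linked list. */
struct ftrace_ops_model {
	ftrace_func_model_t func;
	struct ftrace_ops_model *next;
};

static void ftrace_stub_model(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
}

static struct ftrace_ops_model *ops_list_model;
/* Target of the per-function hook; starts as a no-op stub. */
static ftrace_func_model_t ftrace_trace_function_model = ftrace_stub_model;

/* Used only when two or more callbacks are registered. */
static void ftrace_list_func_model(unsigned long ip, unsigned long parent_ip)
{
	for (struct ftrace_ops_model *op = ops_list_model; op; op = op->next)
		op->func(ip, parent_ip);
}

/* Stub for none, the callback itself for one, the list walker for several. */
static void update_trace_function_model(void)
{
	if (!ops_list_model)
		ftrace_trace_function_model = ftrace_stub_model;
	else if (!ops_list_model->next)
		ftrace_trace_function_model = ops_list_model->func;
	else
		ftrace_trace_function_model = ftrace_list_func_model;
}

static void register_ftrace_function_model(struct ftrace_ops_model *ops)
{
	ops->next = ops_list_model;
	ops_list_model = ops;
	update_trace_function_model();
}

/* Stand-in for the hook that runs on every traced function entry. */
static void traced_function_entry_model(unsigned long ip, unsigned long parent_ip)
{
	ftrace_trace_function_model(ip, parent_ip);
}

/* Example callbacks for the usage below. */
static int calls_a, calls_b;

static void count_a(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	calls_a++;
}

static void count_b(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	calls_b++;
}
```

Calling the single registered function directly keeps the common one-tracer case down to one indirect call per traced function.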
     

06 May, 2008

1 commit

  • This replaces the rq->clock stuff (and possibly cpu_clock()).

    - architectures that have an 'imperfect' hardware clock can set
    CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

    - the 'jiffy' window might be superfluous when we update tick_gtod
    before the __update_sched_clock() call in sched_clock_tick()

    - cpu_clock() might be implemented as:

    sched_clock_cpu(smp_processor_id())

    if the accuracy proves good enough; how far can the TSC drift in a
    single jiffy when considering the filtering and idle hooks?

    [ mingo@elte.hu: various fixes and cleanups ]

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
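One way to read the per-CPU filtering idea is sketched below: at each tick a CPU records both its raw, possibly unstable clock and the stable GTOD time, and reads between ticks return GTOD plus the raw delta, clamped so the value never goes backwards and never runs more than one tick past the last GTOD sample (unless it is already beyond that). This is a user-space model under stated assumptions; the 4 ms tick, the struct layout and all names are hypothetical, and the clamping rule is an interpretation of the idea rather than a copy of the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NSEC_MODEL 4000000ULL /* hypothetical 4 ms tick */

/* Hypothetical per-CPU state. */
struct sched_clock_data_model {
	uint64_t tick_raw;  /* raw (unstable) clock sampled at the last tick */
	uint64_t tick_gtod; /* stable GTOD time sampled at the last tick */
	uint64_t clock;     /* last value handed out */
};

/* Called from the tick: resynchronise against the stable time base. */
static void sched_clock_tick_model(struct sched_clock_data_model *scd,
				   uint64_t raw_now, uint64_t gtod_now)
{
	scd->tick_raw = raw_now;
	scd->tick_gtod = gtod_now;
}

/* Filtered read: GTOD at the last tick plus the raw delta since then,
 * clamped to [max(previous value, tick_gtod), tick_gtod + one tick]. */
static uint64_t sched_clock_cpu_model(struct sched_clock_data_model *scd,
				      uint64_t raw_now)
{
	int64_t delta = (int64_t)(raw_now - scd->tick_raw);
	uint64_t clock, min_clock, max_clock;

	if (delta < 0) /* raw clock went backwards: treat as no progress */
		delta = 0;
	clock = scd->tick_gtod + (uint64_t)delta;

	min_clock = scd->clock > scd->tick_gtod ? scd->clock : scd->tick_gtod;
	max_clock = scd->tick_gtod + TICK_NSEC_MODEL;
	if (max_clock < min_clock) /* never force the value backwards */
		max_clock = min_clock;

	if (clock < min_clock)
		clock = min_clock;
	if (clock > max_clock)
		clock = max_clock;
	scd->clock = clock;
	return clock;
}
```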
     

29 Apr, 2008

1 commit


18 Apr, 2008

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb:
    kgdb: always use icache flush for sw breakpoints
    kgdb: fix SMP NMI kgdb_handle_exception exit race
    kgdb: documentation fixes
    kgdb: allow static kgdbts boot configuration
    kgdb: add documentation
    kgdb: Kconfig fix
    kgdb: add kgdb internal test suite
    kgdb: fix several kgdb regressions
    kgdb: kgdboc pl011 I/O module
    kgdb: fix optional arch functions and probe_kernel_*
    kgdb: add x86 HW breakpoints
    kgdb: print breakpoint removed on exception
    kgdb: clocksource watchdog
    kgdb: fix NMI hangs
    kgdb: fix kgdboc dynamic module configuration
    kgdb: document parameters
    x86: kgdb support
    consoles: polling support, kgdboc
    kgdb: core
    uaccess: add probe_kernel_write()

    Linus Torvalds
     
  • kgdb core code. Handles the protocol and the arch details.

    [ mingo@elte.hu: heavily modified, simplified and cleaned up. ]
    [ xemul@openvz.org: use find_task_by_pid_ns ]

    Signed-off-by: Jason Wessel
    Signed-off-by: Ingo Molnar
    Signed-off-by: Jan Kiszka
    Reviewed-by: Thomas Gleixner

    Jason Wessel
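kgdb talks to the host debugger using GDB's remote serial protocol, in which each packet is framed as $data#cs, with cs being the sum of the data bytes modulo 256 written as two hex digits; the receiver replies + or - to acknowledge. A minimal framing helper is sketched below. gdb_frame_packet_model() is a hypothetical name, not a kgdb function, and the escaping of special characters is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Frame @data as "$<data>#<cs>" in @out, where cs is the modulo-256 sum of
 * the data bytes as two lowercase hex digits. Returns the framed length
 * (excluding the NUL), or 0 if @out is too small. No escaping is done. */
static size_t gdb_frame_packet_model(const char *data, char *out, size_t outlen)
{
	static const char hex[] = "0123456789abcdef";
	size_t len = strlen(data);
	unsigned char sum = 0;

	if (outlen < len + 5) /* '$' + data + '#' + 2 hex digits + NUL */
		return 0;
	out[0] = '$';
	for (size_t i = 0; i < len; i++) {
		out[1 + i] = data[i];
		sum += (unsigned char)data[i]; /* wraps modulo 256 */
	}
	out[len + 1] = '#';
	out[len + 2] = hex[sum >> 4];
	out[len + 3] = hex[sum & 0xf];
	out[len + 4] = '\0';
	return len + 4;
}
```

For example, the reply "OK" is framed as $OK#9a, since 'O' (0x4f) plus 'K' (0x4b) is 0x9a.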
     

17 Apr, 2008

1 commit

  • Semaphores are no longer performance-critical, so a generic C
    implementation is better for maintainability, debuggability and
    extensibility. Thanks to Peter Zijlstra for fixing the lockdep
    warning. Thanks to Harvey Harrison for pointing out that the
    unlikely() was unnecessary.

    Signed-off-by: Matthew Wilcox
    Acked-by: Ingo Molnar

    Matthew Wilcox
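Part of the case for a generic C implementation is that the whole mechanism fits in a few short functions. The sketch below is a user-space analogue built on a POSIX mutex and condition variable, assuming those stand in for the kernel's spinlock and wait list; the _model names are hypothetical, and only down_trylock_model() deliberately copies the kernel's return convention (0 on success, 1 if the semaphore is unavailable).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space counting semaphore. */
struct semaphore_model {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int count;
};

static void sema_init_model(struct semaphore_model *sem, unsigned int val)
{
	pthread_mutex_init(&sem->lock, NULL);
	pthread_cond_init(&sem->wait, NULL);
	sem->count = val;
}

/* Sleep until the count is positive, then take one unit. */
static void down_model(struct semaphore_model *sem)
{
	pthread_mutex_lock(&sem->lock);
	while (sem->count == 0) /* the loop also absorbs spurious wakeups */
		pthread_cond_wait(&sem->wait, &sem->lock);
	sem->count--;
	pthread_mutex_unlock(&sem->lock);
}

/* Non-blocking attempt: 0 if a unit was taken, 1 if none was available. */
static int down_trylock_model(struct semaphore_model *sem)
{
	int ret = 1;

	pthread_mutex_lock(&sem->lock);
	if (sem->count > 0) {
		sem->count--;
		ret = 0;
	}
	pthread_mutex_unlock(&sem->lock);
	return ret;
}

/* Release one unit and wake one waiter, if any. */
static void up_model(struct semaphore_model *sem)
{
	pthread_mutex_lock(&sem->lock);
	sem->count++;
	pthread_cond_signal(&sem->wait);
	pthread_mutex_unlock(&sem->lock);
}
```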