24 Sep, 2009

1 commit

  • While it's architecturally clean to have the cgroup debug subsystem be
    completely independent of the cgroups framework, it limits its usefulness
    for debugging the contents of internal data structures. Move the debug
    subsystem code into the scope of all the cgroups data structures to make
    more detailed debugging possible.

    Signed-off-by: Paul Menage
    Reviewed-by: Li Zefan
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

22 Sep, 2009

1 commit

  • …ux/kernel/git/tip/linux-2.6-tip

    * 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf: Tidy up after the big rename
    perf: Do the big rename: Performance Counters -> Performance Events
    perf_counter: Rename 'event' to event_id/hw_event
    perf_counter: Rename list_entry -> group_entry, counter_list -> group_list

    Manually resolved some fairly trivial conflicts with the tracing tree in
    include/trace/ftrace.h and kernel/trace/trace_syscalls.c.

    Linus Torvalds
     

21 Sep, 2009

1 commit

  • Bye-bye Performance Counters, welcome Performance Events!

    In the past few months the perfcounters subsystem has grown out its
    initial role of counting hardware events, and has become (and is
    becoming) a much broader generic event enumeration, reporting, logging,
    monitoring, analysis facility.

    Naming its core object 'perf_counter' and naming the subsystem
    'perfcounters' has become more and more of a misnomer. With pending
    code like hw-breakpoints support the 'counter' name is less and
    less appropriate.

    All in one, we've decided to rename the subsystem to 'performance
    events' and to propagate this rename through all fields, variables
    and API names. (in an ABI compatible fashion)

    The word 'event' is also a bit shorter than 'counter' - which makes
    it slightly more convenient to write/handle as well.

    Thanks goes to Stephane Eranian who first observed this misnomer and
    suggested a rename.

    User-space tooling and ABI compatibility is not affected - this patch
    should be function-invariant. (Also, defconfigs were not touched to
    keep the size down.)

    This patch has been generated via the following script:

    FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

    sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

    for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
    done

    FILES=$(find . -name perf_event.*)

    sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

    ... to keep it as correct as possible. This script can also be
    used by anyone who has pending perfcounters patches - it converts
    a Linux kernel tree over to the new naming. We tried to time this
    change to the point in time where the amount of pending patches
    is the smallest: the end of the merge window.

    Namespace clashes were fixed up in a preparatory patch - and some
    stylistic fallout will be fixed up in a subsequent patch.

    ( NOTE: 'counters' are still the proper terminology when we deal
    with hardware registers - and these sed scripts are a bit
    over-eager in renaming them. I've undone some of that, but
    in case there's something left where 'counter' would be
    better than 'event' we can undo that on an individual basis
    instead of touching an otherwise nicely automated patch. )

    Suggested-by: Stephane Eranian
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mackerras
    Reviewed-by: Arjan van de Ven
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 Sep, 2009

1 commit

  • Now that the last users of markers have migrated to the event
    tracer we can kill off the (now orphan) support code.

    Signed-off-by: Christoph Hellwig
    Acked-by: Mathieu Desnoyers
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Christoph Hellwig
     

16 Sep, 2009

1 commit


15 Sep, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig:
    kconfig: add missing dependency of conf to localyesconfig
    kconfig: test if a .config already exists
    kconfig: make local .config default for streamline_config
    kconfig: test for /boot/config-uname after /proc/config.gz in localconfig
    kconfig: unset IKCONFIG_PROC and clean up nesting
    kconfig: search for a config to base the local(mod|yes)config on
    kconfig: keep config.gz around even if CONFIG_IKCONFIG_PROC is not set
    kconfig: have extract-ikconfig read ELF files
    kconfig: add check if end exists in extract-ikconfig
    kconfig: enable CONFIG_IKCONFIG from streamline_config.pl
    kconfig: do not warn about modules built in
    kconfig: streamline_config.pl do not stop with no depends
    kconfig: add make localyesconfig option
    kconfig: make localmodconfig to run streamline_config.pl
    kconfig: add streamline_config.pl to scripts

    Linus Torvalds
     

23 Aug, 2009

2 commits

  • Now that CONFIG_TREE_PREEMPT_RCU is in place, there is no
    further need for CONFIG_PREEMPT_RCU. Remove it, along with
    whatever subtle bugs it may (or may not) contain.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • Create a kernel/rcutree_plugin.h file that contains definitions
    for preemptable RCU (or, under the #else branch of the #ifdef,
    empty definitions for the classic non-preemptable semantics).
    These definitions fit into plugins defined in kernel/rcutree.c
    for this purpose.

    This variant of preemptable RCU uses a new algorithm whose
    read-side expense is roughly that of classic hierarchical RCU
    under CONFIG_PREEMPT. This new algorithm's update-side expense
    is similar to that of classic hierarchical RCU, and, in absence
    of read-side preemption or blocking, is exactly that of classic
    hierarchical RCU. Perhaps more important, this new algorithm
    has a much simpler implementation, saving well over 1,000 lines
    of code compared to mainline's implementation of preemptable
    RCU, which will hopefully be retired in favor of this new
    algorithm.

    The simplifications are obtained by maintaining per-task
    nesting state for running tasks, and using a simple
    lock-protected algorithm to handle accounting when tasks block
    within RCU read-side critical sections, making use of lessons
    learned while creating numerous user-level RCU implementations
    over the past 18 months.

    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josht@linux.vnet.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

19 Aug, 2009

1 commit


16 Aug, 2009

1 commit


29 Jun, 2009

1 commit

  • …nel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ftrace: Fix the output of profile
    ring-buffer: Make it generally available
    ftrace: Remove duplicate newline
    tracing: Fix trace_buf_size boot option
    ftrace: Fix t_hash_start()
    ftrace: Don't manipulate @pos in t_start()
    ftrace: Don't increment @pos in g_start()
    tracing: Reset iterator in t_start()
    trace_stat: Don't increment @pos in seq start()
    tracing_bprintk: Don't increment @pos in t_start()
    tracing/events: Don't increment @pos in s_start()

    Linus Torvalds
     

25 Jun, 2009

2 commits

  • In hunting down the cause for the hwlat_detector ring buffer spew in
    my failed -next builds it became obvious that folks are now treating
    ring_buffer as something that is generic independent of tracing and thus,
    suitable for public driver consumption.

    Given that there are only a few minor areas in ring_buffer that have any
    reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
    those and make it generally available.

    Signed-off-by: Paul Mundt
    Cc: Jon Masters
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul Mundt
     
  • Even though one cannot make use of the audit watch code without
    CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that
    the audit rule filtering requires that it at least be compiled.

    Thus build the audit_watch code when we build auditfilter like it was
    before cfcad62c74abfef83762dc05a556d21bdf3980a2

    Clearly this is a point of potential future cleanup..

    Reported-by: Frans Pop
    Signed-off-by: Eric Paris
    Signed-off-by: Al Viro

    Eric Paris
     

24 Jun, 2009

2 commits

  • Remove Classic RCU, given that the combination of Tree RCU and
    the proposed Bloatwatch RCU do everything that Classic RCU can
    with fewer bugs.

    Tree RCU has been default in x86 builds for almost six months,
    and seems to be quite reliable, so there does not seem to be
    much justification for keeping the Classic RCU code and config
    complexity around anymore.

    Signed-off-by: Paul E. McKenney
    Cc: akpm@linux-foundation.org
    Cc: niv@us.ibm.com
    Cc: dvhltc@us.ibm.com
    Cc: dipankar@in.ibm.com
    Cc: dhowells@redhat.com
    Cc: lethal@linux-sh.org
    Cc: kernel@wantstofly.org
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • In preparation for converting audit to use fsnotify instead of inotify we
    seperate the inode watching code into it's own file. This is similar to
    how the audit tree watching code is already seperated into audit_tree.c

    Signed-off-by: Eric Paris

    Eric Paris
     

19 Jun, 2009

1 commit

  • Enable the use of GCC's coverage testing tool gcov [1] with the Linux
    kernel. gcov may be useful for:

    * debugging (has this code been reached at all?)
    * test improvement (how do I change my test to cover these lines?)
    * minimizing kernel configurations (do I need this option if the
    associated code is never run?)

    The profiling patch incorporates the following changes:

    * change kbuild to include profiling flags
    * provide functions needed by profiling code
    * present profiling data as files in debugfs

    Note that on some architectures, enabling gcc's profiling option
    "-fprofile-arcs" for the entire kernel may trigger compile/link/
    run-time problems, some of which are caused by toolchain bugs and
    others which require adjustment of architecture code.

    For this reason profiling the entire kernel is initially restricted
    to those architectures for which it is known to work without changes.
    This restriction can be lifted once an architecture has been tested
    and found compatible with gcc's profiling. Profiling of single files
    or directories is still available on all platforms (see config help
    text).

    [1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html

    Signed-off-by: Peter Oberparleiter
    Cc: Andi Kleen
    Cc: Huang Ying
    Cc: Li Wei
    Cc: Michael Ellerman
    Cc: Ingo Molnar
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Rusty Russell
    Cc: WANG Cong
    Cc: Sam Ravnborg
    Cc: Jeff Dike
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Oberparleiter
     

17 Jun, 2009

1 commit

  • Move supplementary groups implementation to kernel/groups.c .
    kernel/sys.c already accumulated quite a few random stuff.

    Do strictly copy/paste + add required headers to compile. Compile-tested
    on many configs and archs.

    Signed-off-by: Alexey Dobriyan
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

11 Jun, 2009

1 commit


24 Apr, 2009

1 commit


15 Apr, 2009

1 commit

  • Jeremy Fitzhardinge reported this build failure:

    LD .tmp_vmlinux1
    arch/x86/kernel/built-in.o: In function `ds_take_timestamp':
    git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global'
    git/linux/arch/x86/kernel/ds.c:1380: undefined reference to `trace_clock_global'

    Which is due to !CONFIG_TRACING && CONFIG_X86_DS=y.

    Expose the trace clock code to CONFIG_X86_DS as well.

    [ Unfortunately librarizing doesnt work well - ancient architectures
    with no raw_local_irq_save() primitive break the build. ]

    Reported-by: Jeremy Fitzhardinge
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

08 Apr, 2009

1 commit


07 Apr, 2009

1 commit


06 Apr, 2009

1 commit

  • Merge reason: we have gathered quite a few conflicts, need to merge upstream

    Conflicts:
    arch/powerpc/kernel/Makefile
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/hardirq.h
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/cpu/common.c
    arch/x86/kernel/irq.c
    arch/x86/kernel/syscall_table_32.S
    arch/x86/mm/iomap_32.c
    include/linux/sched.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Apr, 2009

1 commit

  • Create a dynamically sized pool of threads for doing very slow work items, such
    as invoking mkdir() or rmdir() - things that may take a long time and may
    sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable
    for workqueues.

    The number of threads is always at least a settable minimum, but more are
    started when there's more work to do, up to a limit. Because of the nature of
    the load, it's not suitable for a 1-thread-per-CPU type pool. A system with
    one CPU may well want several threads.

    This is used by FS-Cache to do slow caching operations in the background, such
    as looking up, creating or deleting cache objects.

    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Acked-by: Steve Dickson
    Acked-by: Trond Myklebust
    Acked-by: Al Viro
    Tested-by: Daire Byrne

    David Howells
     

26 Feb, 2009

1 commit


19 Feb, 2009

1 commit

  • Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken
    config dependncies. Fix that.

    Signed-off-by: Rafael J. Wysocki
    Reported-by: Ingo Molnar
    Tested-by: Masami Hiramatsu
    Cc: Len Brown
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

12 Feb, 2009

1 commit


21 Jan, 2009

1 commit


19 Jan, 2009

1 commit

  • Conflicts:
    arch/x86/include/asm/pda.h

    We merge tip/core/percpu into tip/perfcounters/core because of a
    semantic and contextual conflict: the former eliminates the PDA,
    while the latter extends it with apic_perf_irqs field.

    Resolve the conflict by moving the new field to the irq_cpustat
    structure on 64-bit too.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 Jan, 2009

1 commit

  • Decoupling allows:

    * hung tasks check to happen at very low priority

    * hung tasks check and softlockup to be enabled/disabled independently
    at compile and/or run-time

    * individual panic settings to be enabled disabled independently
    at compile and/or run-time

    * softlockup threshold to be reduced without increasing hung tasks
    poll frequency (hung task check is expensive relative to softlock watchdog)

    * hung task check to be zero over-head when disabled at run-time

    Signed-off-by: Mandeep Singh Baines
    Signed-off-by: Ingo Molnar

    Mandeep Singh Baines
     

15 Jan, 2009

1 commit


11 Jan, 2009

2 commits


08 Jan, 2009

1 commit

  • Right now, most of the kernel boot is strictly synchronous, such that
    various hardware delays are done sequentially.

    In order to make the kernel boot faster, this patch introduces
    infrastructure to allow doing some of the initialization steps
    asynchronously, which will hide significant portions of the hardware delays
    in practice.

    In order to not change device order and other similar observables, this
    patch does NOT do full parallel initialization.

    Rather, it operates more in the way an out of order CPU does; the work may
    be done out of order and asynchronous, but the observable effects
    (instruction retiring for the CPU) are still done in the original sequence.

    Signed-off-by: Arjan van de Ven

    Arjan van de Ven
     

31 Dec, 2008

1 commit

  • * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
    stacktrace: provide save_stack_trace_tsk() weak alias
    rcu: provide RCU options on non-preempt architectures too
    printk: fix discarding message when recursion_bug
    futex: clean up futex_(un)lock_pi fault handling
    "Tree RCU": scalable classic RCU implementation
    futex: rename field in futex_q to clarify single waiter semantics
    x86/swiotlb: add default swiotlb_arch_range_needs_mapping
    x86/swiotlb: add default physbus conversion
    x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
    x86: add swiotlb allocation functions
    swiotlb: consolidate swiotlb info message printing
    swiotlb: support bouncing of HighMem pages
    swiotlb: factor out copy to/from device
    swiotlb: add arch hook to force mapping
    swiotlb: allow architectures to override physbusphys conversions
    swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
    rcu: fix rcutorture behavior during reboot
    resources: skip sanity check of busy resources
    swiotlb: move some definitions to header
    swiotlb: allow architectures to override swiotlb pool allocation
    ...

    Fix up trivial conflicts in
    arch/x86/kernel/Makefile
    arch/x86/mm/init_32.c
    include/linux/hardirq.h
    as per Ingo's suggestions.

    Linus Torvalds
     

29 Dec, 2008

2 commits

  • Conflicts:
    fs/exec.c
    include/linux/init_task.h

    Simple context conflicts.

    Ingo Molnar
     
  • …/git/tip/linux-2.6-tip

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    sched: fix warning in fs/proc/base.c
    schedstat: consolidate per-task cpu runtime stats
    sched: use RCU variant of list traversal in for_each_leaf_rt_rq()
    sched, cpuacct: export percpu cpuacct cgroup stats
    sched, cpuacct: refactoring cpuusage_read / cpuusage_write
    sched: optimize update_curr()
    sched: fix wakeup preemption clock
    sched: add missing arch_update_cpu_topology() call
    sched: let arch_update_cpu_topology indicate if topology changed
    sched: idle_balance() does not call load_balance_newidle()
    sched: fix sd_parent_degenerate on non-numa smp machine
    sched: add uid information to sched_debug for CONFIG_USER_SCHED
    sched: move double_unlock_balance() higher
    sched: update comment for move_task_off_dead_cpu
    sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares
    sched/rt: removed unneeded defintion
    sched: add hierarchical accounting to cpu accounting controller
    sched: include group statistics in /proc/sched_debug
    sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER
    sched: clean up SCHED_CPUMASK_ALLOC
    ...

    Linus Torvalds
     

19 Dec, 2008

1 commit

  • This patch fixes a long-standing performance bug in classic RCU that
    results in massive internal-to-RCU lock contention on systems with
    more than a few hundred CPUs. Although this patch creates a separate
    flavor of RCU for ease of review and patch maintenance, it is intended
    to replace classic RCU.

    This patch still handles stress better than does mainline, so I am still
    calling it ready for inclusion. This patch is against the -tip tree.
    Nevertheless, experience on an actual 1000+ CPU machine would still be
    most welcome.

    Most of the changes noted below were found while creating an rcutiny
    (which should permit ejecting the current rcuclassic) and while doing
    detailed line-by-line documentation.

    Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

    o Fixes from remainder of line-by-line code walkthrough,
    including comment spelling, initialization, undesirable
    narrowing due to type conversion, removing redundant memory
    barriers, removing redundant local-variable initialization,
    and removing redundant local variables.

    I do not believe that any of these fixes address the CPU-hotplug
    issues that Andi Kleen was seeing, but please do give it a whirl
    in case the machine is smarter than I am.

    A writeup from the walkthrough may be found at the following
    URL, in case you are suffering from terminal insomnia or
    masochism:

    http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

    o Made rcutree tracing use seq_file, as suggested some time
    ago by Lai Jiangshan.

    o Added a .csv variant of the rcudata debugfs trace file, to allow
    people having thousands of CPUs to drop the data into
    a spreadsheet. Tested with oocalc and gnumeric. Updated
    documentation to suit.

    Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

    o Fix a theoretical race between grace-period initialization and
    force_quiescent_state() that could occur if more than three
    jiffies were required to carry out the grace-period
    initialization. Which it might, if you had enough CPUs.

    o Apply Ingo's printk-standardization patch.

    o Substitute local variables for repeated accesses to global
    variables.

    o Fix comment misspellings and redundant (but harmless) increments
    of ->n_rcu_pending (this latter after having explicitly added it).

    o Apply checkpatch fixes.

    Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

    o Fixed a number of problems noted by Gautham Shenoy, including
    the cpu-stall-detection bug that he was having difficulty
    convincing me was real. ;-)

    o Changed cpu-stall detection to wait for ten seconds rather than
    three in order to reduce false positive, as suggested by Ingo
    Molnar.

    o Produced a design document (http://lwn.net/Articles/305782/).
    The act of writing this document uncovered a number of both
    theoretical and "here and now" bugs as noted below.

    o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
    condition, fix kerneldoc comments, and add memory barriers
    in dynticks interface functions.

    o Add more data to tracing.

    o Remove unused "rcu_barrier" field from rcu_data structure.

    o Count calls to rcu_pending() from scheduling-clock interrupt
    to use as a surrogate timebase should jiffies stop counting.

    o Fix a theoretical race between force_quiescent_state() and
    grace-period initialization. Yes, initialization does have to
    go on for some jiffies for this race to occur, but given enough
    CPUs...

    Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

    o Fix a number of checkpatch.pl complaints.

    o Apply review comments from Ingo Molnar and Lai Jiangshan
    on the stall-detection code.

    o Fix several bugs in !CONFIG_SMP builds.

    o Fix a misspelled config-parameter name so that RCU now announces
    at boot time if stall detection is configured.

    o Run tests on numerous combinations of configurations parameters,
    which after the fixes above, now build and run correctly.

    Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

    o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
    changeset some time ago, and finally got around to retesting
    this option).

    o Fix some tracing bugs in rcupreempt that caused incorrect
    totals to be printed.

    o I now test with a more brutal random-selection online/offline
    script (attached). Probably more brutal than it needs to be
    on the people reading it as well, but so it goes.

    o A number of optimizations and usability improvements:

    o Make rcu_pending() ignore the grace-period timeout when
    there is no grace period in progress.

    o Make force_quiescent_state() avoid going for a global
    lock in the case where there is no grace period in
    progress.

    o Rearrange struct fields to improve struct layout.

    o Make call_rcu() initiate a grace period if RCU was
    idle, rather than waiting for the next scheduling
    clock interrupt.

    o Invoke rcu_irq_enter() and rcu_irq_exit() only when
    idle, as suggested by Andi Kleen. I still don't
    completely trust this change, and might back it out.

    o Make CONFIG_RCU_TRACE be the single config variable
    manipulated for all forms of RCU, instead of the prior
    confusion.

    o Document tracing files and formats for both rcupreempt
    and rcutree.

    Updates from v4 for those missing v5 given its bad subject line:

    o Separated dynticks interface so that NMIs and irqs call separate
    functions, greatly simplifying it. In particular, this code
    no longer requires a proof of correctness. ;-)

    o Separated dynticks state out into its own per-CPU structure,
    avoiding the duplicated accounting.

    o The case where a dynticks-idle CPU runs an irq handler that
    invokes call_rcu() is now correctly handled, forcing that CPU
    out of dynticks-idle mode.

    o Review comments have been applied (thank you all!!!).
    For but one example, fixed the dynticks-ordering issue that
    Manfred pointed out, saving me much debugging. ;-)

    o Adjusted rcuclassic and rcupreempt to handle dynticks changes.

    Attached is an updated patch to Classic RCU that applies a hierarchy,
    greatly reducing the contention on the top-level lock for large machines.
    This passes 10-hour concurrent rcutorture and online-offline testing on
    128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
    bugs in presence of dynticks (exciting working on a system where
    "sleep 1" hangs until interrupted...), which were fixed in the
    2.6.27 kernel. It is getting more reliable than mainline by some
    measures, so the next version will be against -tip for inclusion.
    See also Manfred Spraul's recent patches (or his earlier work from
    2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
    We will converge onto a common patch in the fullness of time, but are
    currently exploring different regions of the design space. That said,
    I have already gratefully stolen quite a few of Manfred's ideas.

    This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
    of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
    64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
    there is no hierarchy. By default, the RCU initialization code will
    adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
    architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
    this balancing, allowing the hierarchy to be exactly aligned to the
    underlying hardware. Up to two levels of hierarchy are permitted
    (in addition to the root node), allowing up to 16,384 CPUs on 32-bit
    systems and up to 262,144 CPUs on 64-bit systems. I just know that I
    am going to regret saying this, but this seems more than sufficient
    for the foreseeable future. (Some architectures might wish to set
    CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
    If this becomes a real problem, additional levels can be added, but I
    doubt that it will make a significant difference on real hardware.)

    In the common case, a given CPU will manipulate its private rcu_data
    structure and the rcu_node structure that it shares with its immediate
    neighbors. This can reduce both lock and memory contention by multiple
    orders of magnitude, which should eliminate the need for the strange
    manipulations that are reported to be required when running Linux on
    very large systems.

    Some shortcomings:

    o More bugs will probably surface as a result of an ongoing
    line-by-line code inspection.

    Patches will be provided as required.

    o There are probably hangs, rcutorture failures, &c. Seems
    quite stable on a 128-CPU machine, but that is kind of small
    compared to 4096 CPUs. However, seems to do better than
    mainline.

    Patches will be provided as required.

    o The memory footprint of this version is several KB larger
    than rcuclassic.

    A separate UP-only rcutiny patch will be provided, which will
    reduce the memory footprint significantly, even compared
    to the old rcuclassic. One such patch passes light testing,
    and has a memory footprint smaller even than rcuclassic.
    Initial reaction from various embedded guys was "it is not
    worth it", so am putting it aside.

    Credits:

    o Manfred Spraul for ideas, review comments, and bugs spotted,
    as well as some good friendly competition. ;-)

    o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
    Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
    for reviews and comments.

    o Thomas Gleixner for much-needed help with some timer issues
    (see patches below).

    o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
    Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
    Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
    alive despite my heavy abuse^Wtesting.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

08 Dec, 2008

1 commit

  • Implement the core kernel bits of Performance Counters subsystem.

    The Linux Performance Counter subsystem provides an abstraction of
    performance counter hardware capabilities. It provides per task and per
    CPU counters, and it provides event capabilities on top of those.

    Performance counters are accessed via special file descriptors.
    There's one file descriptor per virtual counter used.

    The special file descriptor is opened via the perf_counter_open()
    system call:

    int
    perf_counter_open(u32 hw_event_type,
    u32 hw_event_period,
    u32 record_type,
    pid_t pid,
    int cpu);

    The syscall returns the new fd. The fd can be used via the normal
    VFS system calls: read() can be used to read the counter, fcntl()
    can be used to set the blocking mode, etc.

    Multiple counters can be kept open at a time, and the counters
    can be poll()ed.

    See more details in Documentation/perf-counters.txt.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

19 Nov, 2008

1 commit