06 Dec, 2011

1 commit

  • This patch changes the fields in cpustat from a structure to a
    u64 array. The math gets easier, and the code is more flexible.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Paul Turner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
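
The change described above can be pictured with a small user-space sketch. This is not the kernel code: the type name `cpustat_sketch`, the `STAT_*` index names, and both helper functions are assumptions made for illustration. It only shows why an indexed u64 array makes the arithmetic simpler than a structure with one named field per category.

```c
#include <assert.h>
#include <stdint.h>

/* Index names are illustrative; one slot per accounting category. */
enum cpu_usage_stat {
    STAT_USER,
    STAT_NICE,
    STAT_SYSTEM,
    STAT_IDLE,
    NR_STATS            /* number of categories, used as the array size */
};

/* Array-based layout: every category is addressed by index. */
struct cpustat_sketch {
    uint64_t stat[NR_STATS];
};

/* Charge `delta` time units to one category chosen at run time. */
static void cpustat_add(struct cpustat_sketch *cs, enum cpu_usage_stat idx,
                        uint64_t delta)
{
    assert(idx < NR_STATS);
    cs->stat[idx] += delta;
}

/* Summing all categories is a plain loop instead of one line per field. */
static uint64_t cpustat_total(const struct cpustat_sketch *cs)
{
    uint64_t sum = 0;

    for (int i = 0; i < NR_STATS; i++)
        sum += cs->stat[i];
    return sum;
}
```

With this layout a caller can pick the category at run time, for example `cpustat_add(&cs, STAT_IDLE, 7)`, and a new category only needs a new enum entry.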
     

14 Jan, 2011

1 commit

  • Use modern per_cpu API to increment {soft|hard}irq counters, and use
    per_cpu allocation for (struct irq_desc)->kstats_irq instead of an array.

    This gives better SMP/NUMA locality and saves a few instructions per irq.

    With small nr_cpuids values (8 for example), kstats_irq was a small array
    (less than L1_CACHE_BYTES), a potential source of false sharing.

    In the !CONFIG_SPARSE_IRQ case, remove the huge, NUMA/cache unfriendly
    kstat_irqs_all[NR_IRQS][NR_CPUS] array.

    Note: we still populate kstats_irq for all possible irqs in
    early_irq_init(). We could probably use on-demand allocations (the code
    is included in alloc_descs()). The problem is that not all IRQs are used
    with a prior alloc_descs() call.

    kstat_irqs_this_cpu() is not used anymore, remove it.

    Signed-off-by: Eric Dumazet
    Reviewed-by: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Cc: Tejun Heo
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

17 Dec, 2010

1 commit

  • __get_cpu_var() can be replaced with this_cpu_read and will then use a
    single read instruction with implied address calculation to access the
    correct per cpu instance.

    However, the address of a per cpu variable passed to __this_cpu_read()
    cannot be determined (since it's an implied address conversion through
    segment prefixes). Therefore apply this only to uses of __get_cpu_var
    where the address of the variable is not used.

    Cc: Pekka Enberg
    Cc: Hugh Dickins
    Cc: Thomas Gleixner
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

28 Oct, 2010

2 commits

  • In /proc/stat, the number of per-IRQ events is shown by summing each
    irq's events on all cpus. But we can make use of kstat_irqs().

    kstat_irqs() does the same calculation. If !CONFIG_GENERIC_HARDIRQ,
    it's not a big cost. (Both the number of cpus and the number of irqs
    are small.)

    If a system is very big and CONFIG_GENERIC_HARDIRQ is set, it does

    for_each_irq()
        for_each_cpu()
            - look up a radix tree
            - read desc->irq_stat[cpu]

    This is inefficient. This patch adds kstat_irqs() for
    CONFIG_GENERIC_HARDIRQ and changes the calculation to

    for_each_irq()
        look up radix tree
        for_each_cpu()
            - read desc->irq_stat[cpu]

    This reduces the cost.

    A test on a (4096 cpus, 256 nodes, 4592 irqs) host (by Jack Steiner):

    %time cat /proc/stat > /dev/null

    Before Patch: 2.459 sec
    After Patch : 0.561 sec

    [akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
    [akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • /proc/stat shows the total number of all interrupts to each cpu. But when
    the number of IRQs is very large, this takes a very long time, and 'cat
    /proc/stat' takes more than 10 secs. This is because the sum of all irq
    events is computed when /proc/stat is read. This patch adds a per-cpu
    "sum of all irqs" counter and reduces the read cost.

    The cost of reading /proc/stat is important because it's used by major
    applications such as 'top', 'ps', 'w', etc.

    A test on a machine (4096 cpus, 256 nodes, 4592 irqs) shows

    %time cat /proc/stat > /dev/null
    Before Patch: 12.627 sec
    After Patch: 2.459 sec

    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
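
The two /proc/stat optimizations above can be sketched together in user-space C. Everything here is an assumption made for illustration: the `*_sketch` names, the fixed array sizes, and `lookup_desc()`, which stands in for the radix tree lookup and simply counts how often it is called. The sketch shows (a) one descriptor lookup per irq followed by a loop over cpus, and (b) a running per-cpu total that is updated when an interrupt is counted, so the grand total no longer walks every irq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4
#define NR_IRQS_SKETCH 3

/* Hypothetical per-irq descriptor holding one counter per cpu. */
struct irq_desc_sketch {
    uint64_t irq_stat[NR_CPUS_SKETCH];
};

static struct irq_desc_sketch descs[NR_IRQS_SKETCH];
static uint64_t irqs_sum[NR_CPUS_SKETCH]; /* running per-cpu total of all irqs */
static unsigned long lookups;             /* how many descriptor lookups ran */

/* Stand-in for the radix tree lookup; here it is only an array index. */
static struct irq_desc_sketch *lookup_desc(unsigned int irq)
{
    lookups++;
    return irq < NR_IRQS_SKETCH ? &descs[irq] : NULL;
}

/* Counting path: bump the per-irq counter and the per-cpu running total. */
static void count_irq(unsigned int irq, unsigned int cpu)
{
    assert(irq < NR_IRQS_SKETCH && cpu < NR_CPUS_SKETCH);
    descs[irq].irq_stat[cpu]++;
    irqs_sum[cpu]++;
}

/* Per-irq sum: one lookup per irq, then a plain loop over cpus. */
static uint64_t irq_events(unsigned int irq)
{
    struct irq_desc_sketch *desc = lookup_desc(irq);
    uint64_t sum = 0;

    if (!desc)
        return 0;
    for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += desc->irq_stat[cpu];
    return sum;
}

/* Grand total: read the running totals; no walk over irqs is needed. */
static uint64_t total_irq_events(void)
{
    uint64_t sum = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += irqs_sum[cpu];
    return sum;
}
```

In this sketch the read side costs one lookup per irq for the per-irq lines and no lookups at all for the grand total, which is the shape of the saving the two commit messages describe.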
     

26 Oct, 2009

1 commit

  • CPU time of a guest is always accounted in 'user' time
    without regard for the nice value of its counterpart
    process, although the guest is scheduled under that nice
    value.

    This patch fixes the defect and accounts cpu time of
    a niced guest in 'nice' time, the same as for a niced process.

    The patch also adds 'guest_nice' to cpuacct. The
    value provides niced guest cpu time; it relates to 'guest'
    as 'nice' relates to 'user'.

    The original discussions can be found here:

    http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
    http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html

    Signed-off-by: Ryota Ozaki
    Acked-by: Avi Kivity
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ryota Ozaki
     

19 Jun, 2009

1 commit

  • Statistics for softirqs don't exist.
    They would be helpful, like the statistics for interrupts.
    This patch introduces counting of the number of softirqs,
    which will be exported in /proc/softirqs.

    When a softirq handler consumes much CPU time,
    /proc/stat looks like the following.

    $ while :; do cat /proc/stat | head -n1 ; sleep 10 ; done
    cpu 88 0 408 739665 583 28 2 0 0
    cpu 450 0 1090 740970 594 28 1294 0 0
                                 ^^^^
                                 softirq

    In such a situation,
    /proc/softirqs shows us which softirq handler is invoked.
    We can see the rate of increase of softirqs.

    $ cat /proc/softirqs
                CPU0     CPU1     CPU2     CPU3
    HI             0        0        0        0
    TIMER     462850   462805   462782   462718
    NET_TX         0        0        0      365
    NET_RX      2472        2        2       40
    BLOCK          0        0      381     1164
    TASKLET        0        0        0      224
    SCHED     462654   462689   462698   462427
    RCU         3046     2423     3367     3173

    $ cat /proc/softirqs
                CPU0     CPU1     CPU2     CPU3
    HI             0        0        0        0
    TIMER     463361   465077   465056   464991
    NET_TX        53        0        1      365
    NET_RX      3757        2        2       40
    BLOCK          0        0      398     1170
    TASKLET        0        0        0      224
    SCHED     463074   464318   464612   463330
    RCU         3505     2948     3947     3673

    When the CPU time of softirq is high,
    the rates of increase are the following.
    TIMER  : 220/sec    : CPU1-3
    NET_TX : 5/sec      : CPU0
    NET_RX : 120/sec    : CPU0
    SCHED  : 40-200/sec : all CPU
    RCU    : 45-58/sec  : all CPU

    The rates of increase in idle mode are the following.
    TIMER : 250/sec
    SCHED : 250/sec
    RCU   : 2/sec

    It seems many softirqs for receiving packets and rcu are invoked. This
    helps us when checking the system.

    Signed-off-by: Keika Kobayashi
    Reviewed-by: Hiroshi Shimamoto
    Reviewed-by: KOSAKI Motohiro
    Cc: Ingo Molnar
    Cc: Eric Dumazet
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
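
The rates quoted above come from comparing two snapshots of the same counter. A minimal sketch of that arithmetic follows; the function name `softirq_rate()` is hypothetical, and the 10 second interval used in the example is an assumption, since the commit message does not state how far apart the two /proc/softirqs snapshots were taken.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Events per second between two snapshots of one softirq counter.
 * Uses integer division, so the result is rounded down.
 */
static uint64_t softirq_rate(uint64_t before, uint64_t after,
                             unsigned int seconds)
{
    assert(seconds > 0 && after >= before);
    return (after - before) / seconds;
}
```

Under the assumed 10 second interval, NET_RX on CPU0 gives `softirq_rate(2472, 3757, 10)`, which is 128 per second, and TIMER on CPU1 gives `softirq_rate(462805, 465077, 10)`, which is 227 per second. Both are in the neighbourhood of the rounded figures in the commit message.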
     

21 Apr, 2009

1 commit


07 Apr, 2009

1 commit

  • Now that all the task runtime clock users are gone, remove the ugly
    rq->lock usage from perf counters, which solves the nasty deadlock
    seen when a software task clock counter was read from an NMI overflow
    context.

    Signed-off-by: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Corey Ashford
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

06 Apr, 2009

1 commit

  • Merge reason: we have gathered quite a few conflicts, need to merge upstream

    Conflicts:
    arch/powerpc/kernel/Makefile
    arch/x86/ia32/ia32entry.S
    arch/x86/include/asm/hardirq.h
    arch/x86/include/asm/unistd_32.h
    arch/x86/include/asm/unistd_64.h
    arch/x86/kernel/cpu/common.c
    arch/x86/kernel/irq.c
    arch/x86/kernel/syscall_table_32.S
    arch/x86/mm/iomap_32.c
    include/linux/sched.h
    kernel/Makefile

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

22 Jan, 2009

1 commit

  • In connection with a kstat_irqs build breakage, David Miller suggested:

    > Either linux/kernel_stat.h provides the kstat_incr_irqs_this_cpu
    > interface or linux/irq.h does, not both.

    So move them to kernel_stat.h.

    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

11 Jan, 2009

2 commits


31 Dec, 2008

2 commits

  • The cpu time spent by the idle process actually doing something is
    currently accounted as idle time. This is plain wrong; the architectures
    that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
    time spent doing nothing and the time spent by idle doing work. The first
    is accounted with account_idle_time and the second with account_system_time.
    The architectures that use the account_xxx_time interface directly and not
    the account_xxx_ticks interface now need to do the check for the idle
    process in their arch code. In particular to improve the system vs true
    idle time accounting the arch code needs to measure the true idle time
    instead of just testing for the idle process.
    To improve the tick based accounting as well we would need an architecture
    primitive that can tell us if the pt_regs of the interrupted context
    points to the magic instruction that halts the cpu.

    In addition, idle time is no longer added to the stime of the idle process.
    This field now contains the system time of the idle process as it should
    be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
    every tick that occurs while idle is running will be accounted as idle
    time.

    This patch contains the necessary common code changes to be able to
    distinguish idle system time and true idle time. The architectures with
    support for VIRT_CPU_ACCOUNTING need some changes to exploit this.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The utimescaled / stimescaled fields in the task structure and the
    global cpustat should be set on all architectures. On s390 the calls
    to account_user_time_scaled and account_system_time_scaled have never
    been added. In addition, system time that is accounted as guest time
    to the user time of a process is accounted to the scaled system time
    instead of the scaled user time.
    To fix the bugs and to prevent future forgetfulness this patch merges
    account_system_time_scaled into account_system_time and
    account_user_time_scaled into account_user_time.

    Cc: Benjamin Herrenschmidt
    Cc: Hidetoshi Seto
    Cc: Tony Luck
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Cc: Michael Neuling
    Acked-by: Paul Mackerras
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

23 Dec, 2008

1 commit


08 Dec, 2008

1 commit

  • Impact: new feature

    Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
    NR_CPUS set to large values. The goal is to be able to scale up to much
    larger NR_IRQS value without impacting the (important) common case.

    To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
    irq_desc pointers.

    When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
    this also makes the IRQ descriptors NUMA-local (to the site that calls
    request_irq()).

    This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
    uses desc->chip_data for x86 to store irq_cfg.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
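
The idea of replacing a static irq_desc[NR_IRQS] array with an array of pointers that are filled in on demand can be sketched in user-space C. All names ending in `_sketch` are hypothetical, the table size is arbitrary, and `calloc()` stands in for the zeroing, node-aware allocator named in the commit message. Only the slots that are actually requested cost a full descriptor.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_IRQS_SKETCH 1024

/* Hypothetical, heavily reduced descriptor. */
struct irq_desc_sketch {
    unsigned int irq;
    void *chip_data;    /* per-chip data hangs off the descriptor */
};

/* Array of pointers: an unused irq costs one pointer, not a descriptor. */
static struct irq_desc_sketch *irq_desc_ptrs[NR_IRQS_SKETCH];

/* Plain lookup; returns NULL if the irq was never allocated. */
static struct irq_desc_sketch *irq_to_desc_sketch(unsigned int irq)
{
    return irq < NR_IRQS_SKETCH ? irq_desc_ptrs[irq] : NULL;
}

/* Lookup that allocates a zeroed descriptor on first use. */
static struct irq_desc_sketch *irq_to_desc_alloc_sketch(unsigned int irq)
{
    struct irq_desc_sketch *desc;

    if (irq >= NR_IRQS_SKETCH)
        return NULL;

    desc = irq_desc_ptrs[irq];
    if (desc) {
        assert(desc->irq == irq);   /* invariant: slot n holds irq n */
        return desc;
    }

    desc = calloc(1, sizeof(*desc));
    if (desc) {
        desc->irq = irq;
        irq_desc_ptrs[irq] = desc;
    }
    return desc;
}
```

A second call to `irq_to_desc_alloc_sketch()` for the same irq returns the descriptor created by the first call, so callers never see two descriptors for one irq.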
     

21 Oct, 2008

1 commit

  • …/git/tip/linux-2.6-tip

    This merges branches irq/genirq, irq/sparseirq-v4, timers/hpet-percpu
    and x86/uv.

    The sparseirq branch is just preliminary groundwork: no sparse IRQs are
    actually implemented by this tree anymore - just the new APIs are added
    while keeping the old way intact as well (the new APIs map 1:1 to
    irq_desc[]). The 'real' sparse IRQ support will then be a relatively
    small patch on top of this - with a v2.6.29 merge target.

    * 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (178 commits)
    genirq: improve include files
    intr_remapping: fix typo
    io_apic: make irq_mis_count available on 64-bit too
    genirq: fix name space collisions of nr_irqs in arch/*
    genirq: fix name space collision of nr_irqs in autoprobe.c
    genirq: use iterators for irq_desc loops
    proc: fixup irq iterator
    genirq: add reverse iterator for irq_desc
    x86: move ack_bad_irq() to irq.c
    x86: unify show_interrupts() and proc helpers
    x86: cleanup show_interrupts
    genirq: cleanup the sparseirq modifications
    genirq: remove artifacts from sparseirq removal
    genirq: revert dynarray
    genirq: remove irq_to_desc_alloc
    genirq: remove sparse irq code
    genirq: use inline function for irq_to_desc
    genirq: consolidate nr_irqs and for_each_irq_desc()
    x86: remove sparse irq from Kconfig
    genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n
    ...

    Linus Torvalds
     

16 Oct, 2008

5 commits


23 Sep, 2008

1 commit

  • This is the second resubmission of the posix timer rework patch, posted
    a few days ago.

    This includes the changes from the previous resubmission, which addressed
    Oleg Nesterov's comments, removing the RCU stuff from the patch and
    un-inlining the thread_group_cputime() function for SMP.

    In addition, per Ingo Molnar it simplifies the UP code, consolidating much
    of it with the SMP version and depending on lower-level SMP/UP handling to
    take care of the differences.

    It also cleans up some UP compile errors, moves the scheduler stats-related
    macros into kernel/sched_stats.h, cleans up a merge error in
    kernel/fork.c and has a few other minor fixes and cleanups as suggested
    by Oleg and Ingo. Thanks for the review, guys.

    Signed-off-by: Frank Mayhar
    Cc: Roland McGrath
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Frank Mayhar
     

13 May, 2008

1 commit

  • On machines with very large numbers of cpus, tables that are dimensioned
    by NR_IRQS get very large, especially the irq_desc table. They are also
    very sparsely used. When the cpu count is > MAX_IO_APICS, use MAX_IO_APICS
    to set NR_IRQS, otherwise use NR_CPUS.

    Signed-off-by: Alan Mayer
    Reviewed-by: Christoph Lameter
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Alan Mayer
     

19 Oct, 2007

1 commit

  • This adds items to the taskstats struct to account for user and system
    time based on scaling the CPU frequency and instruction issue rates.

    Adds account_(user|system)_time_scaled callbacks which architectures
    can use to account for time using this mechanism.

    Signed-off-by: Michael Neuling
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Neuling
     

15 Oct, 2007

1 commit


26 Apr, 2006

1 commit


29 Mar, 2006

1 commit


07 Nov, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds