23 Aug, 2018

1 commit

  • Number of CPUs is never high enough to force 64-bit arithmetic.
    Save a couple of bytes on x86_64.

    Link: http://lkml.kernel.org/r/20180627200710.GC18434@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied to
    a file was done in a spreadsheet of side-by-side results of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

02 Mar, 2017

2 commits

  • …linux/sched/cputime.h>

    Introduce a trivial, mostly empty <linux/sched/cputime.h> header
    to prepare for the moving of cputime functionality out of sched.h.

    Update all code that relies on these facilities.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

01 Feb, 2017

2 commits

  • This way we don't need to deal with cputime_t details from the core code.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-32-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Kernel CPU stats are stored in cputime_t which is an architecture
    defined type, and hence a bit opaque and requiring accessors and mutators
    for any operation.

    Converting them to nsecs simplifies the code and is one step toward
    the removal of cputime_t in the core code.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

08 Oct, 2016

1 commit

  • Allow some seq_puts removals by taking a string instead of a single
    char.

    [akpm@linux-foundation.org: update vmstat_show(), per Joe]
    Link: http://lkml.kernel.org/r/667e1cf3d436de91a5698170a1e98d882905e956.1470704995.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Joe Perches
    Cc: Andi Kleen
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

03 Aug, 2016

1 commit

  • /proc/stat shows (among lots of other things) the current boottime (i.e.
    number of seconds since boot). While a 32-bit number is sufficient for
    this particular case, we want to get rid of 'struct timespec', which
    suffers from a 32-bit overflow in 2038.

    This changes the code to use a struct timespec64, which is known to be
    safe in all cases.
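The 2038 limit comes from keeping seconds since the Unix epoch in a signed 32-bit integer. A minimal userspace sketch (not kernel code; fits_in_time32() and days_until_time32_overflow() are hypothetical helpers) locates that limit. struct timespec64 sidesteps it by giving the seconds field 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Last second a signed 32-bit counter can hold: 2^31 - 1 seconds after
 * 1970-01-01 00:00:00 UTC, which is 2038-01-19 03:14:07 UTC. */
#define TIME32_MAX ((int64_t)INT32_MAX)

/* Hypothetical helper: does 'secs' still fit in a signed 32-bit field? */
static int fits_in_time32(int64_t secs)
{
	return secs >= (int64_t)INT32_MIN && secs <= TIME32_MAX;
}

/* Hypothetical helper: whole days from the epoch until the 32-bit limit. */
static int64_t days_until_time32_overflow(void)
{
	return TIME32_MAX / 86400;
}
```

A 64-bit seconds value has no such edge in any realistic time frame, which is why the conversion removes the problem rather than postponing it.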

    Link: http://lkml.kernel.org/r/20160617201247.2292101-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

13 Dec, 2014

1 commit

  • Since the rework of the sparse interrupt code to actually free the
    unused interrupt descriptors there exists a race between the /proc
    interfaces to the irq subsystem and the code which frees the interrupt
    descriptor.

    CPU0                          CPU1
                                  show_interrupts()
                                    desc = irq_to_desc(X);
    free_desc(desc)
      remove_from_radix_tree();
      kfree(desc);
                                    raw_spinlock_irq(&desc->lock);

    /proc/interrupts is the only interface which can actively corrupt
    kernel memory via the lock access. /proc/stat can only read from freed
    memory. Extremely hard to trigger, but possible.

    The interfaces in /proc/irq/N/ are not affected by this because the
    removal of the proc file is serialized in procfs against concurrent
    readers/writers. The removal happens before the descriptor is freed.

    For architectures which have CONFIG_SPARSE_IRQ=n this is a non-issue
    as the descriptor is never freed. It's merely cleared out with the irq
    descriptor lock held. So any concurrent proc access will either see
    the old correct value or the cleared out ones.

    Protect the lookup and access to the irq descriptor in
    show_interrupts() with the sparse_irq_lock.

    Provide kstat_irqs_usr(), which protects the lookup and access
    with sparse_irq_lock and switch /proc/stat to use it.
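The fix follows a general pattern: the descriptor lookup and every access to the descriptor happen under the same lock that the freeing path takes, so a reader can never dereference memory freed between lookup and use. A minimal userspace sketch of that pattern, with a pthread mutex standing in for sparse_irq_lock and an array standing in for the radix tree; struct desc, desc_alloc(), desc_free() and desc_count_locked() are hypothetical and only loosely modelled on kstat_irqs_usr():

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_DESCS 16

/* Hypothetical stand-in for an interrupt descriptor: one event counter. */
struct desc {
	unsigned int count;
};

static struct desc *table[NR_DESCS];  /* stands in for the radix tree */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ sparse_irq_lock */

/* Allocate a descriptor for slot 'n'. Returns 0 on success, -1 on failure. */
static int desc_alloc(unsigned int n)
{
	struct desc *d;

	if (n >= NR_DESCS)
		return -1;
	d = calloc(1, sizeof(*d));
	if (!d)
		return -1;
	pthread_mutex_lock(&table_lock);
	table[n] = d;
	pthread_mutex_unlock(&table_lock);
	return 0;
}

/* Unpublish slot 'n' under the lock, then free it. Once the pointer is gone
 * from the table, no reader holding the lock can reach the memory. */
static void desc_free(unsigned int n)
{
	struct desc *d;

	if (n >= NR_DESCS)
		return;
	pthread_mutex_lock(&table_lock);
	d = table[n];
	table[n] = NULL;
	pthread_mutex_unlock(&table_lock);
	free(d);
}

/* Reader: lookup and access happen under the same lock, so the descriptor
 * cannot be freed in between. Returns 0 for an empty slot. */
static unsigned int desc_count_locked(unsigned int n)
{
	unsigned int count = 0;

	if (n >= NR_DESCS)
		return 0;
	pthread_mutex_lock(&table_lock);
	if (table[n])
		count = table[n]->count;
	pthread_mutex_unlock(&table_lock);
	return count;
}
```

The unprotected variant would look the pointer up, drop or never take the lock, and then read through it, which is exactly the window shown in the CPU0/CPU1 diagram.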

    Document the existing kstat_irqs interfaces so it's clear that the
    caller needs to take care of protection. The users of these
    interfaces are either not affected due to SPARSE_IRQ=n or already
    protected against removal.

    Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     

04 Jul, 2014

1 commit

  • These two patches are supposed to "fix" failed order-4 memory
    allocations which have been observed when reading /proc/stat. The
    problem has been observed on s390 as well as on x86.

    To address the problem, change the seq_file memory allocations to
    fall back to vmalloc, so that allocations also work if memory is
    fragmented.

    This approach seems to be simpler and less intrusive than changing
    /proc/stat to use an iterator. Also it "fixes" other users as well,
    which use seq_file's single_open() interface.

    This patch (of 2):

    Use seq_file's single_open_size() to preallocate a buffer that is large
    enough to hold the whole output, instead of open coding it. Also
    calculate the requested size using the number of online cpus instead of
    possible cpus, since the size of the output only depends on the number
    of online cpus.

    Signed-off-by: Heiko Carstens
    Acked-by: David Rientjes
    Cc: Ian Kent
    Cc: Hendrik Brueckner
    Cc: Thorsten Diehl
    Cc: Andrea Righi
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Stefan Bader
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

13 Mar, 2014

1 commit

  • The architectures that override cputime_t (s390, ppc) don't provide
    any version of nsecs_to_cputime(). Indeed this cputime_t implementation
    by backend only happens when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y under
    which the core code doesn't make any use of nsecs_to_cputime().

    At least for now.

    We are going to make broader use of it, so let's provide a default
    version with microsecond granularity. It should be good enough for most
    use cases.

    Cc: Ingo Molnar
    Cc: Marcelo Tosatti
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Acked-by: Rik van Riel
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

24 Jan, 2014

1 commit

  • PROC_FS is a bool, so this code is either present or absent. It will
    never be modular, so using module_init as an alias for __initcall is
    rather misleading.

    Fix this up now, so that we can relocate module_init from init.h into
    module.h in the future. If we don't do this, we'd have to add module.h to
    obviously non-modular code, and that would be ugly at best.

    Note that direct use of __initcall is discouraged, vs. one of the
    priority categorized subgroups. As __initcall gets mapped onto
    device_initcall, our use of fs_initcall (which makes sense for fs code)
    will thus change these registrations from level 6-device to level 5-fs
    (i.e. slightly earlier). However no observable impact of that small
    difference has been observed during testing, or is expected.

    Also note that this change uncovers a missing semicolon bug in the
    registration of vmcore_init as an initcall.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

01 Feb, 2013

1 commit


10 Oct, 2012

1 commit

  • Git commit 09a1d34f8535ecf9 "nohz: Make idle/iowait counter update
    conditional" introduced a bug in regard to cpu hotplug. The effect is
    that the number of idle ticks in the cpu summary line in /proc/stat is
    still counting ticks for offline cpus.

    Reproduction is easy, just start a workload that keeps all cpus busy,
    switch off one or more cpus and then watch the idle field in top.
    On a dual-core with one cpu 100% busy and one offline cpu you will get
    something like this:

    %Cpu(s): 48.7 us, 1.3 sy, 0.0 ni, 50.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

    The problem is that an offline cpu still has ts->idle_active == 1.
    To fix this we should make sure that the cpu is online when calling
    get_cpu_idle_time_us and get_cpu_iowait_time_us.
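The rule introduced here can be modelled as: trust the NO_HZ idle accounting only for online CPUs, and fall back to the periodically updated counter otherwise. A userspace sketch under stated assumptions: struct cpu_model, its fields, get_idle_time_model() and the NOHZ_UNAVAILABLE sentinel are hypothetical stand-ins for cpu_online(), get_cpu_idle_time_us() and the per-cpu idle counter, and all values are in microseconds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stands in for the "-1: no NO_HZ data available" return value. */
#define NOHZ_UNAVAILABLE UINT64_MAX

/* Hypothetical per-CPU state. */
struct cpu_model {
	bool online;
	uint64_t nohz_idle_us;  /* what the NO_HZ idle accounting would report */
	uint64_t tick_idle_us;  /* tick-sampled idle counter */
};

static uint64_t get_idle_time_model(const struct cpu_model *c)
{
	uint64_t idle = NOHZ_UNAVAILABLE;

	/* An offline CPU can keep stale NO_HZ state (idle_active == 1) whose
	 * idle time keeps growing, so only query it for online CPUs. */
	if (c->online)
		idle = c->nohz_idle_us;

	/* !NO_HZ or offline CPU: rely on the tick-sampled counter. */
	if (idle == NOHZ_UNAVAILABLE)
		idle = c->tick_idle_us;
	return idle;
}
```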

    [Srivatsa: Rebased to current mainline]

    Reported-by: Martin Schwidefsky
    Signed-off-by: Michal Hocko
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Srivatsa S. Bhat
    Link: http://lkml.kernel.org/r/20121010061820.8999.57245.stgit@srivatsabhat.in.ibm.com
    Cc: deepthi@linux.vnet.ibm.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner

    Michal Hocko
     

30 Mar, 2012

1 commit

  • Git commit a25cac5198d4ff28 "proc: Consider NO_HZ when printing idle and
    iowait times" changes the code for /proc/stat to use get_cpu_idle_time_us
    and get_cpu_iowait_time_us if the system is running with nohz enabled.
    For architectures which define arch_idle_time (currently s390 only)
    this is a change for the worse. The result of arch_idle_time is supposed
    to be the exact sleep time of the target cpu and should be used instead
    of the value kept by the scheduler.

    Signed-off-by: Martin Schwidefsky
    Reviewed-by: Michal Hocko
    Reviewed-by: Srivatsa S. Bhat
    Link: http://lkml.kernel.org/r/20120330122308.18720283@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Martin Schwidefsky
     

24 Mar, 2012

2 commits

  • == stat_check.py
    #!/usr/bin/env python3
    # Read /proc/stat 1000 times, rewinding to the start after each read.
    num = 0
    with open("/proc/stat") as f:
        while num < 1000:
            data = f.read()
            f.seek(0, 0)
            num = num + 1
    ==

    perf shows

    20.39% stat_check.py [kernel.kallsyms] [k] format_decode
    13.41% stat_check.py [kernel.kallsyms] [k] number
    12.61% stat_check.py [kernel.kallsyms] [k] vsnprintf
    10.85% stat_check.py [kernel.kallsyms] [k] memcpy
    4.85% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup
    4.43% stat_check.py [kernel.kallsyms] [k] seq_printf

    This patch removes most of the calls to vsnprintf() by adding num_to_str()
    and seq_put_decimal_ull(), which print decimal numbers without the rich
    formatting machinery of printf().
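The idea behind a helper like num_to_str() is that printing an unsigned decimal needs no format-string parsing at all: divide by ten repeatedly and emit the digits in reverse. A minimal userspace sketch of that idea; u64_to_dec() is a hypothetical name and this is not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write 'num' in decimal into 'buf' and NUL-terminate it.
 * Returns the number of digits written, or 0 if 'size' is too small. */
static size_t u64_to_dec(char *buf, size_t size, uint64_t num)
{
	char tmp[20];           /* UINT64_MAX has 20 decimal digits */
	size_t len = 0, i;

	do {
		tmp[len++] = (char)('0' + num % 10);
		num /= 10;
	} while (num);

	if (len + 1 > size)     /* room for the digits plus the terminator */
		return 0;
	for (i = 0; i < len; i++)
		buf[i] = tmp[len - 1 - i];  /* digits were produced in reverse */
	buf[len] = '\0';
	return len;
}
```

Compared with vsnprintf(), there is no per-call format decoding, which matches the format_decode and vsnprintf entries at the top of the perf profile.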

    On my 8-CPU box:
    == Before patch ==
    [root@bluextal test]# time ./stat_check.py

    real 0m0.150s
    user 0m0.026s
    sys 0m0.121s

    == After patch ==
    [root@bluextal test]# time ./stat_check.py

    real 0m0.055s
    user 0m0.022s
    sys 0m0.030s

    [akpm@linux-foundation.org: remove incorrect comment, use less stack in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
    [andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrea Righi
    Cc: Eric Dumazet
    Cc: Glauber Costa
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Paul Turner
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
    and is slow :

    # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
    5826
    # grep "cpu " /tmp/STRACE
    read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826

    That's partly because show_stat() must be called twice, since the initial
    buffer size is too small (4096 bytes for fewer than 32 possible cpus).

    Fix this by :

    1) Taking into account nr_irqs in the initial buffer sizing.

    2) Using ksize() to allow better filling of initial buffer.
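The cost described here comes from how a single_open() seq_file roughly handles an undersized buffer: the whole show routine runs, and if its output did not fit, the buffer is enlarged and the routine runs again from scratch. A userspace sketch of that retry policy, assuming the buffer doubles on each miss; render_stat() and read_all() are hypothetical and only model the behaviour, not the seq_file API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "show" routine: one line per CPU, written with snprintf().
 * Returns the total length the output needs, even if 'size' was too small. */
static size_t render_stat(char *buf, size_t size, int ncpus)
{
	size_t used = 0;
	int i;

	for (i = 0; i < ncpus; i++) {
		size_t room = used < size ? size - used : 0;
		int n = snprintf(room ? buf + used : NULL, room,
				 "cpu%d 100 0 200 3000\n", i);

		if (n < 0)
			return 0;
		used += (size_t)n;
	}
	return used;
}

/* Keep regenerating the whole output until it fits, doubling the buffer
 * after each miss, and report how many times the show routine ran. */
static char *read_all(size_t initial_size, int ncpus, int *passes)
{
	size_t size = initial_size;

	assert(initial_size > 0);
	*passes = 0;
	for (;;) {
		char *buf = malloc(size);
		size_t need;

		if (!buf)
			return NULL;
		need = render_stat(buf, size, ncpus);
		(*passes)++;
		if (need < size)        /* output plus terminating NUL fitted */
			return buf;
		free(buf);              /* throw away the partial output */
		size *= 2;
	}
}
```

A well-sized first buffer turns every read into a single pass, which is what both sizing changes above aim for.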

    Signed-off-by: Eric Dumazet
    Cc: Glauber Costa
    Cc: Russell King - ARM Linux
    Cc: KAMEZAWA Hiroyuki
    Cc: Paul Turner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

16 Jan, 2012

1 commit

  • Commit 3292beb340c7688 ("sched/accounting: Change cpustat fields to an array")
    deleted the code which provides us with the sum of all interrupts in the
    system, causing vmstat to report zero interrupts occurring in the system.

    Fix this by restoring the code.

    Signed-off-by: Russell King
    Tested-by: Russell King # [on ARM]
    Tested-by: Tony Luck
    Tested-by: Steven Rostedt
    Cc: Glauber Costa
    Cc: KAMEZAWA Hiroyuki
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Paul Tuner
    Cc: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Russell King
     

07 Jan, 2012

1 commit

  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    sched/tracing: Add a new tracepoint for sleeptime
    sched: Disable scheduler warnings during oopses
    sched: Fix cgroup movement of waking process
    sched: Fix cgroup movement of newly created process
    sched: Fix cgroup movement of forking process
    sched: Remove cfs bandwidth period check in tg_set_cfs_period()
    sched: Fix load-balance lock-breaking
    sched: Replace all_pinned with a generic flags field
    sched: Only queue remote wakeups when crossing cache boundaries
    sched: Add missing rcu_dereference() around ->real_parent usage
    [S390] fix cputime overflow in uptime_proc_show
    [S390] cputime: add sparse checking and cleanup
    sched: Mark parent and real_parent as __rcu
    sched, nohz: Fix missing RCU read lock
    sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
    sched, nohz: Fix the idle cpu check in nohz_idle_balance
    sched: Use jump_labels for sched_feat
    sched/accounting: Fix parameter passing in task_group_account_field
    sched/accounting: Fix user/system tick double accounting
    sched/accounting: Re-use scheduler statistics for the root cgroup
    ...

    Fix up conflicts in
    - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
    usecs_to_cputime64() vs the sparse cleanups
    - kernel/sched/fair.c, kernel/time/tick-sched.c
    scheduler changes in multiple branches

    Linus Torvalds
     

30 Dec, 2011

1 commit

  • Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
    for nohz") did not take into account that on some architectures jiffies
    and cputime use different units.

    This causes get_idle_time() to return numbers in the wrong units, making
    the idle time fields in /proc/stat wrong.

    Instead of converting the usec value returned by
    get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
    usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

    Signed-off-by: Andreas Schwab
    Acked-by: Michal Hocko
    Cc: Arnd Bergmann
    Cc: "Artem S. Tashkinov"
    Cc: Dave Jones
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Schwab
     

15 Dec, 2011

1 commit


09 Dec, 2011

1 commit

  • Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
    iowait times") we are reporting idle/io_wait time also while a CPU is
    tickless. We rely on get_{idle,iowait}_time functions to retrieve
    proper data.

    These functions, however, use usecs_to_cputime to translate
    microseconds to cputime64_t. This is just an alias to usecs_to_jiffies
    which reduces the data type from u64 to unsigned int and also checks
    whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
    and returns MAX_JIFFY_OFFSET in that case.

    When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
    it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
    >3000s! until we overflow unsigned int. Just for reference
    CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
    CONFIG_HZ_1000 ~2s.

    This resulted in a bug where people saw [h]top going mad, reporting 100%
    CPU usage even though there was basically no CPU load. The reason was
    simply that /proc/stat stopped reporting idle/io_wait changes (and
    reported MAX_JIFFY_OFFSET), so the only change happening was for
    user/system time.

    Let's use nsecs_to_jiffies64 instead, which doesn't reduce the precision
    to a 32-bit type and is much more appropriate for cumulative time values
    (unlike usecs_to_jiffies, which is intended for timeout calculations).
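The underlying problem is squeezing a cumulative, ever-growing time value through 32 bits. A simplified userspace sketch contrasts a conversion that passes through a 32-bit intermediate with one that stays in 64 bits, in the spirit of nsecs_to_jiffies64(). Assumptions: HZ is fixed at 250 purely so that the divisions are exact, "ticks" stands in for jiffies, and the helper names are hypothetical; it models only the truncation, not the MAX_JIFFY_OFFSET clamping described above:

```c
#include <assert.h>
#include <stdint.h>

#define HZ 250u                         /* assumption for illustration only */
#define NSEC_PER_SEC 1000000000ull
#define USEC_PER_SEC 1000000ull

/* 32-bit path: the microsecond count is squeezed into 32 bits first, so it
 * silently wraps once it exceeds UINT32_MAX (about 4295 seconds). */
static uint32_t usecs_to_ticks32(uint64_t usecs)
{
	uint32_t u = (uint32_t)usecs;   /* truncation happens here */

	return (uint32_t)((uint64_t)u * HZ / USEC_PER_SEC);
}

/* 64-bit path: the cumulative value stays in 64 bits throughout. */
static uint64_t nsecs_to_ticks64(uint64_t nsecs)
{
	return nsecs / (NSEC_PER_SEC / HZ);
}
```

One hour of idle time converts correctly either way, but two hours already wrap on the 32-bit path while the 64-bit path returns 1800000 ticks.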

    Signed-off-by: Michal Hocko
    Tested-by: Artem S. Tashkinov
    Cc: Dave Jones
    Cc: Arnd Bergmann
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

06 Dec, 2011

1 commit

    This patch changes fields in cpustat from a structure to a
    u64 array. Math gets easier, and the code is more flexible.
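With the fields in an array indexed by an enum, per-CPU totals become a loop instead of one statement per named field. A userspace sketch; the enumerator names follow the kernel's enum cpu_usage_stat as an assumption, while struct cpustat_model and sum_cpustat() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

enum cpu_usage_stat {
	CPUTIME_USER, CPUTIME_NICE, CPUTIME_SYSTEM, CPUTIME_SOFTIRQ,
	CPUTIME_IRQ, CPUTIME_IDLE, CPUTIME_IOWAIT, CPUTIME_STEAL,
	CPUTIME_GUEST, CPUTIME_GUEST_NICE, NR_STATS,
};

/* Hypothetical per-CPU statistics block: one u64 slot per enum value. */
struct cpustat_model {
	uint64_t cpustat[NR_STATS];
};

/* Sum every field across 'ncpus' CPUs into 'total'. */
static void sum_cpustat(const struct cpustat_model *percpu, int ncpus,
			struct cpustat_model *total)
{
	int cpu, i;

	for (i = 0; i < NR_STATS; i++)
		total->cpustat[i] = 0;
	for (cpu = 0; cpu < ncpus; cpu++)
		for (i = 0; i < NR_STATS; i++)
			total->cpustat[i] += percpu[cpu].cpustat[i];
}
```

Adding a new statistic then means adding one enumerator, with no change to the summing code.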

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Paul Tuner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
     

08 Sep, 2011

1 commit

  • show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
    statistics when printing information about idle and iowait times.
    This is OK if we are not using tickless kernel (CONFIG_NO_HZ) because
    counters are updated periodically.
    With NO_HZ things got more tricky because we are not doing idle/iowait
    accounting while we are tickless so the value might get outdated.
    Users of /proc/stat will notice that by unchanged idle/iowait values
    which is then interpreted as 0% idle/iowait time. From the user space
    POV this is an unexpected behavior and a change of the interface.

    Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
    total idle/iowait time since boot and it doesn't rely on sampling or any
    other periodic activity. Fall back to the previous behavior if NO_HZ is
    disabled or not configured.

    Signed-off-by: Michal Hocko
    Cc: Dave Jones
    Cc: Arnd Bergmann
    Cc: Alexey Dobriyan
    Link: http://lkml.kernel.org/r/39181366adac1b39cb6aa3cd53ff0f7c78d32676.1314172057.git.mhocko@suse.cz
    Signed-off-by: Thomas Gleixner

    Michal Hocko
     

27 May, 2011

1 commit


14 Jan, 2011

1 commit

  • For string without format specifiers, use seq_puts().
    For seq_printf("\n"), use seq_putc('\n').

       text    data     bss     dec     hex filename
      61866     488     112   62466    f402 fs/proc/proc.o
      61729     488     112   62329    f379 fs/proc/proc.o
    ----------------------------------------------------
       -139

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

28 Oct, 2010

2 commits

    In /proc/stat, the number of events per IRQ is shown by summing each
    IRQ's events over all cpus. But we can make use of kstat_irqs().

    kstat_irqs() does the same calculation. If !CONFIG_GENERIC_HARDIRQ,
    it's not a big cost (both the number of cpus and irqs are small).

    If a system is very big and CONFIG_GENERIC_HARDIRQ, it does

    for_each_irq()
        for_each_cpu()
            - look up a radix tree
            - read desc->irq_stat[cpu]

    This does not seem efficient. This patch adds kstat_irqs() for
    CONFIG_GENERIC_HARDIRQ and changes the calculation to

    for_each_irq()
        look up radix tree
        for_each_cpu()
            - read desc->irq_stat[cpu]

    This reduces cost.

    A test on a host with 4096 cpus, 256 nodes and 4592 irqs (by Jack Steiner):

    %time cat /proc/stat > /dev/null

    Before Patch: 2.459 sec
    After Patch : .561 sec
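The gain comes from hoisting the descriptor lookup out of the inner loop. A userspace sketch that counts lookups under both loop orders; struct desc_model, the sizes and lookup_desc() are hypothetical, with lookup_desc() standing in for the radix-tree walk and simply counting how often it runs:

```c
#include <assert.h>

#define NR_CPUS_MODEL 4
#define NR_IRQS_MODEL 8

struct desc_model {
	unsigned int irq_stat[NR_CPUS_MODEL];   /* per-CPU event counts */
};

static struct desc_model descs[NR_IRQS_MODEL];
static unsigned long lookups;           /* how many times lookup_desc() ran */

static struct desc_model *lookup_desc(int irq)
{
	lookups++;                      /* stands in for a radix-tree walk */
	return &descs[irq];
}

/* Old order: one lookup per (irq, cpu) pair. */
static unsigned long sum_lookup_per_cpu(void)
{
	unsigned long sum = 0;
	int irq, cpu;

	for (irq = 0; irq < NR_IRQS_MODEL; irq++)
		for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
			sum += lookup_desc(irq)->irq_stat[cpu];
	return sum;
}

/* New order: one lookup per irq, then walk the per-CPU counters. */
static unsigned long sum_lookup_per_irq(void)
{
	unsigned long sum = 0;
	int irq, cpu;

	for (irq = 0; irq < NR_IRQS_MODEL; irq++) {
		struct desc_model *d = lookup_desc(irq);

		for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
			sum += d->irq_stat[cpu];
	}
	return sum;
}
```

Both orders produce the same sum; only the number of lookups changes. With the 4096 cpus and 4592 irqs of the test above, that is roughly 18.8 million lookups versus 4592.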

    [akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
    [akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
    /proc/stat shows the total number of all interrupts to each cpu. But when
    the number of IRQs is very large, it takes a very long time, and 'cat
    /proc/stat' takes more than 10 secs. This is because the sum of all irq
    events is computed when /proc/stat is read. This patch adds a per-cpu
    "sum of all irqs" counter to reduce read costs.

    The cost of reading /proc/stat is important because it's used by major
    applications such as 'top', 'ps', 'w', etc.

    A test on a machine (4096 cpus, 256 nodes, 4592 irqs) shows

    %time cat /proc/stat > /dev/null
    Before Patch: 12.627 sec
    After Patch: 2.459 sec
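The per-cpu counter trades a little work at interrupt time for much cheaper reads: each CPU keeps a running total next to its per-IRQ counts, so producing the total no longer walks every IRQ. A userspace sketch; struct cpu_irq_stats, account_irq() and both reader functions are hypothetical, and the sizes are arbitrary:

```c
#include <assert.h>

#define NCPUS 4
#define NIRQS 64

struct cpu_irq_stats {
	unsigned int per_irq[NIRQS];    /* events per IRQ on this CPU */
	unsigned long irqs_sum;         /* running total, bumped on every event */
};

static struct cpu_irq_stats stats[NCPUS];

/* Interrupt path: update the per-IRQ count and the running total together. */
static void account_irq(int cpu, int irq)
{
	stats[cpu].per_irq[irq]++;
	stats[cpu].irqs_sum++;
}

/* Reader before the change: O(NCPUS * NIRQS). */
static unsigned long total_by_scanning(void)
{
	unsigned long sum = 0;
	int cpu, irq;

	for (cpu = 0; cpu < NCPUS; cpu++)
		for (irq = 0; irq < NIRQS; irq++)
			sum += stats[cpu].per_irq[irq];
	return sum;
}

/* Reader after the change: O(NCPUS). */
static unsigned long total_from_running_sums(void)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NCPUS; cpu++)
		sum += stats[cpu].irqs_sum;
	return sum;
}
```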

    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Jack Steiner
    Acked-by: Jack Steiner
    Cc: Yinghai Lu
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

26 Oct, 2009

1 commit

  • CPU time of a guest is always accounted in 'user' time
    without concern for the nice value of its counterpart
    process although the guest is scheduled under the nice
    value.

    This patch fixes the defect and accounts cpu time of
    a niced guest in 'nice' time, in the same way as for a niced process.

    The patch also adds 'guest_nice' to cpuacct. The
    value provides niced guest cpu time; it relates to
    'guest' as 'nice' relates to 'user'.
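The accounting rule can be summarised as: guest time of a niced task is charged to 'nice' and 'guest_nice', otherwise to 'user' and 'guest'. A userspace sketch under assumptions: struct guest_acct and account_guest_time_model() are hypothetical, and the kernel keeps these counters per CPU and reads the task's nice value itself:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the CPU time counters involved. */
struct guest_acct {
	uint64_t user, nice, guest, guest_nice;
};

/* Charge 'delta' of guest CPU time for a task with nice value 'nice_val'. */
static void account_guest_time_model(struct guest_acct *a, int nice_val,
				     uint64_t delta)
{
	if (nice_val > 0) {
		/* niced guest: counts as nice time, also tracked as guest_nice */
		a->nice += delta;
		a->guest_nice += delta;
	} else {
		/* un-niced guest: counts as user time, also tracked as guest */
		a->user += delta;
		a->guest += delta;
	}
}
```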

    The original discussions can be found here:

    http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
    http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html

    Signed-off-by: Ryota Ozaki
    Acked-by: Avi Kivity
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ryota Ozaki
     

19 Jun, 2009

1 commit

  • Export statistics for softirq in /proc/softirqs and /proc/stat.

    1. /proc/softirqs
    Implement /proc/softirqs, which shows the number of softirqs
    for each CPU like /proc/interrupts.

    2. /proc/stat
    Add the "softirq" line to /proc/stat.
    This line shows the number of softirqs for all cpus.
    The first column is the total of all softirqs and
    each subsequent column is the total for a particular softirq.
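The layout of the "softirq" line can be illustrated with a small formatter: the grand total first, then one column per softirq type, each summed over all CPUs. A userspace sketch; format_softirq_line(), the array shape and the two-CPU, three-type sizes are hypothetical (the kernel has more softirq types):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCPU 2
#define NR_SOFTIRQ_TYPES 3

/* counts[cpu][type]: softirqs of each type handled on each CPU.
 * Returns the line length, or -1 if 'buf' is too small. */
static int format_softirq_line(char *buf, size_t size,
			       unsigned long counts[NCPU][NR_SOFTIRQ_TYPES])
{
	unsigned long per_type[NR_SOFTIRQ_TYPES] = {0};
	unsigned long total = 0;
	size_t used;
	int cpu, t, n;

	/* Fold the per-CPU counts into per-type sums and a grand total. */
	for (cpu = 0; cpu < NCPU; cpu++)
		for (t = 0; t < NR_SOFTIRQ_TYPES; t++) {
			per_type[t] += counts[cpu][t];
			total += counts[cpu][t];
		}

	n = snprintf(buf, size, "softirq %lu", total);
	if (n < 0 || (size_t)n >= size)
		return -1;
	used = (size_t)n;
	for (t = 0; t < NR_SOFTIRQ_TYPES; t++) {
		n = snprintf(buf + used, size - used, " %lu", per_type[t]);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

For two CPUs with counts {1, 2, 3} and {4, 5, 6}, the line reads "softirq 21 5 7 9".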

    [kosaki.motohiro@jp.fujitsu.com: remove redundant for_each_possible_cpu() loop]
    Signed-off-by: Keika Kobayashi
    Reviewed-by: Hiroshi Shimamoto
    Cc: KOSAKI Motohiro
    Cc: Ingo Molnar
    Cc: Eric Dumazet
    Cc: Alexey Dobriyan
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     

23 Apr, 2009

1 commit

  • The cpu idle field in the output of /proc/stat is too small for cpus
    that have been idle for more than a tick. Add the architecture hook
    arch_idle_time, which allows adding the unaccounted idle time of a
    sleeping cpu without waking the cpu.

    The s390 implementation of arch_idle_time uses the already existing
    s390_idle_data per_cpu variable to find the sleep time of a neighboring
    idle cpu.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

26 Dec, 2008

1 commit


16 Dec, 2008

1 commit

  • Impact: restructure code to fix compiler warning

    commit 240d367b4e6c6e3c5075e034db14dba60a6f5fa7 moved desc usage point
    into #ifdef CONFIG_SPARSE_IRQ.

    Eliminate the desc variable, otherwise following warning happens:

    fs/proc/stat.c: In function 'show_stat':
    fs/proc/stat.c:31: warning: unused variable 'desc'

    [ akpm: cleaned up the patch to remove #ifdef ]

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     

09 Dec, 2008

1 commit

  • Impact: build fix on Alpha

    -tip testing found this build failure on the Alpha defconfig:

    /home/mingo/tip/fs/proc/stat.c: In function 'show_stat':
    /home/mingo/tip/fs/proc/stat.c:48: error: implicit declaration of function 'for_each_irq_desc'
    /home/mingo/tip/fs/proc/stat.c:48: error: expected ';' before '{' token

    irq_desc() cannot be used in stat.c on older architectures.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

08 Dec, 2008

1 commit

  • Impact: new feature

    Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
    NR_CPUS set to large values. The goal is to be able to scale up to much
    larger NR_IRQS value without impacting the (important) common case.

    To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
    irq_desc pointers.

    When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
    this also makes the IRQ descriptors NUMA-local (to the site that calls
    request_irq()).

    This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
    uses desc->chip_data for x86 to store irq_cfg.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

23 Oct, 2008

1 commit