Eric Lee / smarc-fsl-linux-kernel

23 Aug, 2018

1 commit

9a27e97aa proc: use "unsigned int" in /proc/stat hook ... Browse Code »

Number of CPUs is never high enough to force 64-bit arithmetic.
Save couple of bytes on x86_64.

Link: http://lkml.kernel.org/r/20180627200710.GC18434@avx2
Signed-off-by: Alexey Dobriyan
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2018-08-23 01:52:46 +0800

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-11-02 18:10:55 +0800

02 Mar, 2017

2 commits

32ef5517c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <… ... Browse Code »

…linux/sched/cputime.h>

Introduce a trivial, mostly empty <linux/sched/cputime.h> header
to prepare for the moving of cputime functionality out of sched.h.

Update all code that relies on these facilities.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2017-03-02 15:42:39 +0800
03441a348 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/stat.h> ... Browse Code »

We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-03-02 15:42:34 +0800

01 Feb, 2017

2 commits

42b425b33 s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs ... Browse Code »

This way we don't need to deal with cputime_t details from the core code.

Signed-off-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Fenghua Yu
Cc: Heiko Carstens
Cc: Linus Torvalds
Cc: Martin Schwidefsky
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Stanislaw Gruszka
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1485832191-26889-32-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-02-01 16:14:03 +0800
7fb1327ee sched/cputime: Convert kcpustat to nsecs ... Browse Code »

Kernel CPU stats are stored in cputime_t which is an architecture
defined type, and hence a bit opaque and requiring accessors and mutators
for any operation.

Converting them to nsecs simplifies the code and is one step toward
the removal of cputime_t in the core code.

Signed-off-by: Frederic Weisbecker
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Stanislaw Gruszka
Cc: Wanpeng Li
Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar

Frederic Weisbecker
2017-02-01 16:13:47 +0800

08 Oct, 2016

1 commit

75ba1d07f seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char ... Browse Code »

Allow some seq_puts removals by taking a string instead of a single
char.

[akpm@linux-foundation.org: update vmstat_show(), per Joe]
Link: http://lkml.kernel.org/r/667e1cf3d436de91a5698170a1e98d882905e956.1470704995.git.joe@perches.com
Signed-off-by: Joe Perches
Cc: Joe Perches
Cc: Andi Kleen
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2016-10-08 09:46:30 +0800

03 Aug, 2016

1 commit

519ded5a8 procfs: avoid 32-bit time_t in /proc/*/stat ... Browse Code »

/proc/stat shows (among lots of other things) the current boottime (i.e.
number of seconds since boot). While a 32-bit number is sufficient for
this particular case, we want to get rid of the 'struct timespec'
suffers from a 32-bit overflow in 2038.

This changes the code to use a struct timespec64, which is known to be
safe in all cases.

Link: http://lkml.kernel.org/r/20160617201247.2292101-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arnd Bergmann
2016-08-03 05:31:41 +0800

13 Dec, 2014

1 commit

c291ee622 genirq: Prevent proc race against freeing of irq descriptors ... Browse Code »

Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.

CPU0 CPU1
show_interrupts()
desc = irq_to_desc(X);
free_desc(desc)
remove_from_radix_tree();
kfree(desc);
raw_spinlock_irq(&desc->lock);

/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.

The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.

For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.

Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.

Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.

Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.

Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner
Cc: stable@vger.kernel.org

Thomas Gleixner
2014-12-13 20:33:07 +0800

04 Jul, 2014

1 commit

f74373a5c /proc/stat: convert to single_open_size() ... Browse Code »

These two patches are supposed to "fix" failed order-4 memory
allocations which have been observed when reading /proc/stat. The
problem has been observed on s390 as well as on x86.

To address the problem change the seq_file memory allocations to
fallback to use vmalloc, so that allocations also work if memory is
fragmented.

This approach seems to be simpler and less intrusive than changing
/proc/stat to use an interator. Also it "fixes" other users as well,
which use seq_file's single_open() interface.

This patch (of 2):

Use seq_file's single_open_size() to preallocate a buffer that is large
enough to hold the whole output, instead of open coding it. Also
calculate the requested size using the number of online cpus instead of
possible cpus, since the size of the output only depends on the number
of online cpus.

Signed-off-by: Heiko Carstens
Acked-by: David Rientjes
Cc: Ian Kent
Cc: Hendrik Brueckner
Cc: Thorsten Diehl
Cc: Andrea Righi
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Stefan Bader
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2014-07-04 00:21:54 +0800

13 Mar, 2014

1 commit

bfc3f0281 cputime: Default implementation of nsecs -> cputime conversion ... Browse Code »

The architectures that override cputime_t (s390, ppc) don't provide
any version of nsecs_to_cputime(). Indeed this cputime_t implementation
by backend only happens when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y under
which the core code doesn't make any use of nsecs_to_cputime().

At least for now.

We are going to make a broader use of it so lets provide a default
version with a per usecs granularity. It should be good enough for most
usecases.

Cc: Ingo Molnar
Cc: Marcelo Tosatti
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Acked-by: Rik van Riel
Signed-off-by: Frederic Weisbecker

Frederic Weisbecker
2014-03-13 22:56:43 +0800

24 Jan, 2014

1 commit

abaf3787a fs/proc: don't use module_init for non-modular core code ... Browse Code »

PROC_FS is a bool, so this code is either present or absent. It will
never be modular, so using module_init as an alias for __initcall is
rather misleading.

Fix this up now, so that we can relocate module_init from init.h into
module.h in the future. If we don't do this, we'd have to add module.h to
obviously non-modular code, and that would be ugly at best.

Note that direct use of __initcall is discouraged, vs. one of the
priority categorized subgroups. As __initcall gets mapped onto
device_initcall, our use of fs_initcall (which makes sense for fs code)
will thus change these registrations from level 6-device to level 5-fs
(i.e. slightly earlier). However no observable impact of that small
difference has been observed during testing, or is expected.

Also note that this change uncovers a missing semicolon bug in the
registration of vmcore_init as an initcall.

Signed-off-by: Paul Gortmaker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Gortmaker
2014-01-24 08:37:02 +0800

01 Feb, 2013

1 commit

9e5e8deca stat: Use size_t for sizes instead of unsigned ... Browse Code »

On some platforms (such as IA64) the large page size may results in
slab allocations to be allowed of numbers that do not fit in 32 bit.

Acked-by: Glauber Costa
Signed-off-by: Christoph Lameter
Signed-off-by: Pekka Enberg

Christoph Lameter
2013-02-01 18:32:08 +0800

10 Oct, 2012

1 commit

7386cdbf2 nohz: Fix idle ticks in cpu summary line of /proc/stat ... Browse Code »

Git commit 09a1d34f8535ecf9 "nohz: Make idle/iowait counter update
conditional" introduced a bug in regard to cpu hotplug. The effect is
that the number of idle ticks in the cpu summary line in /proc/stat is
still counting ticks for offline cpus.

Reproduction is easy, just start a workload that keeps all cpus busy,
switch off one or more cpus and then watch the idle field in top.
On a dual-core with one cpu 100% busy and one offline cpu you will get
something like this:

%Cpu(s): 48.7 us, 1.3 sy, 0.0 ni, 50.0 id, 0.0 wa, 0.0 hi, 0.0 si,
%0.0 st

The problem is that an offline cpu still has ts->idle_active == 1.
To fix this we should make sure that the cpu is online when calling
get_cpu_idle_time_us and get_cpu_iowait_time_us.

[Srivatsa: Rebased to current mainline]

Reported-by: Martin Schwidefsky
Signed-off-by: Michal Hocko
Reviewed-by: Srivatsa S. Bhat
Signed-off-by: Srivatsa S. Bhat
Link: http://lkml.kernel.org/r/20121010061820.8999.57245.stgit@srivatsabhat.in.ibm.com
Cc: deepthi@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner

Michal Hocko
2012-10-10 20:05:21 +0800

30 Mar, 2012

1 commit

cb85a6ed6 proc: stats: Use arch_idle_time for idle and iowait times if available ... Browse Code »

Git commit a25cac5198d4ff28 "proc: Consider NO_HZ when printing idle and
iowait times" changes the code for /proc/stat to use get_cpu_idle_time_us
and get_cpu_iowait_time_us if the system is running with nohz enabled.
For architectures which define arch_idle_time (currently s390 only)
this is a change for the worse. The result of arch_idle_time is supposed
to be the exact sleep time of the target cpu and should be used instead
of the value kept by the scheduler.

Signed-off-by: Martin Schwidefsky
Reviewed-by: Michal Hocko
Reviewed-by: Srivatsa S. Bhat
Link: http://lkml.kernel.org/r/20120330122308.18720283@de.ibm.com
Signed-off-by: Thomas Gleixner

Martin Schwidefsky
2012-03-30 21:43:33 +0800

24 Mar, 2012

2 commits

1ac101a5d procfs: add num_to_str() to speed up /proc/stat ... Browse Code »

== stat_check.py
num = 0
with open("/proc/stat") as f:
while num < 1000 :
data = f.read()
f.seek(0, 0)
num = num + 1
==

perf shows

20.39% stat_check.py [kernel.kallsyms] [k] format_decode
13.41% stat_check.py [kernel.kallsyms] [k] number
12.61% stat_check.py [kernel.kallsyms] [k] vsnprintf
10.85% stat_check.py [kernel.kallsyms] [k] memcpy
4.85% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup
4.43% stat_check.py [kernel.kallsyms] [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real 0m0.150s
user 0m0.026s
sys 0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real 0m0.055s
user 0m0.022s
sys 0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrea Righi
Cc: Eric Dumazet
Cc: Glauber Costa
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Paul Turner
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2012-03-24 07:58:42 +0800
59a32e2ce proc: speed up /proc/stat handling ... Browse Code »

On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
and is slow :

# strace -T -o /tmp/STRACE cat /proc/stat | wc -c
5826
# grep "cpu " /tmp/STRACE
read(0, "cpu 1949310 19 2144714 12117253"..., 32768) = 5826

Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)

Fix this by :

1) Taking into account nr_irqs in the initial buffer sizing.

2) Using ksize() to allow better filling of initial buffer.

Signed-off-by: Eric Dumazet
Cc: Glauber Costa
Cc: Russell King - ARM Linux
Cc: KAMEZAWA Hiroyuki
Cc: Paul Turner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2012-03-24 07:58:42 +0800

16 Jan, 2012

1 commit

f7e6746eb sched/accounting, proc: Fix /proc/stat interrupts sum ... Browse Code »

Commit 3292beb340c7688 ("sched/accounting: Change cpustat fields to an array")
deleted the code which provides us with the sum of all interrupts in the
system, causing vmstat to report zero interrupts occuring in the system.

Fix this by restoring the code.

Signed-off-by: Russell King
Tested-by: Russell King # [on ARM]
Tested-by: Tony Luck
Tested-by: Steven Rostedt
Cc: Glauber Costa
Cc: KAMEZAWA Hiroyuki
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Paul Tuner
Cc: Peter Zijlstra
Signed-off-by: Ingo Molnar

Russell King
2012-01-16 15:13:27 +0800

07 Jan, 2012

1 commit

0db49b72b Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
sched/tracing: Add a new tracepoint for sleeptime
sched: Disable scheduler warnings during oopses
sched: Fix cgroup movement of waking process
sched: Fix cgroup movement of newly created process
sched: Fix cgroup movement of forking process
sched: Remove cfs bandwidth period check in tg_set_cfs_period()
sched: Fix load-balance lock-breaking
sched: Replace all_pinned with a generic flags field
sched: Only queue remote wakeups when crossing cache boundaries
sched: Add missing rcu_dereference() around ->real_parent usage
[S390] fix cputime overflow in uptime_proc_show
[S390] cputime: add sparse checking and cleanup
sched: Mark parent and real_parent as __rcu
sched, nohz: Fix missing RCU read lock
sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
sched, nohz: Fix the idle cpu check in nohz_idle_balance
sched: Use jump_labels for sched_feat
sched/accounting: Fix parameter passing in task_group_account_field
sched/accounting: Fix user/system tick double accounting
sched/accounting: Re-use scheduler statistics for the root cgroup
...

Fix up conflicts in
- arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
usecs_to_cputime64() vs the sparse cleanups
- kernel/sched/fair.c, kernel/time/tick-sched.c
scheduler changes in multiple branches

Linus Torvalds
2012-01-07 00:44:54 +0800

30 Dec, 2011

1 commit

34845636a procfs: do not confuse jiffies with cputime64_t ... Browse Code »

Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that one some architectures jiffies
and cputime use different units.

This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

Signed-off-by: Andreas Schwab
Acked-by: Michal Hocko
Cc: Arnd Bergmann
Cc: "Artem S. Tashkinov"
Cc: Dave Jones
Cc: Alexey Dobriyan
Cc: Thomas Gleixner
Cc: "Luck, Tony"
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andreas Schwab
2011-12-30 08:31:57 +0800

15 Dec, 2011

1 commit

6a54aebf6 Merge commit 'v3.2-rc5' into sched/core ... Browse Code »

Merge reason: Pick up the latest fixes.

Signed-off-by: Ingo Molnar

Ingo Molnar
2011-12-15 15:21:30 +0800

09 Dec, 2011

1 commit

2a95ea6c0 procfs: do not overflow get_{idle,iowait}_time for nohz ... Browse Code »

Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless. We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t. This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int. Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load. The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).

Signed-off-by: Michal Hocko
Tested-by: Artem S. Tashkinov
Cc: Dave Jones
Cc: Arnd Bergmann
Cc: Alexey Dobriyan
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-12-09 23:50:29 +0800

06 Dec, 2011

1 commit

3292beb34 sched/accounting: Change cpustat fields to an array ... Browse Code »
46

This patch changes fields in cpustat from a structure, to an
u64 array. Math gets easier, and the code is more flexible.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Paul Tuner
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar

Glauber Costa
2011-12-06 16:06:38 +0800

08 Sep, 2011

1 commit

a25cac519 proc: Consider NO_HZ when printing idle and iowait times ... Browse Code »
46

show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
statistics when priting information about idle and iowait times.
This is OK if we are not using tickless kernel (CONFIG_NO_HZ) because
counters are updated periodically.
With NO_HZ things got more tricky because we are not doing idle/iowait
accounting while we are tickless so the value might get outdated.
Users of /proc/stat will notice that by unchanged idle/iowait values
which is then interpreted as 0% idle/iowait time. From the user space
POV this is an unexpected behavior and a change of the interface.

Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
total idle/iowait time since boot and it doesn't rely on sampling or any
other periodic activity. Fall back to the previous behavior if NO_HZ is
disabled or not configured.

Signed-off-by: Michal Hocko
Cc: Dave Jones
Cc: Arnd Bergmann
Cc: Alexey Dobriyan
Link: http://lkml.kernel.org/r/39181366adac1b39cb6aa3cd53ff0f7c78d32676.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner

Michal Hocko
2011-09-08 17:10:55 +0800

27 May, 2011

1 commit

a4dbf0ec2 proc/stat: use defined macro KMALLOC_MAX_SIZE ... Browse Code »

There is a macro for the max size kmalloc can allocate, so use it instead
of a hardcoded number.

Signed-off-by: Yuanhan Liu
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yuanhan Liu
2011-05-27 08:12:37 +0800

14 Jan, 2011

1 commit

9d6de12f7 proc: use seq_puts()/seq_putc() where possible ... Browse Code »

For string without format specifiers, use seq_puts().
For seq_printf("\n"), use seq_putc('\n').

text data bss dec hex filename
61866 488 112 62466 f402 fs/proc/proc.o
61729 488 112 62329 f379 fs/proc/proc.o
----------------------------------------------------
-139

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2011-01-14 00:03:16 +0800

28 Oct, 2010

2 commits

478735e38 /proc/stat: fix scalability of irq sum of all cpu ... Browse Code »

In /proc/stat, the number of per-IRQ event is shown by making a sum each
irq's events on all cpus. But we can make use of kstat_irqs().

kstat_irqs() do the same calculation, If !CONFIG_GENERIC_HARDIRQ,
it's not a big cost. (Both of the number of cpus and irqs are small.)

If a system is very big and CONFIG_GENERIC_HARDIRQ, it does

for_each_irq()
for_each_cpu()
- look up a radix tree
- read desc->irq_stat[cpu]
This seems not efficient. This patch adds kstat_irqs() for
CONFIG_GENRIC_HARDIRQ and change the calculation as

for_each_irq()
look up radix tree
for_each_cpu()
- read desc->irq_stat[cpu]

This reduces cost.

A test on (4096cpusp, 256 nodes, 4592 irqs) host (by Jack Steiner)

%time cat /proc/stat > /dev/null

Before Patch: 2.459 sec
After Patch : .561 sec

[akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
[akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
Signed-off-by: KAMEZAWA Hiroyuki
Tested-by: Jack Steiner
Acked-by: Jack Steiner
Cc: Yinghai Lu
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-10-28 09:03:13 +0800
f2c66cd8e /proc/stat: scalability of irq num per cpu ... Browse Code »

/proc/stat shows the total number of all interrupts to each cpu. But when
the number of IRQs are very large, it take very long time and 'cat
/proc/stat' takes more than 10 secs. This is because sum of all irq
events are counted when /proc/stat is read. This patch adds "sum of all
irq" counter percpu and reduce read costs.

The cost of reading /proc/stat is important because it's used by major
applications as 'top', 'ps', 'w', etc....

A test on a mechin (4096cpu, 256 nodes, 4592 irqs) shows

%time cat /proc/stat > /dev/null
Before Patch: 12.627 sec
After Patch: 2.459 sec

Signed-off-by: KAMEZAWA Hiroyuki
Tested-by: Jack Steiner
Acked-by: Jack Steiner
Cc: Yinghai Lu
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-10-28 09:03:13 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

26 Oct, 2009

1 commit

ce0e7b28f sched, cpuacct: Fix niced guest time accounting ... Browse Code »

CPU time of a guest is always accounted in 'user' time
without concern for the nice value of its counterpart
process although the guest is scheduled under the nice
value.

This patch fixes the defect and accounts cpu time of
a niced guest in 'nice' time as same as a niced process.

And also the patch adds 'guest_nice' to cpuacct. The
value provides niced guest cpu time which is like 'nice'
to 'user'.

The original discussions can be found here:

http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html

Signed-off-by: Ryota Ozaki
Acked-by: Avi Kivity
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar

Ryota Ozaki
2009-10-26 00:31:30 +0800

19 Jun, 2009

1 commit

d3d64df21 proc: export statistics for softirq to /proc ... Browse Code »

Export statistics for softirq in /proc/softirqs and /proc/stat.

1. /proc/softirqs
Implement /proc/softirqs which shows the number of softirq
for each CPU like /proc/interrupts.

2. /proc/stat
Add the "softirq" line to /proc/stat.
This line shows the number of softirq for all cpu.
The first column is the total of all softirqs and
each subsequent column is the total for particular softirq.

[kosaki.motohiro@jp.fujitsu.com: remove redundant for_each_possible_cpu() loop]
Signed-off-by: Keika Kobayashi
Reviewed-by: Hiroshi Shimamoto
Cc: KOSAKI Motohiro
Cc: Ingo Molnar
Cc: Eric Dumazet
Cc: Alexey Dobriyan
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Keika Kobayashi
2009-06-19 04:03:41 +0800

23 Apr, 2009

1 commit

e1c805309 [S390] /proc/stat idle field for idle cpus ... Browse Code »

The cpu idle field in the output of /proc/stat is too small for cpus
that have been idle for more than a tick. Add the architecture hook
arch_idle_time that allows to add the not accounted idle time of a
sleeping cpu without waking the cpu.

The s390 implementation of arch_idle_time uses the already existing
s390_idle_data per_cpu variable to find the sleep time of a neighboring
idle cpu.

Signed-off-by: Martin Schwidefsky

Martin Schwidefsky
2009-04-23 19:58:17 +0800

26 Dec, 2008

1 commit

26ddd8d5c proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c ... Browse Code »

Impact: cleanup

irq_desc can be NULL when CONFIG_SPARSE_IRQ=y only.
therefore, NULL checking can move into kstat_irqs_cpu() of SPARSE_IRQ version.

Signed-off-by: KOSAKI Motohiro
Acked-by: "Yinghai Lu"
Signed-off-by: Ingo Molnar

KOSAKI Motohiro
2008-12-26 16:48:18 +0800

16 Dec, 2008

1 commit

13bd41bc2 proc: enclose desc variable of show_stat() in CONFIG_SPARSE_IRQ ... Browse Code »

Impact: restructure code to fix compiler warning

commit 240d367b4e6c6e3c5075e034db14dba60a6f5fa7 moved desc usage point
into #ifdef CONFIG_SPARSE_IRQ.

Eliminate the desc variable, otherwise following warning happens:

fs/proc/stat.c: In function 'show_stat':
fs/proc/stat.c:31: warning: unused variable 'desc'

[ akpm: cleaned up the patch to remove #ifdef ]

Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar

KOSAKI Motohiro
2008-12-16 18:24:14 +0800

09 Dec, 2008

1 commit

240d367b4 sparseirq: fix Alpha build failure ... Browse Code »

Impact: build fix on Alpha

-tip testing found this build failure on the Alpha defconfig:

/home/mingo/tip/fs/proc/stat.c: In function 'show_stat':
/home/mingo/tip/fs/proc/stat.c:48: error: implicit declaration of function 'for_each_irq_desc'
/home/mingo/tip/fs/proc/stat.c:48: error: expected ';' before '{' token

can not use irq_desc() in stat.c on older architectures.

Signed-off-by: Yinghai Lu
Signed-off-by: Ingo Molnar

Yinghai Lu
2008-12-09 11:16:54 +0800

08 Dec, 2008

1 commit

0b8f1efad sparse irq_desc[] array: core kernel and x86 changes ... Browse Code »

Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu
Signed-off-by: Ingo Molnar

Yinghai Lu
2008-12-08 21:31:51 +0800

23 Oct, 2008

1 commit

df8106dbb proc: move /proc/stat to fs/proc/stat.c ... Browse Code »

Signed-off-by: Alexey Dobriyan

Alexey Dobriyan
2008-10-23 19:14:05 +0800