09 May, 2017

40 commits

  • As we are reusing crashkernel parameter instead of fadump_reserve_mem
    parameter to specify the memory to reserve for fadump's crash kernel,
    update the documentation accordingly.

    Link: http://lkml.kernel.org/r/149035347559.6881.14224829694291758581.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Michael Ellerman
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Dave Young
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     
  • fadump supports specifying the memory to reserve for fadump's crash
    kernel with the fadump_reserve_mem kernel parameter. This parameter
    currently supports passing only a fixed memory size, as in
    fadump_reserve_mem=<size>. This patch aims to add support for other
    syntaxes, like the range-based memory size
    <range1>:<size1>[,<range2>:<size2>,<range3>:<size3>,...], which allows
    using the same parameter to boot the kernel with different system RAM
    sizes.
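
    For illustration (the sizes here are made up), a range-based
    specification of the kind documented for the crashkernel parameter
    looks like:

    crashkernel=512M-2G:64M,2G-:128M

    i.e. reserve 64M when system RAM is between 512M and 2G, and 128M when
    it is 2G or more.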

    As the crashkernel parameter already supports the above-mentioned
    syntaxes, this patch deprecates the fadump_reserve_mem parameter and
    reuses the crashkernel parameter instead to specify memory for fadump's
    crash kernel reservation as well. If an offset is provided in the
    crashkernel parameter, it is ignored in the fadump case, as fadump
    reserves memory at the end of RAM.

    The advantages of using the crashkernel parameter instead of the
    fadump_reserve_mem parameter are one less kernel parameter overall, code
    reuse, and support for multiple syntaxes to specify memory.

    Suggested-by: Dave Young
    Link: http://lkml.kernel.org/r/149035346749.6881.911095631212975718.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Reviewed-by: Mahesh Salgaonkar
    Acked-by: Michael Ellerman
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Dave Young
    Cc: Eric Biederman
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     
  • Now that crashkernel parameter parsing and vmcoreinfo-related code has
    been moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove
    the dependency on CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid of
    the definitions of the fadump_append_elf_note() & fadump_final_note()
    functions and reuse the similar functions compiled under
    CONFIG_CRASH_CORE.

    Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Reviewed-by: Mahesh Salgaonkar
    Acked-by: Michael Ellerman
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Dave Young
    Cc: Eric Biederman
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     
  • Get rid of multiple definitions of the append_elf_note() & final_note()
    functions. Reuse these functions compiled under CONFIG_CRASH_CORE. Also,
    define Elf_Word and use it instead of the generic u32 or the more
    specific Elf64_Word.

    Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Dave Young
    Acked-by: Tony Luck
    Cc: Fenghua Yu
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     
  • Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
    reuse crashkernel parameter for fadump", v4.

    Traditionally, kdump is used to save vmcore in case of a crash. Some
    architectures like powerpc can save vmcore using architecture specific
    support instead of kexec/kdump mechanism. Such architecture specific
    support also needs to reserve memory, to be used by dump capture kernel.
    The crashkernel parameter can be reused, for memory reservation, by such
    architecture-specific infrastructure.

    This patchset removes the dependency on CONFIG_KEXEC for the crashkernel
    parameter and vmcoreinfo-related code, as they can be reused without
    kexec support. Also, the crashkernel parameter is reused instead of
    fadump_reserve_mem to reserve memory for fadump.

    The first patch moves crashkernel parameter parsing and vmcoreinfo
    related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The
    second patch reuses the definitions of append_elf_note() & final_note()
    functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
    removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
    in powerpc. The next patch reuses the crashkernel parameter for reserving
    memory for fadump, instead of the fadump_reserve_mem parameter. This has
    the advantage of making all the syntaxes the crashkernel parameter
    supports available for fadump as well. The last patch updates the fadump
    kernel documentation about the use of the crashkernel parameter.

    This patch (of 5):

    Traditionally, kdump is used to save vmcore in case of a crash. Some
    architectures like powerpc can save vmcore using architecture specific
    support instead of kexec/kdump mechanism. Such architecture specific
    support also needs to reserve memory, to be used by dump capture kernel.
    The crashkernel parameter can be reused, for memory reservation, by such
    architecture-specific infrastructure.

    But currently, code related to vmcoreinfo and parsing of crashkernel
    parameter is built under CONFIG_KEXEC_CORE. This patch introduces
    CONFIG_CRASH_CORE and moves the above mentioned code under this config,
    allowing code reuse without dependency on CONFIG_KEXEC. There is no
    functional change with this patch.

    Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Dave Young
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     
  • Bit searching functions accept "unsigned long" indices, but
    "nr_cpumask_bits" is "int", which is signed, so unavoidable sign
    extensions occur on x86_64. Those MOVSX instructions are the #1 source
    of MOVSX bloat by number of uses across the whole kernel.

    Change "nr_cpumask_bits" to unsigned, this number can't be negative
    after all. It allows to do implicit zero-extension on x86_64 without
    MOVSX.
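
    The effect can be seen in a reduced sketch (illustrative only, not the
    actual kernel change):

    unsigned long find_first_bit(const unsigned long *addr, unsigned long size);

    static int nr_bits_signed = 64;            /* old: signed limit         */
    static unsigned int nr_bits_unsigned = 64; /* new: unsigned, same range */

    unsigned long demo(const unsigned long *mask)
    {
            /* the signed limit needs a MOVSX (sign extension) on x86_64
             * before the 64-bit size argument can be formed; the unsigned
             * one is zero-extended for free by the ordinary 32-bit move */
            return find_first_bit(mask, nr_bits_signed) +
                   find_first_bit(mask, nr_bits_unsigned);
    }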

    Change signed comparisons into unsigned comparisons where necessary.

    Other uses look fine because the value is either an argument passed to a
    function or the comparison is already unsigned.

    Net win on allyesconfig type of kernel: ~2.8 KB (!)

    add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833)
    function old new delta
    xen_exit_mmap 691 735 +44
    qstat_read 426 440 +14
    __cpufreq_cooling_register 1678 1687 +9
    trace_rb_cpu_prepare 447 455 +8
    vermagic 54 60 +6
    nfp_driver_version 54 60 +6
    rcu_torture_stats_print 1147 1151 +4
    find_next_push_cpu 267 269 +2
    xen_irq_resume 961 960 -1
    ...
    init_vp_index 946 906 -40
    od_set_powersave_bias 328 281 -47
    power_cpu_exit 193 139 -54
    arch_show_interrupts 3538 3484 -54
    select_idle_sibling 1558 1471 -87
    Total: Before=158358910, After=158356077, chg -0.00%

    The same arguments apply to "nr_cpu_ids", but I haven't yet found enough
    courage to delve into this issue (and a proper fix may require a new type
    "cpu_t", which is a whole separate story).

    Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2
    Signed-off-by: Alexey Dobriyan
    Cc: Rusty Russell
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • With virtually mapped stacks, kernel stacks are allocated via vmalloc.

    In the current implementation, two stacks per CPU can be cached when
    tasks are freed, and the cached stacks are used again in task
    duplication. But the cached stacks may remain unfreed even when the CPU
    is offline. By adding a CPU hotplug callback to free the cached stacks
    when a CPU goes offline, the pages of the cached stacks are not wasted.
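
    A minimal sketch of such a hotplug callback registration (state, names
    and callback body here are illustrative, not necessarily what the patch
    uses):

    static int free_vm_stack_cache(unsigned int cpu)
    {
            /* vfree() this CPU's cached stacks and clear the cache slots */
            return 0;
    }

    /* from init code: the teardown callback runs when a CPU goes offline */
    cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "fork:vm_stack_cache",
                      NULL, free_vm_stack_cache);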

    Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com
    Signed-off-by: Hoeun Ryu
    Reviewed-by: Thomas Gleixner
    Acked-by: Michal Hocko
    Cc: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: Kees Cook
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Mateusz Guzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hoeun Ryu
     
  • Prepare to mark sensitive kernel structures for randomization by making
    sure they're using designated initializers. These were identified
    during allyesconfig builds of x86, arm, and arm64, with most initializer
    fixes extracted from grsecurity.
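
    A minimal before/after sketch (hypothetical struct and handlers):

    struct foo_ops {
            int (*open)(void);
            int (*close)(void);
    };

    static int foo_open(void)  { return 0; }
    static int foo_close(void) { return 0; }

    /* positional: silently breaks if the field order ever changes */
    static const struct foo_ops a = { foo_open, foo_close };

    /* designated: robust to reordering, hence needed for randomization */
    static const struct foo_ops b = { .open = foo_open, .close = foo_close };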

    Link: http://lkml.kernel.org/r/20170329210419.GA40066@beast
    Signed-off-by: Kees Cook
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • The current SUSPECT_CODE_INDENT test does not recognize several
    code style defects where code following a logical test is
    inappropriately indented.

    Before this patch, for code like:

    if (foo)
    bar();

    checkpatch would not emit a warning.

    Improve the test to warn when code after a logical test has the same
    indentation as the logical test.

    Perform the same indentation test for "else" blocks too.

    Link: http://lkml.kernel.org/r/df2374b68c4a68af2b7ef08afe486584811f610a.1493683942.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The current test works only for a single patch context as it is done in
    the foreach ($rawlines) loop that precedes the loop where the actual
    $context_function variable is used.

    Move the setting of $context_function into the foreach (@lines) loop,
    where it is useful for each patch context.

    Link: http://lkml.kernel.org/r/6c675a31c74fbfad4fc45b9f462303d60ca2a283.1493486091.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • When using checkpatch on out-of-tree code, it may occur that some
    project-specific types are used, which will cause spurious warnings.
    Add the --typedefsfile option as a way to extend the known types and
    deal with this issue.

    This was developed for OP-TEE [1]. We run a Travis job on all pull
    requests [2], and checkpatch is part of that. The typical false warning
    we get on a regular basis is with some pointers to functions returning
    TEE_Result [3], which is a typedef from the GlobalPlatform APIs. We
    consider it acceptable to use GP types in the OP-TEE core
    implementation, which is why this patch would be helpful for us.

    [1] https://github.com/OP-TEE/optee_os
    [2] https://travis-ci.org/OP-TEE/optee_os/builds
    [3] https://travis-ci.org/OP-TEE/optee_os/builds/193355335#L1733
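
    Usage is then along these lines (file name illustrative; the file is
    assumed to list one additional type per line):

    $ echo "TEE_Result" > my_types.txt
    $ ./scripts/checkpatch.pl --typedefsfile=my_types.txt --file core/foo.c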

    Link: http://lkml.kernel.org/r/ba1124d6dfa599bb0dd1d8919dd45dd09ce541a4.1492702192.git.jerome.forissier@linaro.org
    Signed-off-by: Jerome Forissier
    Cc: Joe Perches
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Forissier
     
  • Find multi-line uses of k.alloc by using the $stat variable and not the
    $line variable. This can still --fix only the single-line variant,
    though.

    Link: http://lkml.kernel.org/r/3f4b23d37cd4c7d8628eefc25afe83ba8fb3ab55.1493167076.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Currently, checkpatch.pl does not recognize git's default revert commit
    message and will complain about the hash format. Add a special check for
    the revert commit message line to fix this.
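
    For reference, git's default revert message has this shape (subject and
    hash are placeholders):

    Revert "some original subject"

    This reverts commit 1234567890abcdef1234567890abcdef12345678.

    The bare 40-character hash on the "This reverts commit" line is what
    previously triggered the hash-format complaint.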

    Link: http://lkml.kernel.org/r/20170411191532.74381-1-wvw@google.com
    Signed-off-by: Wei Wang
    Acked-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wei Wang
     
  • Try to make the conversion of embedded function names to "%s: ", __func__
    a bit clearer.
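
    For example (hypothetical function and message), the suggested
    conversion is from

    pr_err("foo_probe: request failed\n");

    to

    pr_err("%s: request failed\n", __func__);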

    Add a bit more information to the comment describing the test too.

    Link: http://lkml.kernel.org/r/38f5d32f0aec1cd98cb9ceeedd6a736cc9a802db.1491759835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The logic currently misses macros that start with an if statement.

    e.g.: #define foo(bar) if (bar) baz;

    Add a test for macro content that starts with if.
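
    The hazard with such macros (sketch) shows up under an outer if/else:

    #define foo(bar) if (bar) baz;

    if (cond)
            foo(x);
    else            /* no longer pairs with "if (cond)": compile error here,
                     * or silent misbinding if the macro lacked the ';' */
            other();

    The usual remedy is to wrap the macro body in do { ... } while (0).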

    Link: http://lkml.kernel.org/r/a9d41aafe1673889caf1a9850208fb7fd74107a0.1491783914.git.joe@perches.com
    Signed-off-by: Joe Perches
    Reported-by: Andreas Mohr
    Original-patch-by: Alfonso Lima
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Many structs are generally used const and there is a known list of these
    structs.

    struct definitions should not generally be declared const.

    Add a test for the lack of an open brace immediately after the struct
    name, so that struct definitions are not flagged.

    This avoids the false positive "struct foo should normally be const"
    message, but only when the open brace is on the same line as the
    definition.
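
    In other words (file_operations being one of the structs on checkpatch's
    known-should-be-const list), an instance declaration like

    static struct file_operations foo_fops = {
            .open = foo_open,
    };

    should still get the "should normally be const" suggestion, while the
    definition of the type itself,

    struct file_operations {
            ...
    };

    should not be flagged.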

    Link: http://lkml.kernel.org/r/0dce709150d712e66f1b90b03827634b53b28085.1491845946.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Arthur Brainville
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Allow a leading space on an otherwise blank line in the email headers,
    as it can be part of a line-wrapped SpamAssassin multiple-line string or
    any other valid RFC 2822/5322 email header.

    The line with a space causes checkpatch to erroneously think that it's in
    the content body, as opposed to the headers, and thus flag a mail header
    as an unwrapped long comment line.

    Link: http://lkml.kernel.org/r/d75a9f0b78b3488078429f4037d9fff3bdfa3b78.1490247180.git.joe@perches.com
    Signed-off-by: Joe Perches
    Reported-by: Darren Hart (VMware)
    Tested-by: Darren Hart (VMware)
    Reviewed-by: Darren Hart (VMware)
    Original-patch-by: John 'Warthog9' Hawley (VMware)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • The existing behavior relies on patch context to identify function
    declarations. Add the ability to find function declarations when there
    is an open brace in column 1.

    This finds function declarations only in specific single line forms
    where the function name is on a single line like:

    int foo(args...)
    {

    and

    int
    foo(args...)
    {

    It does not recognize function declarations like:

    int foo(int bar,
    int baz)
    {

    Link: http://lkml.kernel.org/r/738d74bbbe1a06b80f11ed504818107c68903095.1488155636.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • %pK was at least once misused as %pk in an out-of-tree module. This
    led to some security concerns. Add the ability to track single- and
    multiple-line statements for misuses of %p.

    [akpm@linux-foundation.org: add helpful comment into lib/vsprintf.c]
    [akpm@linux-foundation.org: text tweak]
    Link: http://lkml.kernel.org/r/163a690510e636a23187c0dc9caa09ddac6d4cde.1488228427.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: Kees Cook
    Acked-by: William Roberts
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Config EXPERIMENTAL was removed from the kernel in 2013 (see commit
    3d374d09f16f: "final removal of CONFIG_EXPERIMENTAL"), so there is no
    reason to do these checks now.

    Link: http://lkml.kernel.org/r/1488234097-20119-1-git-send-email-ruslan.bilovol@gmail.com
    Signed-off-by: Ruslan Bilovol
    Acked-by: Kees Cook
    Acked-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ruslan Bilovol
     
  • If you modify the target asm, we currently do not force recompilation
    of the firmware files. The target asm is in firmware/Makefile, so peg
    this file as a dependency to require re-compilation of the firmware
    targets when the asm changes.

    Link: http://lkml.kernel.org/r/20170123150727.4883-1-mcgrof@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Cc: Masahiro Yamada
    Cc: Michal Marek
    Cc: Ming Lei
    Cc: Greg Kroah-Hartman
    Cc: Tom Gundersen
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis R. Rodriguez
     
  • Extract the linked list sorting test code into its own source file, so
    that it can be compiled either as a loadable module or built into the
    kernel.

    Link: http://lkml.kernel.org/r/1488287219-15832-4-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Cc: Arnd Bergmann
    Cc: Paul Gortmaker
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Allow the array-based sort test code to be compiled either as a loadable
    module or built into the kernel.

    Link: http://lkml.kernel.org/r/1488287219-15832-3-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Cc: Arnd Bergmann
    Cc: Paul Gortmaker
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Patch series "lib: add module support to sort tests".

    This patch series allows the array-based and linked list sort test code
    to be compiled either as loadable modules or built into the kernel.

    It's very valuable to have modular tests, so you can run them just by
    insmodding the test modules, instead of needing a separate kernel that
    runs them at boot.

    This patch (of 3):

    This reverts commit 8893f519330bb073a49c5b4676fce4be6f1be15d.

    It's very valuable to have modular tests, so you can run them just by
    insmodding the test modules, instead of needing a separate kernel that
    runs them at boot.

    Link: http://lkml.kernel.org/r/1488287219-15832-2-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven
    Reviewed-by: Andy Shevchenko
    Cc: Arnd Bergmann
    Cc: Paul Gortmaker
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • c2port_device_register() never returns NULL; it uses error pointers.
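
    Callers therefore need an IS_ERR()/PTR_ERR() check rather than a NULL
    test, roughly (sketch, arguments elided):

    c2dev = c2port_device_register(...);
    if (IS_ERR(c2dev))
            return PTR_ERR(c2dev);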

    Link: http://lkml.kernel.org/r/20170412083321.GC3250@mwanda
    Fixes: 65131cd52b9e ("c2port: add c2port support for Eurotech Duramar 2150")
    Signed-off-by: Dan Carpenter
    Acked-by: Rodolfo Giometti
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • The "DIV_ROUND_UP(size, PAGE_SIZE)" operation can overflow if "size" is
    more than ULLONG_MAX - PAGE_SIZE.
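
    That is because the macro is defined as

    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    so for a u64 "size" near ULLONG_MAX the "+ PAGE_SIZE - 1" addition wraps
    around and the division yields a tiny page count instead of failing.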

    Link: http://lkml.kernel.org/r/20170322111950.GA11279@mwanda
    Signed-off-by: Dan Carpenter
    Cc: Jorgen Hansen
    Cc: Masahiro Yamada
    Cc: Michal Hocko
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • When I was running my testcase, which may block hundreds of threads on fs
    locks, I got a lockup due to output from debug_show_all_locks() added by
    commit b2d4c2edb2e4 ("locking/hung_task: Show all locks").

    For example, if 1000 threads were blocked in TASK_UNINTERRUPTIBLE state
    and 500 out of 1000 threads hold some lock, debug_show_all_locks() called
    from the for_each_process_thread() loop will report the locks held by 500
    threads 1000 times. This is too much noise.

    In order to make sure rcu_lock_break() is called frequently, we should
    avoid calling debug_show_all_locks() from the for_each_process_thread()
    loop, because debug_show_all_locks() effectively runs a
    for_each_process_thread() loop of its own. Let's defer calling
    debug_show_all_locks() until just before panic() or after leaving the
    for_each_process_thread() loop.

    Link: http://lkml.kernel.org/r/1489296834-60436-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Reviewed-by: Vegard Nossum
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Add a top-level Makefile help target for Userspace tools.

    Also make each help "heading" end with a colon ':'.

    Link: http://lkml.kernel.org/r/55c986ff-3966-3e47-2984-7349da2cce51@infradead.org
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • jiffies_64 is defined in kernel/time/timer.c with
    ____cacheline_aligned_in_smp; however, this macro is not part of the
    declaration of jiffies and jiffies_64 in jiffies.h.

    As a result clang generates the following warning:

    kernel/time/timer.c:57:26: error: section does not match previous declaration [-Werror,-Wsection]
    __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
    ^
    include/linux/cache.h:39:36: note: expanded from macro '__cacheline_aligned_in_smp'
    ^
    include/linux/cache.h:34:4: note: expanded from macro '__cacheline_aligned'
    __section__(".data..cacheline_aligned")))
    ^
    include/linux/jiffies.h:77:12: note: previous attribute is here
    extern u64 __jiffy_data jiffies_64;
    ^
    include/linux/jiffies.h:70:38: note: expanded from macro '__jiffy_data'

    Link: http://lkml.kernel.org/r/20170403190200.70273-1-mka@chromium.org
    Signed-off-by: Matthias Kaehlcke
    Cc: "Jason A . Donenfeld"
    Cc: Grant Grundler
    Cc: Michael Davidson
    Cc: Greg Hackmann
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • Moving from get_user_pages() to get_user_pages_unlocked() simplifies the
    code and takes advantage of VM_FAULT_RETRY functionality when faulting
    in pages.
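
    The simplification is roughly of this shape (sketch; flags, counts and
    names are illustrative):

    down_read(&current->mm->mmap_sem);
    ret = get_user_pages(addr, 1, FOLL_WRITE, &page, NULL);
    up_read(&current->mm->mmap_sem);

    becomes a single call that manages mmap_sem itself and can drop it to
    service VM_FAULT_RETRY:

    ret = get_user_pages_unlocked(addr, 1, &page, FOLL_WRITE);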

    Link: http://lkml.kernel.org/r/20161101194332.23961-1-lstoakes@gmail.com
    Signed-off-by: Lorenzo Stoakes
    Cc: Michal Hocko
    Cc: Paolo Bonzini
    Cc: Kumar Gala
    Cc: Mihai Caraman
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • do_proc_dointvec_jiffies_conv() uses LONG_MAX/HZ as the max value to
    avoid overflow. But *valp is actually of int type, so overflow still
    occurs.

    For example,

    echo 2147483647 > ./sys/net/ipv4/tcp_keepalive_time

    Then,

    cat ./sys/net/ipv4/tcp_keepalive_time

    The output is "-1", which is not expected.

    Now use INT_MAX/HZ as the max value instead of LONG_MAX/HZ to fix it.
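
    The write path, simplified (sketch, not the exact kernel code), converts
    seconds to jiffies and stores the result in an int:

    if (secs > LONG_MAX / HZ)   /* old bound: practically never trips */
            return 1;
    *valp = secs * HZ;          /* int store: 2147483647 * HZ wraps   */

    Bounding against INT_MAX / HZ instead rejects values that cannot be
    represented after the multiplication.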

    Link: http://lkml.kernel.org/r/1490109532-9228-1-git-send-email-fgao@ikuai8.com
    Signed-off-by: Gao Feng
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Alexey Dobriyan
    Cc: Eric Dumazet
    Cc: Josh Poimboeuf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gao Feng
     
  • Coccinelle emits this warning:

    WARNING: casting value returned by memory allocation function to (struct proc_inode *) is useless.

    Remove unnecessary cast.
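
    The change is of this form (sketch based on the struct named in the
    warning):

    ei = (struct proc_inode *)kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);

    becomes

    ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);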

    Link: http://lkml.kernel.org/r/1487745720-16967-1-git-send-email-me@tobin.cc
    Signed-off-by: Tobin C. Harding
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobin C. Harding
     
  • The main goal of direct compaction is to form a high-order page for
    allocation, but it should also help against long-term fragmentation when
    possible.

    Most lower-than-pageblock-order compactions are for non-movable
    allocations, which means that if we compact in a movable pageblock and
    terminate as soon as we create the high-order page, it's unlikely that
    the fallback heuristics will claim the whole block. Instead there might
    be a single unmovable page in a pageblock full of movable pages, and the
    next unmovable allocation might pick another pageblock and increase
    long-term fragmentation.

    To help against such scenarios, this patch changes the termination
    criteria for compaction so that the current pageblock is finished even
    though the high-order page already exists. Note that it might be
    possible that the high-order page formed elsewhere in the zone due to
    parallel activity, but this patch doesn't try to detect that.

    This is only done with sync compaction, because async compaction is
    limited to pageblock of the same migratetype, where it cannot result in
    a migratetype fallback. (Async compaction also eagerly skips
    order-aligned blocks where isolation fails, which is against the goal of
    migrating away as much of the pageblock as possible.)

    As a result of this patch, long-term memory fragmentation should be
    reduced.

    In testing based on a 4.9 kernel with stress-highalloc from mmtests
    configured for order-4 GFP_KERNEL allocations, this patch has reduced
    the number of unmovable allocations falling back to movable pageblocks
    by 20%.

    Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The migrate scanner in async compaction is currently limited to
    MIGRATE_MOVABLE pageblocks. This is a heuristic intended to reduce
    latency, based on the assumption that non-MOVABLE pageblocks are
    unlikely to contain movable pages.

    However, with the exception of THPs, most high-order allocations are
    not movable. Should the async compaction succeed, this increases the
    chance that the non-MOVABLE allocations will fall back to a MOVABLE
    pageblock, making the long-term fragmentation worse.

    This patch attempts to help the situation by changing async direct
    compaction so that the migrate scanner only scans the pageblocks of the
    requested migratetype. If it's a non-MOVABLE type and there are such
    pageblocks that do contain movable pages, chances are that the
    allocation can succeed within one of such pageblocks, removing the need
    for a fallback. If that fails, the subsequent sync attempt will ignore
    this restriction.

    In testing based on 4.9 kernel with stress-highalloc from mmtests
    configured for order-4 GFP_KERNEL allocations, this patch has reduced
    the number of unmovable allocations falling back to movable pageblocks
    by 30%. The number of movable allocations falling back is reduced by
    12%.

    Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Preparation patch. We are going to need migratetype at lower layers
    than compact_zone() and compact_finished().

    Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Preparation for making the decisions more complex and depending on
    compact_control flags. No functional change.

    Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • When stealing pages from pageblock of a different migratetype, we count
    how many free pages were stolen, and change the pageblock's migratetype
    if more than half of the pageblock was free. This might be too
    conservative, as there might be other pages that are not free, but were
    allocated with the same migratetype as our allocation requested.

    While we cannot determine the migratetype of allocated pages precisely
    (at least without the page_owner functionality enabled), we can count
    pages that compaction would try to isolate for migration - those are
    either on LRU or __PageMovable(). The rest can be assumed to be
    MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily
    distinguish. This counting can be done as part of free page stealing
    with little additional overhead.

    The page stealing code is changed so that it considers free pages plus
    pages of the "good" migratetype for the decision whether to change
    pageblock's migratetype.

    The result should be more accurate migratetype of pageblocks wrt the
    actual pages in the pageblocks, when stealing from semi-occupied
    pageblocks. This should help the efficiency of page grouping by
    mobility.

    In testing based on 4.9 kernel with stress-highalloc from mmtests
    configured for order-4 GFP_KERNEL allocations, this patch has reduced
    the number of unmovable allocations falling back to movable pageblocks
    by 47%. The number of movable allocations falling back to other
    pageblocks are increased by 55%, but these events don't cause permanent
    fragmentation, so the tradeoff should be positive. Later patches also
    offset the movable fallback increase to some extent.

    [akpm@linux-foundation.org: merge fix]
    Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The __rmqueue_fallback() function is called when there's no free page of
    requested migratetype, and we need to steal from a different one.

    There are various heuristics to make this event infrequent and reduce
    permanent fragmentation. The main one is to try stealing from a
    pageblock that has the most free pages, and possibly steal them all at
    once and convert the whole pageblock. Precise searching for such
    pageblock would be expensive, so instead the heuristic walks the free
    lists from MAX_ORDER down to the requested order and assumes that the
    block with the highest-order free page is likely to also have the most
    free pages in total.

    Chances are that, together with the highest-order page, we also steal
    pages of lower orders from the same block. But then we still split the
    highest-order page. This is wasteful and can contribute to
    fragmentation instead of avoiding it.

    This patch thus changes __rmqueue_fallback() to just steal the page(s)
    and put them on the freelist of the requested migratetype, and only
    report whether it was successful. Then we pick (and eventually split)
    the smallest page with __rmqueue_smallest(). This all happens under
    zone lock, so nobody can steal it from us in the process. This should
    reduce fragmentation due to fallbacks. At worst we are only stealing a
    single highest-order page and waste some cycles by moving it between
    lists and then removing it, but fallback is not exactly hot path so that
    should not be a concern. As a side benefit the patch removes some
    duplicate code by reusing __rmqueue_smallest().

    [vbabka@suse.cz: fix endless loop in the modified __rmqueue()]
    Link: http://lkml.kernel.org/r/59d71b35-d556-4fc9-ee2e-1574259282fd@suse.cz
    Link: http://lkml.kernel.org/r/20170307131545.28577-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • When detecting whether compaction has succeeded in forming a high-order
    page, __compact_finished() employs a watermark check, followed by its own
    search for a suitable page in the freelists. This is not ideal for two
    reasons:

    - The watermark check also searches high-order freelists, but has a
    less strict criterion wrt fallback. It's therefore redundant and a waste
    of cycles. This was different in the past, when the high-order watermark
    check attempted to apply reserves to high-order pages.

    - The watermark check might actually fail due to lack of order-0 pages.
    Compaction can't help with that, so there's no point in continuing
    because of that. It's possible that the high-order page still exists,
    and in that case compaction terminates.

    This patch therefore removes the watermark check. This should save some
    cycles and terminate compaction sooner in some cases.

    Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Patch series "try to reduce fragmenting fallbacks", v3.

    Last year, Johannes Weiner reported a regression in page mobility
    grouping [1] and, while the exact cause was not found, I've come up with
    some ways to improve it by reducing the number of allocations falling
    back to a different migratetype and causing permanent fragmentation.

    The series was tested with mmtests stress-highalloc modified to do
    GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
    balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
    wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats
    of each run, as the extfrag stats are quite volatile (note the stats
    below are sums, not averages, as it was less perl hacking for me).

    Success rates are the same, already high due to the low allocation order
    used, so I'm not including them.

    Compaction stats:
    (the patches are stacked, and I haven't measured the non-functional-changes
    patches separately)

    patch 1 patch 2 patch 3 patch 4 patch 7 patch 8
    Compaction stalls 22449 24680 24846 19765 22059 17480
    Compaction success 12971 14836 14608 10475 11632 8757
    Compaction failures 9477 9843 10238 9290 10426 8722
    Page migrate success 3109022 3370438 3312164 1695105 1608435 2111379
    Page migrate failure 911588 1149065 1028264 1112675 1077251 1026367
    Compaction pages isolated 7242983 8015530 7782467 4629063 4402787 5377665
    Compaction migrate scanned 980838938 987367943 957690188 917647238 947155598 1018922197
    Compaction free scanned 557926893 598946443 602236894 594024490 541169699 763651731
    Compaction cost 10243 10578 10304 8286 8398 9440

    Compaction stats are mostly within noise until patch 4, which decreases
    the number of compactions, and migrations. Part of that could be due to
    more pageblocks marked as unmovable, and async compaction skipping
    those. This changes a bit with patch 7, but not so much. Patch 8
    increases free scanner stats and migrations, which comes from the
    changed termination criteria. Interestingly, the number of compactions
    decreases - probably the fully compacted pageblock satisfies multiple
    subsequent allocations, so it amortizes.

    Next comes the extfrag tracepoint, where "fragmenting" means that an
    allocation had to fallback to a pageblock of another migratetype which
    wasn't fully free (which is almost all of the fallbacks). I have
    locally added another tracepoint for "Page steal" into
    steal_suitable_fallback() which triggers in situations where we are
    allowed to do move_freepages_block(). If we decide to also do
    set_pageblock_migratetype(), it's "Pages steal with pageblock" with
    break down for which allocation migratetype we are stealing and from
    which fallback migratetype. The last part "due to counting" comes from
    patch 4 and counts the events where the counting of movable pages
    allowed us to change pageblock's migratetype, while the number of free
    pages alone wouldn't be enough to cross the threshold.

    patch 1 patch 2 patch 3 patch 4 patch 7 patch 8
    Page alloc extfrag event 10155066 8522968 10164959 15622080 13727068 13140319
    Extfrag fragmenting 10149231 8517025 10159040 15616925 13721391 13134792
    Extfrag fragmenting for unmovable 159504 168500 184177 97835 70625 56948
    Extfrag fragmenting unmovable placed with movable 153613 163549 172693 91740 64099 50917
    Extfrag fragmenting unmovable placed with reclaim. 5891 4951 11484 6095 6526 6031
    Extfrag fragmenting for reclaimable 4738 4829 6345 4822 5640 5378
    Extfrag fragmenting reclaimable placed with movable 1836 1902 1851 1579 1739 1760
    Extfrag fragmenting reclaimable placed with unmov. 2902 2927 4494 3243 3901 3618
    Extfrag fragmenting for movable 9984989 8343696 9968518 15514268 13645126 13072466
    Pages steal 179954 192291 210880 123254 94545 81486
    Pages steal with pageblock 22153 18943 20154 33562 29969 33444
    Pages steal with pageblock for unmovable 14350 12858 13256 20660 19003 20852
    Pages steal with pageblock for unmovable from mov. 12812 11402 11683 19072 17467 19298
    Pages steal with pageblock for unmovable from recl. 1538 1456 1573 1588 1536 1554
    Pages steal with pageblock for movable 7114 5489 5965 11787 10012 11493
    Pages steal with pageblock for movable from unmov. 6885 5291 5541 11179 9525 10885
    Pages steal with pageblock for movable from recl. 229 198 424 608 487 608
    Pages steal with pageblock for reclaimable 689 596 933 1115 954 1099
    Pages steal with pageblock for reclaimable from unmov. 273 219 537 658 547 667
    Pages steal with pageblock for reclaimable from mov. 416 377 396 457 407 432
    Pages steal with pageblock due to counting 11834 10075 7530
    ... for unmovable 8993 7381 4616
    ... for movable 2792 2653 2851
    ... for reclaimable 49 41 63

    What we can see is that "Extfrag fragmenting for unmovable" and "...
    placed with movable" drops with almost each patch, which is good as we
    are polluting less movable pageblocks with unmovable pages.

    The most significant change is patch 4 with movable page counting. On
    the other hand it increases "Extfrag fragmenting for movable" by 50%.
    "Pages steal" drops though, so these movable allocation fallbacks find
    only small free pages and are not allowed to steal whole pageblocks
    back. "Pages steal with pageblock" raises, because the patch increases
    the chances of pageblock migratetype changes to happen. This affects
    all migratetypes.

    The summary is that patch 4 is not a clear win wrt these stats, but I
    believe that the tradeoff it makes is a good one. There's less
    pollution of movable pageblocks by unmovable allocations. There's less
    stealing between pageblocks, and the steals that remain have a higher
    chance of also changing the migratetype of the pageblock itself, so it
    should more faithfully reflect the migratetype of the pages within the
    pageblock.
    The increase of movable allocations falling back to unmovable pageblock
    might look dramatic, but those allocations can be migrated by compaction
    when needed, and other patches in the series (7-9) improve that aspect.

    Patches 7 and 8 continue the trend of reduced unmovable fallbacks and
    also reduce the impact on movable fallbacks from patch 4.

    [1] https://www.spinics.net/lists/linux-mm/msg114237.html

    This patch (of 8):

    Currently there are (mostly by accident) no holes in struct
    compact_control (on x86_64), but we are going to add more bool flags, so
    place them all together at the end of the structure. While at it, just
    order all fields from largest to smallest.

    Link: http://lkml.kernel.org/r/20170307131545.28577-2-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka