02 Dec, 2018

1 commit

  • Pull STIBP fallout fixes from Thomas Gleixner:
    "The performance destruction department finally got it's act together
    and came up with a cure for the STIPB regression:

    - Provide a command line option to control the spectre v2 user space
    mitigations. Default is either seccomp or prctl (if seccomp is
    disabled in Kconfig). prctl allows mitigation opt-in (sketched after
    this quote), seccomp enables the mitigation for sandboxed processes.

    - Rework the code to handle the conditional STIBP/IBPB control and
    remove the now unused ptrace_may_access_sched() optimization
    attempt

    - Disable STIBP automatically when SMT is disabled

    - Optimize the switch_to() logic to avoid MSR writes and invocations
    of __switch_to_xtra().

    - Make the asynchronous speculation TIF updates synchronous to
    prevent stale mitigation state.

    As a general cleanup this also makes retpoline directly depend on
    compiler support and removes the 'minimal retpoline' option which just
    pretended to provide some form of security while providing none"
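
    As a rough, hedged illustration of the prctl opt-in: the constants below
    are the uapi values added by this series, defined as a fallback in case
    older headers lack them.

    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_SPECULATION_CTRL
    #define PR_GET_SPECULATION_CTRL 52
    #define PR_SET_SPECULATION_CTRL 53
    #endif
    #ifndef PR_SPEC_INDIRECT_BRANCH
    #define PR_SPEC_INDIRECT_BRANCH 1       /* added by this series */
    #endif
    #ifndef PR_SPEC_DISABLE
    #define PR_SPEC_DISABLE (1UL << 2)
    #endif

    int main(void)
    {
        /* Opt this task in to the indirect branch speculation
         * mitigation (enables STIBP/IBPB for it in the conditional
         * prctl/seccomp modes). */
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                  PR_SPEC_DISABLE, 0, 0))
            perror("PR_SET_SPECULATION_CTRL");

        /* Query the resulting state. */
        printf("ib spec state: %d\n",
               prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                     0, 0, 0));
        return 0;
    }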

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    x86/speculation: Provide IBPB always command line options
    x86/speculation: Add seccomp Spectre v2 user space protection mode
    x86/speculation: Enable prctl mode for spectre_v2_user
    x86/speculation: Add prctl() control for indirect branch speculation
    x86/speculation: Prepare arch_smt_update() for PRCTL mode
    x86/speculation: Prevent stale SPEC_CTRL msr content
    x86/speculation: Split out TIF update
    ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
    x86/speculation: Prepare for conditional IBPB in switch_mm()
    x86/speculation: Avoid __switch_to_xtra() calls
    x86/process: Consolidate and simplify switch_to_xtra() code
    x86/speculation: Prepare for per task indirect branch speculation control
    x86/speculation: Add command line control for indirect branch speculation
    x86/speculation: Unify conditional spectre v2 print functions
    x86/speculataion: Mark command line parser data __initdata
    x86/speculation: Mark string arrays const correctly
    x86/speculation: Reorder the spec_v2 code
    x86/l1tf: Show actual SMT state
    x86/speculation: Rework SMT state change
    sched/smt: Expose sched_smt_present static key
    ...

    Linus Torvalds
     

01 Dec, 2018

8 commits

  • Merge misc fixes from Andrew Morton:
    "31 fixes"

    * emailed patches from Andrew Morton: (31 commits)
    ocfs2: fix potential use after free
    mm/khugepaged: fix the xas_create_range() error path
    mm/khugepaged: collapse_shmem() do not crash on Compound
    mm/khugepaged: collapse_shmem() without freezing new_page
    mm/khugepaged: minor reorderings in collapse_shmem()
    mm/khugepaged: collapse_shmem() remember to clear holes
    mm/khugepaged: fix crashes due to misaccounted holes
    mm/khugepaged: collapse_shmem() stop if punched or truncated
    mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
    mm/huge_memory: splitting set mapping+index before unfreeze
    mm/huge_memory: rename freeze_page() to unmap_page()
    initramfs: clean old path before creating a hardlink
    kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
    psi: make disabling/enabling easier for vendor kernels
    proc: fixup map_files test on arm
    debugobjects: avoid recursive calls with kmemleak
    userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
    userfaultfd: shmem: add i_size checks
    userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
    userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
    ...

    Linus Torvalds
     
  • Pull stackleak plugin fix from Kees Cook:
    "Fix crash by not allowing kprobing of stackleak_erase() (Alexander
    Popov)"

    * tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    stackleak: Disable function tracing and kprobes for stackleak_erase()

    Linus Torvalds
     
  • Since __sanitizer_cov_trace_pc() is marked as notrace, function calls in
    __sanitizer_cov_trace_pc() shouldn't be traced either.
    ftrace_graph_caller() gets called for each function that isn't marked
    'notrace', like canonicalize_ip(). This is the call trace from a run:

    [ 139.644550] ftrace_graph_caller+0x1c/0x24
    [ 139.648352] canonicalize_ip+0x18/0x28
    [ 139.652313] __sanitizer_cov_trace_pc+0x14/0x58
    [ 139.656184] sched_clock+0x34/0x1e8
    [ 139.659759] trace_clock_local+0x40/0x88
    [ 139.663722] ftrace_push_return_trace+0x8c/0x1f0
    [ 139.667767] prepare_ftrace_return+0xa8/0x100
    [ 139.671709] ftrace_graph_caller+0x1c/0x24

    Rework so that check_kcov_mode() and canonicalize_ip(), which are called
    from __sanitizer_cov_trace_pc(), are also marked as notrace.
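
    A minimal sketch of the shape of the fix ('notrace' expands to
    __attribute__((no_instrument_function)) and keeps ftrace from hooking
    the function; the bodies follow the kcov code only loosely):

    /* Everything reachable from __sanitizer_cov_trace_pc() must itself
     * be notrace, or ftrace_graph_caller() recurses back into kcov. */
    static notrace unsigned long canonicalize_ip(unsigned long ip)
    {
    #ifdef CONFIG_RANDOMIZE_BASE
        ip -= kaslr_offset();       /* report KASLR-independent PCs */
    #endif
        return ip;
    }

    static notrace bool check_kcov_mode(enum kcov_mode needed_mode,
                                        struct task_struct *t)
    {
        /* ... mode/task checks, likewise exempt from tracing ... */
        return READ_ONCE(t->kcov_mode) == needed_mode;
    }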

    Link: http://lkml.kernel.org/r/20181128081239.18317-1-anders.roxell@linaro.org
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Anders Roxell
    Co-developed-by: Arnd Bergmann
    Acked-by: Steven Rostedt (VMware)
    Cc: Dmitry Vyukov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anders Roxell
     
  • Mel Gorman reports a hackbench regression with psi that would prohibit
    shipping the suse kernel with it default-enabled, but he'd still like
    users to be able to opt in at little to no cost to others.

    With the current combination of CONFIG_PSI and the psi_disabled bool set
    from the commandline, this is a challenge. Do the following things to
    make it easier:

    1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
    to enable CONFIG_PSI in their kernel but leave the feature disabled
    unless a user requests it at boot-time.

    To avoid double negatives, rename psi_disabled= to psi=.

    2. Make psi_disabled a static branch to eliminate any branch costs
    when the feature is disabled.
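
    A sketch of what the static-branch form looks like (the key is flipped
    at most once during early boot based on CONFIG_PSI_DEFAULT_DISABLED
    and the psi= parameter, so the disabled case costs only a patched-out
    NOP in the scheduler hot paths; this follows the psi code loosely):

    static bool psi_enable = !IS_ENABLED(CONFIG_PSI_DEFAULT_DISABLED);

    DEFINE_STATIC_KEY_FALSE(psi_disabled);

    void __init psi_init(void)
    {
        if (!psi_enable) {
            static_branch_enable(&psi_disabled);
            return;
        }
        /* ... normal psi setup ... */
    }

    void psi_task_change(struct task_struct *task, int clear, int set)
    {
        if (static_branch_likely(&psi_disabled))
            return;
        /* ... accounting work runs only when psi is enabled ... */
    }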

    In terms of numbers before and after this patch, Mel says:

    : The following is a comparison using CONFIG_PSI=n as a baseline against
    : your patch and a vanilla kernel
    :
    :                            4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
    :                   kconfigdisable-v1r1                vanilla        psidisable-v1r1
    : Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
    : Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
    : Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
    : Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
    : Amean    12      16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
    : Amean    18      26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
    : Amean    24      34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
    : Amean    30      40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
    : Amean    32      40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
    :
    : It's showing that the vanilla kernel takes a hit (as the bisection
    : indicated it would) and that disabling PSI by default is reasonably
    : close in terms of performance for this particular workload on this
    : particular machine so;

    Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Tested-by: Mel Gorman
    Reported-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Pull perf fixes from Ingo Molnar:
    "Misc fixes:

    - counter freezing related regression fix

    - uprobes race fix

    - Intel PMU unusual event combination fix

    - ... and diverse tooling fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    uprobes: Fix handle_swbp() vs. unregister() + register() race once more
    perf/x86/intel: Disallow precise_ip on BTS events
    perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()
    perf/x86/intel: Move branch tracing setup to the Intel-specific source file
    perf/x86/intel: Fix regression by default disabling perfmon v4 interrupt handling
    perf tools beauty ioctl: Support new ISO7816 commands
    tools uapi asm-generic: Synchronize ioctls.h
    tools arch x86: Update tools's copy of cpufeatures.h
    tools headers uapi: Synchronize i915_drm.h
    perf tools: Restore proper cwd on return from mnt namespace
    tools build feature: Check if get_current_dir_name() is available
    perf tools: Fix crash on synthesizing the unit

    Linus Torvalds
     
  • Pull more tracing fixes from Steven Rostedt:
    "Two more fixes:

    - Change idx variable in DO_TRACE macro to __idx to avoid name
    conflicts. A kvm event had "idx" as a parameter and it confused the
    macro (illustrated in the sketch after this quote).

    - Fix a race where interrupts would be traced when set_graph_function
    was set. The previous patch set increased a race window that tricked
    the function graph tracer into thinking it should trace interrupts
    when it really should not have.

    The bug has been there before, but was seldom hit. Only the last
    patch series made it more common"
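
    The idx clash is ordinary C macro hygiene; a hedged, simplified
    illustration (do_callback and sp are made-up names, not the real
    DO_TRACE internals):

    /* Broken: the macro-local 'idx' shadows any 'idx' the caller passes
     * in through args, so the callback reads the SRCU index instead of
     * the caller's variable. */
    #define DO_TRACE_BROKEN(args...)                    \
        do {                                            \
            int idx = srcu_read_lock(&sp);              \
            do_callback(args);                          \
            srcu_read_unlock(&sp, idx);                 \
        } while (0)

    /* Fixed: a reserved-style name callers will not collide with. */
    #define DO_TRACE_FIXED(args...)                     \
        do {                                            \
            int __idx = srcu_read_lock(&sp);            \
            do_callback(args);                          \
            srcu_read_unlock(&sp, __idx);               \
        } while (0)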

    * tag 'trace-v4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/fgraph: Fix set_graph_function from showing interrupts
    tracepoint: Use __idx instead of idx in DO_TRACE macro to make it unique

    Linus Torvalds
     
  • Pull tracing fixes from Steven Rostedt:
    "While rewriting the function graph tracer, I discovered a design flaw
    that was introduced by a patch that tried to fix one bug, but by doing
    so created another bug.

    As both bugs corrupt the output (but they do not crash the kernel), I
    decided to fix the design such that it could have both bugs fixed. The
    original fix, fixed time reporting of the function graph tracer when
    doing a max_depth of one. This was code that can test how much the
    kernel interferes with userspace. But in doing so, it could corrupt
    the time keeping of the function profiler.

    The issue is that the curr_ret_stack variable was being used for two
    different purposes. One was to keep track of the stack pointer on the
    ret_stack (the shadow stack used by the function graph tracer), and
    the other was the graph call depth. Although the two may be closely
    related, where they got updated was the issue that led to the two
    different bugs, which required the two use cases to be updated
    differently.

    The big issue with this fix is that it requires changing each
    architecture. The good news is, I was able to remove a lot of code
    that was duplicated within the architectures and place it into a
    single location. Then I could make the fix in one place.

    I pushed this code into linux-next to let it settle over a week, and
    before doing so, I cross compiled all the affected architectures to
    make sure that they built fine.

    In the meantime, I also pulled in a patch that fixes the sched_switch
    previous task's state output, which was not actually correct"

    * tag 'trace-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    sched, trace: Fix prev_state output in sched_switch tracepoint
    function_graph: Have profiler use curr_ret_stack and not depth
    function_graph: Reverse the order of pushing the ret_stack and the callback
    function_graph: Move return callback before update of curr_ret_stack
    function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack
    function_graph: Make ftrace_push_return_trace() static
    sparc/function_graph: Simplify with function_graph_enter()
    sh/function_graph: Simplify with function_graph_enter()
    s390/function_graph: Simplify with function_graph_enter()
    riscv/function_graph: Simplify with function_graph_enter()
    powerpc/function_graph: Simplify with function_graph_enter()
    parisc: function_graph: Simplify with function_graph_enter()
    nds32: function_graph: Simplify with function_graph_enter()
    MIPS: function_graph: Simplify with function_graph_enter()
    microblaze: function_graph: Simplify with function_graph_enter()
    arm64: function_graph: Simplify with function_graph_enter()
    ARM: function_graph: Simplify with function_graph_enter()
    x86/function_graph: Simplify with function_graph_enter()
    function_graph: Create function_graph_enter() to consolidate architecture code

    Linus Torvalds
     
  • The stackleak_erase() function is called on the trampoline stack at the
    end of syscall. This stack is not big enough for ftrace and kprobes
    operations, e.g. it can be exhausted if we use kprobe_events for
    stackleak_erase().

    So let's disable function tracing and kprobes of stackleak_erase().
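
    The shape of the fix, roughly (notrace removes the ftrace hook, and
    NOKPROBE_SYMBOL() blacklists the function for kprobes):

    #include <linux/kprobes.h>

    asmlinkage void notrace stackleak_erase(void)
    {
        /* ... erase the used portion of the thread stack; this runs on
         * the small trampoline stack, so no instrumentation allowed ... */
    }
    NOKPROBE_SYMBOL(stackleak_erase);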

    Reported-by: kernel test robot
    Fixes: 10e9ae9fabaf ("gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack")
    Signed-off-by: Alexander Popov
    Reviewed-by: Steven Rostedt (VMware)
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Kees Cook

    Alexander Popov
     

30 Nov, 2018

1 commit

  • The tracefs file set_graph_function is used to make the function graph
    tracer trace only the functions that are listed in that file (or all
    functions if the file is empty). The way this is implemented is that the
    function graph tracer looks at every function, and if the current depth
    is zero and the function matches something in the file then it will
    trace that function. When other functions are called, the depth will be
    greater than zero (because the original function will be at depth zero),
    and all functions will be traced where the depth is greater than zero.

    The issue is that when a function is first entered, and the handler that
    checks this logic is called, the depth is set to zero. If an interrupt comes
    in and a function in the interrupt handler is traced, its depth will be
    greater than zero and it will automatically be traced, even if the original
    function was not. But because the logic only looks at depth, it may
    trace interrupts when it should not.

    The recent design change of the function graph tracer to fix other bugs
    caused the depth to be zero while the function graph callback handler is
    being called for a longer time, widening the race of this happening. This
    bug was actually there for a longer time, but because the race window was so
    small it seldom happened. The Fixes tag below is for the commit that
    widened the race window, because that commit belongs to a series that
    will also help fix the original bug.

    Cc: stable@kernel.org
    Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
    Reported-by: Joe Lawrence
    Tested-by: Joe Lawrence
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

29 Nov, 2018

1 commit

  • Pull networking fixes from David Miller:

    1) ARM64 JIT fixes for subprog handling from Daniel Borkmann.

    2) Various sparc64 JIT bug fixes (fused branch convergence, frame
    pointer usage detection logic, PSEUDO call argument handling).

    3) Fix to use BH locking in nf_conncount, from Taehee Yoo.

    4) Fix race of TX skb freeing in ipheth driver, from Bernd Eckstein.

    5) Handle return value of TX NAPI completion properly in lan743x
    driver, from Bryan Whitehead.

    6) MAC filter deletion in i40e driver clears wrong state bit, from
    Lihong Yang.

    7) Fix use after free in rionet driver, from Pan Bian.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
    s390/qeth: fix length check in SNMP processing
    net: hisilicon: remove unexpected free_netdev
    rapidio/rionet: do not free skb before reading its length
    i40e: fix kerneldoc for xsk methods
    ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
    i40e: Fix deletion of MAC filters
    igb: fix uninitialized variables
    netfilter: nf_tables: deactivate expressions in rule replecement routine
    lan743x: Enable driver to work with LAN7431
    tipc: fix lockdep warning during node delete
    lan743x: fix return value for lan743x_tx_napi_poll
    net: via: via-velocity: fix spelling mistake "alignement" -> "alignment"
    qed: fix spelling mistake "attnetion" -> "attention"
    net: thunderx: fix NULL pointer dereference in nic_remove
    sctp: increase sk_wmem_alloc when head->truesize is increased
    firestream: fix spelling mistake: "Inititing" -> "Initializing"
    net: phy: add workaround for issue where PHY driver doesn't bind to the device
    usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
    sparc: Adjust bpf JIT prologue for PSEUDO calls.
    bpf, doc: add entries of who looks over which jits
    ...

    Linus Torvalds
     

28 Nov, 2018

9 commits

  • The IBPB control code in x86 removed the usage of
    ptrace_may_access_sched(). Remove the functionality which was
    introduced for it.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Jiri Kosina
    Cc: Tom Lendacky
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Tim Chen
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Casey Schaufler
    Cc: Asit Mallick
    Cc: Arjan van de Ven
    Cc: Jon Masters
    Cc: Waiman Long
    Cc: Greg KH
    Cc: Dave Stewart
    Cc: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de

    Thomas Gleixner
     
  • arch_smt_update() is only called when the sysfs SMT control knob is
    changed. This means that when SMT is enabled in the sysfs control knob the
    system is considered to have SMT active even if all siblings are offline.

    To allow fine-grained control of the speculation mitigations, the actual SMT
    state is more interesting than the fact that siblings could be enabled.

    Rework the code, so arch_smt_update() is invoked from each individual CPU
    hotplug function, and simplify the update function while at it.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Jiri Kosina
    Cc: Tom Lendacky
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Tim Chen
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Casey Schaufler
    Cc: Asit Mallick
    Cc: Arjan van de Ven
    Cc: Jon Masters
    Cc: Waiman Long
    Cc: Greg KH
    Cc: Dave Stewart
    Cc: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de

    Thomas Gleixner
     
  • Make the scheduler's 'sched_smt_present' static key globally available, so
    it can be used in the x86 speculation control code.

    Provide a query function and a stub for the CONFIG_SMP=n case.
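
    A sketch consistent with that description (per this series, the query
    helper is named sched_smt_active()):

    /* include/linux/sched/smt.h */
    #ifdef CONFIG_SMP
    extern struct static_key_false sched_smt_present;

    static __always_inline bool sched_smt_active(void)
    {
        return static_branch_likely(&sched_smt_present);
    }
    #else
    static inline bool sched_smt_active(void) { return false; }
    #endif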

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Jiri Kosina
    Cc: Tom Lendacky
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Tim Chen
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Casey Schaufler
    Cc: Asit Mallick
    Cc: Arjan van de Ven
    Cc: Jon Masters
    Cc: Waiman Long
    Cc: Greg KH
    Cc: Dave Stewart
    Cc: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de

    Thomas Gleixner
     
  • Currently the 'sched_smt_present' static key is enabled when SMT
    topology is observed at CPU bringup, but it is never disabled. However
    there is demand
    to also disable the key when the topology changes such that there is no SMT
    present anymore.

    Implement this by making the key count the number of cores that have SMT
    enabled.

    In particular, the SMT topology bits are set before interrupts are enabled
    and similarly, are cleared after interrupts are disabled for the last time
    and the CPU dies.
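
    A sketch of the counting-key idea (the key behaves like a reference
    count of SMT cores; the branch only flips when the count crosses
    zero):

    /* CPU bringup path: this CPU is the second sibling of its core,
     * so the core becomes an SMT core. */
    if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
        static_branch_inc_cpuslocked(&sched_smt_present);

    /* CPU teardown path: after this CPU goes, only one sibling
     * remains, so the core stops being an SMT core. */
    if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
        static_branch_dec_cpuslocked(&sched_smt_present);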

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Ingo Molnar
    Cc: Andy Lutomirski
    Cc: Linus Torvalds
    Cc: Jiri Kosina
    Cc: Tom Lendacky
    Cc: Josh Poimboeuf
    Cc: Andrea Arcangeli
    Cc: David Woodhouse
    Cc: Tim Chen
    Cc: Andi Kleen
    Cc: Dave Hansen
    Cc: Casey Schaufler
    Cc: Asit Mallick
    Cc: Arjan van de Ven
    Cc: Jon Masters
    Cc: Waiman Long
    Cc: Greg KH
    Cc: Dave Stewart
    Cc: Kees Cook
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.de

    Peter Zijlstra (Intel)
     
  • The profiler uses trace->depth to find its entry on the ret_stack, but the
    depth may not match the actual location of where its entry is (if an
    interrupt were to preempt the processing of the profiler for another
    function, the depth and the curr_ret_stack will be different).

    Have it use the curr_ret_stack as the index to find its ret_stack entry
    instead of using the depth variable, as that is no longer guaranteed to be
    the same.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The function graph profiler uses the ret_stack to store the "subtime"
    and reuses it for nested functions and also on the return. But the
    current logic has the profiler callback called before the ret_stack is
    updated, and it is just modifying the ret_stack entry that will later
    be allocated (it's just lucky that the "subtime" is not touched when
    it is allocated).

    This could also cause a crash if we are at the end of the ret_stack when
    this happens.

    By reversing the order of allocating the ret_stack and calling the
    callbacks attached to a function being traced, the ret_stack entry is
    no longer used before it is allocated.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • In the past, curr_ret_stack had two functions. One was to denote the
    depth of the call graph, the other was to keep track of where on the
    ret_stack the data is used. Although they may be slightly related, there
    are two cases where they need to be used differently.

    One case is keeping the ret_stack data from being corrupted by an
    interrupt coming in and overwriting the data still in use. The other is
    simply knowing the current depth of the stack.

    The function profiler uses the ret_stack to save a "subtime" variable that
    is part of the data on the ret_stack. If curr_ret_stack is modified too
    early, then this variable can be corrupted.

    The "max_depth" option, when set to 1, will record the first functions going
    into the kernel. To see all top functions (when dealing with timings), the
    depth variable needs to be lowered before calling the return hook. But by
    lowering the curr_ret_stack, it makes the data on the ret_stack still being
    used by the return hook susceptible to being overwritten.

    Now that there are two variables to handle both cases (curr_ret_depth
    for the depth), the updates can be moved to the locations where each
    handles its case.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • Currently, the depth of the ret_stack is determined by curr_ret_stack index.
    The issue is that there's a race between setting of the curr_ret_stack and
    calling of the callback attached to the return of the function.

    Commit 03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling
    trace return callback") moved the calling of the callback to after the
    setting of the curr_ret_stack, even stating that it was safe to do so, when
    in fact, it was the reason there was a barrier() there (yes, I should have
    commented that barrier()).

    Not only does the curr_ret_stack keep track of the current call graph depth,
    it also keeps the ret_stack content from being overwritten by new data.

    The function profiler uses the "subtime" variable of the ret_stack
    structure, and by moving the curr_ret_stack early, interrupts are
    allowed to use the same structure the profiler was still using,
    corrupting the data and breaking the profiler.

    To fix this, there needs to be two variables to handle the call stack depth
    and the pointer to where the ret_stack is being used, as they need to change
    at two different locations.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • As all architectures now call function_graph_enter() to do the entry work,
    no architecture should ever call ftrace_push_return_trace(). Make it static.

    This is needed to prepare for a fix of a design bug on how the curr_ret_stack
    is used.

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

27 Nov, 2018

2 commits

  • Make fetching of the BPF call address from ppc64 JIT generic. ppc64
    was using a slightly different variant rather than through the insns'
    imm field encoding as the target address would not fit into that space.
    Therefore, the target subprog number was encoded into the insns' offset
    and fetched through fp->aux->func[off]->bpf_func instead. Given there
    are other JITs with this issue and the mechanism of fetching the address
    is JIT-generic, move it into the core as a helper instead. On the JIT
    side, we get information on whether the retrieved address is a fixed
    one, that is, not changing through JIT passes, or a dynamic one. For
    the former, JITs can optimize their imm emission because this doesn't
    change jump offsets throughout JIT process.
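
    A hedged sketch of the core helper as described (simplified; names
    other than __bpf_call_base, BPF_PSEUDO_CALL and the prog fields are
    illustrative, and the real signature may differ):

    static int jit_get_func_addr(const struct bpf_prog *fp,
                                 const struct bpf_insn *insn,
                                 bool extra_pass, u64 *func_addr,
                                 bool *fixed)
    {
        if (insn->src_reg == BPF_PSEUDO_CALL) {
            /* bpf-to-bpf call: resolve via the subprog table. The
             * address only exists once subprogs are JITed, so it is
             * dynamic until the extra pass. */
            if (!extra_pass || !fp->aux->func ||
                insn->off >= fp->aux->func_cnt)
                return -EINVAL;
            *func_addr = (u64)(unsigned long)
                         fp->aux->func[insn->off]->bpf_func;
            *fixed = false;
        } else {
            /* helper call: imm is relative to __bpf_call_base and
             * does not change across JIT passes. */
            *func_addr = (u64)(unsigned long)__bpf_call_base + insn->imm;
            *fixed = true;
        }
        return 0;
    }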

    Signed-off-by: Daniel Borkmann
    Reviewed-by: Sandipan Das
    Tested-by: Sandipan Das
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Currently all the architectures do basically the same thing in preparing the
    function graph tracer on entry to a function. This code can be pulled into a
    generic location and then this will allow the function graph tracer to be
    fixed, as well as extended.

    Create a new function graph helper function_graph_enter() that will call the
    hook function (ftrace_graph_entry) and the shadow stack operation
    (ftrace_push_return_trace), and remove the need of the architecture code to
    manage the shadow stack.

    This is needed to prepare for a fix of a design bug on how the curr_ret_stack
    is used.
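
    What each architecture's entry stub reduces to, roughly (error
    handling per architecture may differ):

    /* arch/<arch>/kernel/ftrace.c */
    void prepare_ftrace_return(unsigned long self_addr,
                               unsigned long *parent,
                               unsigned long frame_pointer)
    {
        unsigned long old = *parent;

        /* The generic core now runs the entry hook and pushes the
         * shadow-stack entry; only hijack the return address on
         * success. */
        if (!function_graph_enter(old, self_addr, frame_pointer, parent))
            *parent = (unsigned long)&return_to_handler;
    }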

    Cc: stable@kernel.org
    Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
    Reviewed-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

26 Nov, 2018

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2018-11-25

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix an off-by-one bug when adjusting subprog start offsets after
    patching, from Edward.

    2) Fix several bugs such as overflow in size allocation in queue /
    stack map creation, from Alexei.

    3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp
    helper, from Andrey.

    4) Fix several bugs in bpftool such as preventing an infinite loop
    in get_fdinfo, error handling and man page references, from Quentin.

    5) Fix a warning in bpf_trace_printk() that wasn't catching an
    invalid format string, from Martynas.

    6) Fix a bug in BPF cgroup local storage where non-atomic allocation
    was used in atomic context, from Roman.

    7) Fix a NULL pointer dereference bug in bpftool from reallocarray()
    error handling, from Jakub and Wen.

    8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools
    include infrastructure so that bpftool compiles on older RHEL7-like
    user space which does not ship these headers, from Yonghong.

    9) Fix BPF kselftests to get the ping tests working with both ping6
    and ping -6, from Li.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Nov, 2018

1 commit

  • A format string consisting of "%p" or "%s" followed by an invalid
    specifier (e.g. "%p%\n" or "%s%") could pass the check, which would
    cause format_decode() (lib/vsprintf.c) to warn.

    Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
    Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
    Signed-off-by: Martynas Pumputis
    Signed-off-by: Daniel Borkmann

    Martynas Pumputis
     

23 Nov, 2018

2 commits

  • Commit:

    142b18ddc8143 ("uprobes: Fix handle_swbp() vs unregister() + register() race")

    added the UPROBE_COPY_INSN flag, and corresponding smp_wmb() and smp_rmb()
    memory barriers, to ensure that handle_swbp() uses fully-initialized
    uprobes only.

    However, the smp_rmb() is mis-placed: this barrier should be placed
    after handle_swbp() has tested for the flag, thus guaranteeing that
    (program-order) subsequent loads from the uprobe can see the initial
    stores performed by prepare_uprobe().

    Move the smp_rmb() accordingly. Also amend the comments associated
    to the two memory barriers to indicate their actual locations.
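
    This is the standard publish/consume barrier pairing (the helper
    names here are illustrative, not the actual uprobes functions):

    /* Publisher, prepare_uprobe(): initialize, then publish the flag. */
    init_uprobe_insn(uprobe);                  /* initial stores */
    smp_wmb();                                 /* pairs with smp_rmb() below */
    set_bit(UPROBE_COPY_INSN, &uprobe->flags);

    /* Consumer, handle_swbp(): test the flag first, THEN the barrier. */
    if (!test_bit(UPROBE_COPY_INSN, &uprobe->flags))
        return;
    smp_rmb();                  /* orders the flag load before later loads */
    handle_uprobe_insn(uprobe); /* guaranteed to see the initial stores */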

    Signed-off-by: Andrea Parri
    Acked-by: Oleg Nesterov
    Cc: Alexander Shishkin
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: stable@kernel.org
    Fixes: 142b18ddc8143 ("uprobes: Fix handle_swbp() vs unregister() + register() race")
    Link: http://lkml.kernel.org/r/20181122161031.15179-1-andrea.parri@amarulasolutions.com
    Signed-off-by: Ingo Molnar

    Andrea Parri
     
  • Fix the following issues:

    - allow queue_stack_map for root only
    - fix u32 max_entries overflow
    - disallow value_size == 0
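
    A sketch of the kinds of checks involved (close to, but not verbatim,
    the actual fix):

    static int queue_stack_map_alloc_check(union bpf_attr *attr)
    {
        if (!capable(CAP_SYS_ADMIN))            /* root only */
            return -EPERM;

        if (attr->max_entries == 0 || attr->value_size == 0)
            return -EINVAL;

        return 0;
    }

    /* In the allocation path, widen before the +1 and the multiply so
     * a u32 max_entries cannot wrap: */
    u64 size = (u64) attr->max_entries + 1;
    u64 cost = sizeof(struct bpf_queue_stack) + size * attr->value_size;

    if (cost >= U32_MAX - PAGE_SIZE)
        return ERR_PTR(-E2BIG);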

    Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
    Reported-by: Wei Wu
    Signed-off-by: Alexei Starovoitov
    Cc: Mauricio Vasquez B
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

22 Nov, 2018

1 commit

  • If swiotlb_bounce_page() failed, calling arch_sync_dma_for_device() may
    lead to such delights as performing cache maintenance on whatever
    address phys_to_virt(SWIOTLB_MAP_ERROR) looks like, which is typically
    outside the kernel memory map and goes about as well as expected.

    Don't do that.

    Fixes: a4a4330db46a ("swiotlb: add support for non-coherent DMA")
    Tested-by: John Stultz
    Signed-off-by: Robin Murphy
    Signed-off-by: Christoph Hellwig

    Robin Murphy
     

19 Nov, 2018

3 commits

  • Merge misc fixes from Andrew Morton:
    "16 fixes"

    * emailed patches from Andrew Morton:
    mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
    mm, page_alloc: check for max order in hot path
    scripts/spdxcheck.py: make python3 compliant
    tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
    lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
    mm/vmstat.c: fix NUMA statistics updates
    mm/gup.c: fix follow_page_mask() kerneldoc comment
    ocfs2: free up write context when direct IO failed
    scripts/faddr2line: fix location of start_kernel in comment
    mm: don't reclaim inodes with many attached pages
    mm, memory_hotplug: check zone_movable in has_unmovable_pages
    mm/swapfile.c: use kvzalloc for swap_info_struct allocation
    MAINTAINERS: update OMAP MMC entry
    hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
    kernel/sched/psi.c: simplify cgroup_move_task()
    z3fold: fix possible reclaim races

    Linus Torvalds
     
  • Pull scheduler fix from Ingo Molnar:
    "Fix an exec() related scalability/performance regression, which was
    caused by incorrectly calculating load and migrating tasks on exec()
    when they shouldn't be"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/fair: Fix cpu_util_wake() for 'execl' type workloads

    Linus Torvalds
     
  • The existing code triggered an invalid warning about 'rq' possibly being
    used uninitialized. Instead of doing the silly warning suppression by
    initializing it to NULL, refactor the code to bail out early instead.

    Warning was:

    kernel/sched/psi.c: In function `cgroup_move_task':
    kernel/sched/psi.c:639:13: warning: `rq' may be used uninitialized in this function [-Wmaybe-uninitialized]

    Link: http://lkml.kernel.org/r/20181103183339.8669-1-olof@lixom.net
    Fixes: 2ce7135adc9ad ("psi: cgroup support")
    Signed-off-by: Olof Johansson
    Reviewed-by: Andrew Morton
    Acked-by: Johannes Weiner
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Olof Johansson
     

17 Nov, 2018

2 commits

  • Naresh reported an issue with the non-atomic memory allocation of
    cgroup local storage buffers:

    [ 73.047526] BUG: sleeping function called from invalid context at
    /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
    [ 73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto
    [ 73.068342] INFO: lockdep is turned off.
    [ 73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
    4.20.0-rc2-next-20181113 #1
    [ 73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
    2.0b 07/27/2017
    [ 73.088018] Call Trace:
    [ 73.090463] dump_stack+0x70/0xa5
    [ 73.093783] ___might_sleep+0x152/0x240
    [ 73.097619] __might_sleep+0x4a/0x80
    [ 73.101191] __kmalloc_node+0x1cf/0x2f0
    [ 73.105031] ? cgroup_storage_update_elem+0x46/0x90
    [ 73.109909] cgroup_storage_update_elem+0x46/0x90

    cgroup_storage_update_elem() (as well as other map update callbacks)
    is called with disabled preemption, so GFP_ATOMIC allocation should
    be used: e.g. alloc_htab_elem() in hashtab.c.
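
    The shape of the fix, roughly (the update callback runs with
    preemption disabled, so the allocation must not sleep):

    /* Before: GFP_KERNEL can sleep, triggering the splat above. */
    new = kmalloc_node(sizeof(struct bpf_storage_buffer) +
                       map->value_size, GFP_KERNEL, map->numa_node);

    /* After: atomic allocation is safe from this context. */
    new = kmalloc_node(sizeof(struct bpf_storage_buffer) +
                       map->value_size,
                       __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
                       map->numa_node);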

    Reported-by: Naresh Kamboju
    Tested-by: Naresh Kamboju
    Signed-off-by: Roman Gushchin
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     
  • When patching in a new sequence for the first insn of a subprog, the start
    of that subprog does not change (it's the first insn of the sequence), so
    adjust_subprog_starts should check 'start < off'.
    Also added a test to test_verifier.c (it's essentially the syz reproducer).

    Fixes: cc8b0b92a169 ("bpf: introduce function calls (function boundaries)")
    Reported-by: syzbot+4fc427c7af994b0948be@syzkaller.appspotmail.com
    Signed-off-by: Edward Cree
    Acked-by: Yonghong Song
    Signed-off-by: Alexei Starovoitov

    Edward Cree
     

14 Nov, 2018

6 commits

  • In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    Notice that in this particular case, I replaced the code comments with
    a proper "fall through" annotation, which is what GCC is expecting
    to find.
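
    Schematically, the warning wants an annotation comment (matching
    GCC's fallthrough regex) right before the next label; the case names
    here are made up:

    switch (cmd) {
    case OP_PREPARE:
        setup_state();
        /* fall through */      /* recognized by GCC; warning silenced */
    case OP_RUN:
        run_state();
        break;
    default:
        return -EINVAL;
    }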

    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Daniel Thompson
    Signed-off-by: Daniel Thompson

    Gustavo A. R. Silva
     
  • In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where we are expecting to fall through.

    Notice that in this particular case, I replaced the code comments with
    a proper "fall through" annotation, which is what GCC is expecting
    to find.

    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Daniel Thompson
    Signed-off-by: Daniel Thompson

    Gustavo A. R. Silva
     
  • Replace the whole switch statement with a for loop. This makes the
    code clearer and easy to read.

    This also addresses the following Coverity warnings:

    Addresses-Coverity-ID: 115090 ("Missing break in switch")
    Addresses-Coverity-ID: 115091 ("Missing break in switch")
    Addresses-Coverity-ID: 114700 ("Missing break in switch")

    Suggested-by: Daniel Thompson
    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Daniel Thompson
    [daniel.thompson@linaro.org: Tiny grammar change in description]
    Signed-off-by: Daniel Thompson

    Gustavo A. R. Silva
     
  • gcc 8.1.0 warns with:

    kernel/debug/kdb/kdb_support.c: In function ‘kallsyms_symbol_next’:
    kernel/debug/kdb/kdb_support.c:239:4: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
    strncpy(prefix_name, name, strlen(name)+1);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    kernel/debug/kdb/kdb_support.c:239:31: note: length computed here

    Use strscpy() with the destination buffer size, and use ellipses when
    displaying truncated symbols.

    v2: Use strscpy()
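
    The pattern of the fix (strscpy() is bounded by the destination,
    always NUL-terminates, and returns -E2BIG on truncation; the buffer
    size and output shape here are illustrative):

    char prefix_name[128];

    /* Before: the bound tracks the source, so a long symbol overflows
     * the destination. */
    strncpy(prefix_name, name, strlen(name) + 1);

    /* After: bounded by the destination; show truncation explicitly. */
    if (strscpy(prefix_name, name, sizeof(prefix_name)) < 0)
        kdb_printf("%s... ", prefix_name);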

    Signed-off-by: Prarit Bhargava
    Cc: Jonathan Toppins
    Cc: Jason Wessel
    Cc: Daniel Thompson
    Cc: kgdb-bugreport@lists.sourceforge.net
    Reviewed-by: Daniel Thompson
    Signed-off-by: Daniel Thompson

    Prarit Bhargava
     
  • Since commit ad67b74d2469 ("printk: hash addresses printed with %p"),
    all pointers printed with %p are printed with hashed addresses
    instead of real addresses in order to avoid leaking addresses in
    dmesg and syslog. But this applies to kdb too, which is unfortunate:

    Entering kdb (current=0x(ptrval), pid 329) due to Keyboard Entry
    kdb> ps
    15 sleeping system daemon (state M) processes suppressed,
    use 'ps A' to see all.
    Task Addr        Pid   Parent [*] cpu State Thread      Command
    0x(ptrval)       329      328  1    0   R   0x(ptrval)  *sh

    0x(ptrval)         1        0  0    0   S   0x(ptrval)  init
    0x(ptrval)         3        2  0    0   D   0x(ptrval)  rcu_gp
    0x(ptrval)         4        2  0    0   D   0x(ptrval)  rcu_par_gp
    0x(ptrval)         5        2  0    0   D   0x(ptrval)  kworker/0:0
    0x(ptrval)         6        2  0    0   D   0x(ptrval)  kworker/0:0H
    0x(ptrval)         7        2  0    0   D   0x(ptrval)  kworker/u2:0
    0x(ptrval)         8        2  0    0   D   0x(ptrval)  mm_percpu_wq
    0x(ptrval)        10        2  0    0   D   0x(ptrval)  rcu_preempt

    The whole purpose of kdb is to debug, and for debugging real addresses
    need to be known. In addition, data displayed by kdb doesn't go into
    dmesg.

    This patch replaces all %p by %px in kdb in order to display real
    addresses.
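
    The conversion itself is mechanical; %px bypasses the hashing that
    ad67b74d2469 applied to %p (the output shape below is illustrative):

    /* Before: prints a hashed value, or "(ptrval)" before the hashing
     * key is initialized -- useless for debugging. */
    kdb_printf("0x%p %d %d ...\n", (void *)p, p->pid, parent_pid);

    /* After: prints the raw kernel address. */
    kdb_printf("0x%px %d %d ...\n", (void *)p, p->pid, parent_pid);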

    Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
    Cc:
    Signed-off-by: Christophe Leroy
    Signed-off-by: Daniel Thompson

    Christophe Leroy
     
  • On a powerpc 8xx, 'btc' fails as follows:

    Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
    kdb> btc
    btc: cpu status: Currently on cpu 0
    Available cpus: 0
    kdb_getarea: Bad address 0x0

    When booting the kernel with 'debug_boot_weak_hash', it fails as well:

    Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
    kdb> btc
    btc: cpu status: Currently on cpu 0
    Available cpus: 0
    kdb_getarea: Bad address 0xba99ad80

    On other platforms, Oopses have been observed too, see
    https://github.com/linuxppc/linux/issues/139

    This is due to btc calling 'btt' with a %p-formatted pointer as an argument.

    This patch replaces %p by %px to get the real pointer value, as
    expected by 'btt'.

    Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
    Cc:
    Signed-off-by: Christophe Leroy
    Reviewed-by: Daniel Thompson
    Signed-off-by: Daniel Thompson

    Christophe Leroy
     

12 Nov, 2018

2 commits

  • A ~10% regression has been reported for UnixBench's execl throughput
    test by Aaron Lu and Ye Xiaolong:

    https://lkml.org/lkml/2018/10/30/765

    That test is pretty simple, it does a "recursive" execve() syscall on the
    same binary. Starting from the syscall, this sequence is possible:

    do_execve()
    do_execveat_common()
    __do_execve_file()
    sched_exec()
    select_task_rq_fair()
    Reported-by: Ye Xiaolong
    Tested-by: Aaron Lu
    Signed-off-by: Patrick Bellasi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Dietmar Eggemann
    Cc: Juri Lelli
    Cc: Linus Torvalds
    Cc: Morten Rasmussen
    Cc: Peter Zijlstra
    Cc: Quentin Perret
    Cc: Steve Muckle
    Cc: Suren Baghdasaryan
    Cc: Thomas Gleixner
    Cc: Todd Kjos
    Cc: Vincent Guittot
    Fixes: f9be3e5961c5 (sched/fair: Use util_est in LB and WU paths)
    Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/
    Signed-off-by: Ingo Molnar

    Patrick Bellasi
     
  • Pull timer fix from Thomas Gleixner:
    "Just the removal of a redundant call into the sched deadline overrun
    check"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    posix-cpu-timers: Remove useless call to check_dl_overrun()

    Linus Torvalds