23 Aug, 2016
4 commits
-
Currently, __note_gp_changes() checks to see if the CPU has slept through
multiple grace periods. If it has, it resynchronizes that CPU's view
of the grace-period state, which includes whether or not the current
grace period needs a quiescent state from this CPU. The fact of this
need (or lack thereof) needs to be in two places, rdp->cpu_no_qs.b.norm
and rdp->core_needs_qs. The former tells RCU's context-switch code to
go get a quiescent state and the latter says that it needs to be reported.
The current code unconditionally sets the former to true, but correctly
sets the latter.This does not result in failures, but it does unnecessarily increase
the amount of work done on average at context-switch time. This commit
therefore correctly sets both fields.Signed-off-by: Paul E. McKenney
-
The Kconfig currently controlling compilation of tree.c is:
init/Kconfig:config TREE_RCU
init/Kconfig: bool...and update.c and sync.c are "obj-y" meaning that none are ever
built as a module by anyone.Since MODULE_ALIAS is a no-op for non-modular code, we can remove
them from these files.We leave moduleparam.h behind since the files instantiate some boot
time configuration parameters with module_param() still.Cc: "Paul E. McKenney"
Cc: Josh Triplett
Cc: Steven Rostedt
Cc: Mathieu Desnoyers
Cc: Lai Jiangshan
Signed-off-by: Paul Gortmaker
Signed-off-by: Paul E. McKenney -
Both timers and hrtimers are maintained on the outgoing CPU until
CPU_DEAD time, at which point they are migrated to a surviving CPU. If a
mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
will splat in native_smp_send_reschedule() when attempting to wake up
the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:[ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
[ 7976.741595] Modules linked in:
[ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
[ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 7976.741595] 0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
[ 7976.741595] 0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
[ 7976.741595] 0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
[ 7976.741595] Call Trace:
[ 7976.741595] [] dump_stack+0x67/0x99
[ 7976.741595] [] __warn+0xcc/0xf0
[ 7976.741595] [] warn_slowpath_null+0x18/0x20
[ 7976.741595] [] native_smp_send_reschedule+0x39/0x40
[ 7976.741595] [] wake_up_nohz_cpu+0x82/0x190
[ 7976.741595] [] internal_add_timer+0x7a/0x80
[ 7976.741595] [] mod_timer+0x187/0x2b0
[ 7976.741595] [] rcu_torture_reader+0x33d/0x380
[ 7976.741595] [] ? sched_torture_read_unlock+0x30/0x30
[ 7976.741595] [] ? rcu_bh_torture_read_lock+0x80/0x80
[ 7976.741595] [] kthread+0xdf/0x100
[ 7976.741595] [] ret_from_fork+0x1f/0x40
[ 7976.741595] [] ? kthread_create_on_node+0x200/0x200However, in this case, the wakeup is redundant, because the timer
migration will reprogram timer hardware as needed. Note that the fact
that preemption is disabled does not avoid the splat, as the offline
operation has already passed both the synchronize_sched() and the
stop_machine() that would be blocked by disabled preemption.This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
to wake up offline CPUs. It also adds a comment stating that the
caller must tolerate lost wakeups when the target CPU is going offline,
and suggesting the CPU_DEAD notifier as a recovery mechanism.Signed-off-by: Paul E. McKenney
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
Cc: Thomas Gleixner -
Commit abedf8e2419f ("rcu: Use simple wait queues where possible in
rcutree") converts Tree RCU's wait queues to simple wait queues,
but it incorrectly reverts the commit 2aa792e6faf1 ("rcu: Use
rcu_gp_kthread_wake() to wake up grace period kthreads"). This can
result in redundant self-wakeups.This commit therefore replaces the simple wait-queue wakeups with
rcu_gp_kthread_wake(), thus avoiding the redundant wakeups.Signed-off-by: Jisheng Zhang
Signed-off-by: Paul E. McKenney
22 Aug, 2016
1 commit
-
Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.[ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[ 368.106005] RIP: 0010:[] [] fib_table_lookup+0x14/0x390
[ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286
[ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 368.106005] Stack:
[ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[ 368.106005] Call Trace:
[ 368.106005]
[ 368.106005]
[ 368.106005] [] ip_route_input_noref+0x516/0xbd0
[ 368.106005] [] ? skb_release_data+0xd6/0x110
[ 368.106005] [] ? kfree_skb+0x3a/0xa0
[ 368.106005] [] ip_rcv_finish+0x29f/0x350
[ 368.106005] [] ip_rcv+0x234/0x380
[ 368.106005] [] __netif_receive_skb_core+0x676/0x870
[ 368.106005] [] __netif_receive_skb+0x18/0x60
[ 368.106005] [] process_backlog+0xae/0x180
[ 368.106005] [] net_rx_action+0x152/0x240
[ 368.106005] [] __do_softirq+0xef/0x280
[ 368.106005] [] call_softirq+0x1c/0x30
[ 368.106005]
[ 368.106005]
[ 368.106005] [] do_softirq+0x65/0xa0
[ 368.106005] [] local_bh_enable+0x94/0xa0
[ 368.106005] [] rcu_nocb_kthread+0x232/0x370
[ 368.106005] [] ? wake_up_bit+0x30/0x30
[ 368.106005] [] ? rcu_start_gp+0x40/0x40
[ 368.106005] [] kthread+0xcf/0xe0
[ 368.106005] [] ? kthread_create_on_node+0x140/0x140
[ 368.106005] [] ret_from_fork+0x58/0x90
[ 368.106005] [] ? kthread_create_on_node+0x140/0x140==================================cut here==============================
It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not reliquishing the
CPU while doing so. This commit therefore adds an cond_resched_rcu_qs()
within the loop to allow other tasks to run.Signed-off-by: Ding Tianhong
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney
08 Aug, 2016
1 commit
-
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.No intended functional changes in this commit.
Signed-off-by: Jens Axboe
06 Aug, 2016
1 commit
-
Pull perf updates from Ingo Molnar:
"Mostly tooling fixes and some late tooling updates, plus two perf
related printk message fixes"* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tests bpf: Use SyS_epoll_wait alias
perf tests: objdump output can contain multi byte chunks
perf record: Add --sample-cpu option
perf hists: Introduce output_resort_cb method
perf tools: Move config/Makefile into Makefile.config
perf tests: Add test for bitmap_scnprintf function
tools lib: Add bitmap_and function
tools lib: Add bitmap_scnprintf function
tools lib: Add bitmap_alloc function
tools lib traceevent: Ignore generated library files
perf tools: Fix build failure on perl script context
perf/core: Change log level for duration warning to KERN_INFO
perf annotate: Plug filename string leak
perf annotate: Introduce strerror for handling symbol__disassemble() errors
perf annotate: Rename symbol__annotate() to symbol__disassemble()
perf/x86: Modify error message in virtualized environment
perf target: str_error_r() always returns the buffer it receives
perf annotate: Use pipe + fork instead of popen
perf evsel: Introduce constructor for cycles event
05 Aug, 2016
1 commit
-
Pull more powerpc updates from Michael Ellerman:
"These were delayed for various reasons, so I let them sit in next a
bit longer, rather than including them in my first pull request.Fixes:
- Fix early access to cpu_spec relocation from Benjamin Herrenschmidt
- Fix incorrect event codes in power9-event-list from Madhavan Srinivasan
- Move register_process_table() out of ppc_md from Michael EllermanUse jump_label use for [cpu|mmu]_has_feature():
- Add mmu_early_init_devtree() from Michael Ellerman
- Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman
- Do hash device tree scanning earlier from Michael Ellerman
- Do radix device tree scanning earlier from Michael Ellerman
- Do feature patching before MMU init from Michael Ellerman
- Check features don't change after patching from Michael Ellerman
- Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V
- Convert mmu_has_feature() to returning bool from Michael Ellerman
- Convert cpu_has_feature() to returning bool from Michael Ellerman
- Define radix_enabled() in one place & use static inline from Michael Ellerman
- Add early_[cpu|mmu]_has_feature() from Michael Ellerman
- Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V
- jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao
- Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V
- Remove mfvtb() from Kevin Hao
- Move cpu_has_feature() to a separate file from Kevin Hao
- Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman
- Add option to use jump label for cpu_has_feature() from Kevin Hao
- Add option to use jump label for mmu_has_feature() from Kevin Hao
- Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V
- Annotate jump label assembly from Michael EllermanTLB flush enhancements from Aneesh Kumar K.V:
- radix: Implement tlb mmu gather flush efficiently
- Add helper for finding SLBE LLP encoding
- Use hugetlb flush functions
- Drop multiple definition of mm_is_core_local
- radix: Add tlb flush of THP ptes
- radix: Rename function and drop unused arg
- radix/hugetlb: Add helper for finding page size
- hugetlb: Add flush_hugetlb_tlb_range
- remove flush_tlb_page_nohashAdd new ptrace regsets from Anshuman Khandual and Simon Guo:
- elf: Add powerpc specific core note sections
- Add the function flush_tmregs_to_thread
- Enable in transaction NT_PRFPREG ptrace requests
- Enable in transaction NT_PPC_VMX ptrace requests
- Enable in transaction NT_PPC_VSX ptrace requests
- Adapt gpr32_get, gpr32_set functions for transaction
- Enable support for NT_PPC_CGPR
- Enable support for NT_PPC_CFPR
- Enable support for NT_PPC_CVMX
- Enable support for NT_PPC_CVSX
- Enable support for TM SPR state
- Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
- Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
- Enable support for EBB registers
- Enable support for Performance Monitor registers"* tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits)
powerpc/mm: Move register_process_table() out of ppc_md
powerpc/perf: Fix incorrect event codes in power9-event-list
powerpc/32: Fix early access to cpu_spec relocation
powerpc/ptrace: Enable support for Performance Monitor registers
powerpc/ptrace: Enable support for EBB registers
powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
powerpc/ptrace: Enable support for TM SPR state
powerpc/ptrace: Enable support for NT_PPC_CVSX
powerpc/ptrace: Enable support for NT_PPC_CVMX
powerpc/ptrace: Enable support for NT_PPC_CFPR
powerpc/ptrace: Enable support for NT_PPC_CGPR
powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction
powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests
powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests
powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests
powerpc/process: Add the function flush_tmregs_to_thread
elf: Add powerpc specific core note sections
powerpc/mm: remove flush_tlb_page_nohash
powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
...
04 Aug, 2016
8 commits
-
Pull module updates from Rusty Russell:
"The only interesting thing here is Jessica's patch to add
ro_after_init support to modules. The rest are all trivia"* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
extable.h: add stddef.h so "NULL" definition is not implicit
modules: add ro_after_init support
jump_label: disable preemption around __module_text_address().
exceptions: fork exception table content from module.h into extable.h
modules: Add kernel parameter to blacklist modules
module: Do a WARN_ON_ONCE() for assert module mutex not held
Documentation/module-signing.txt: Note need for version info if reusing a key
module: Invalidate signatures on force-loaded modules
module: Issue warnings when tainting kernel
module: fix redundant test.
module: fix noreturn attribute for __module_put_and_exit() -
The current jump_label.h includes bug.h for things such as WARN_ON().
This makes the header problematic for inclusion by kernel.h or any
headers that kernel.h includes, since bug.h includes kernel.h (circular
dependency). The inclusion of atomic.h is similarly problematic. Thus,
this should make jump_label.h 'includable' from most places.Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron
Cc: "David S. Miller"
Cc: Arnd Bergmann
Cc: Benjamin Herrenschmidt
Cc: Chris Metcalf
Cc: Heiko Carstens
Cc: Joe Perches
Cc: Martin Schwidefsky
Cc: Michael Ellerman
Cc: Paul Mackerras
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The use of config_enabled() against config options is ambiguous. In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED(). Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
clearer.This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.I noticed two cases where config_enabled() is used against a tristate
option:- config_enabled(CONFIG_HWMON)
[ drivers/net/wireless/ath/ath10k/thermal.c ]- config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
[ drivers/gpu/drm/gma500/opregion.c ]I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Acked-by: Kees Cook
Cc: Stas Sergeev
Cc: Matt Redfearn
Cc: Joshua Kinard
Cc: Jiri Slaby
Cc: Bjorn Helgaas
Cc: Borislav Petkov
Cc: Markos Chandras
Cc: "Dmitry V. Levin"
Cc: yu-cheng yu
Cc: James Hogan
Cc: Brian Gerst
Cc: Johannes Berg
Cc: Peter Zijlstra
Cc: Al Viro
Cc: Will Drewry
Cc: Nikolay Martynov
Cc: Huacai Chen
Cc: "H. Peter Anvin"
Cc: Thomas Gleixner
Cc: Daniel Borkmann
Cc: Leonid Yegoshin
Cc: Rafal Milecki
Cc: James Cowgill
Cc: Greg Kroah-Hartman
Cc: Ralf Baechle
Cc: Alex Smith
Cc: Adam Buchbinder
Cc: Qais Yousef
Cc: Jiang Liu
Cc: Mikko Rapeli
Cc: Paul Gortmaker
Cc: Denys Vlasenko
Cc: Brian Norris
Cc: Hidehiro Kawai
Cc: "Luis R. Rodriguez"
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Dave Hansen
Cc: "Kirill A. Shutemov"
Cc: Roland McGrath
Cc: Paul Burton
Cc: Kalle Valo
Cc: Viresh Kumar
Cc: Tony Wu
Cc: Huaitong Han
Cc: Sumit Semwal
Cc: Alexei Starovoitov
Cc: Juergen Gross
Cc: Jason Cooper
Cc: "David S. Miller"
Cc: Oleg Nesterov
Cc: Andrea Gelmini
Cc: David Woodhouse
Cc: Marc Zyngier
Cc: Rabin Vincent
Cc: "Maciej W. Rozycki"
Cc: David Daney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add ro_after_init support for modules by adding a new page-aligned section
in the module layout (after rodata) for ro_after_init data and enabling RO
protection for that section after module init runs.Signed-off-by: Jessica Yu
Acked-by: Kees Cook
Signed-off-by: Rusty Russell -
Steven reported a warning caused by not holding module_mutex or
rcu_read_lock_sched: his backtrace was corrupted but a quick audit
found this possible cause. It's wrong anyway...Reported-by: Steven Rostedt
Signed-off-by: Rusty Russell -
Blacklisting a module in linux has long been a problem. The current
procedure is to use rd.blacklist=module_name, however, that doesn't
cover the case after the initramfs and before a boot prompt (where one
is supposed to use /etc/modprobe.d/blacklist.conf to blacklist
runtime loading). Using rd.shell to get an early prompt is hit-or-miss,
and doesn't cover all situations AFAICT.This patch adds this functionality of permanently blacklisting a module
by its name via the kernel parameter module_blacklist=module_name.[v2]: Rusty, use core_param() instead of __setup() which simplifies
things.[v3]: Rusty, undo wreckage from strsep()
[v4]: Rusty, simpler version of blacklisted()
Signed-off-by: Prarit Bhargava
Cc: Jonathan Corbet
Cc: Rusty Russell
Cc: linux-doc@vger.kernel.org
Signed-off-by: Rusty Russell -
When running with lockdep enabled, I triggered the WARN_ON() in the
module code that asserts when module_mutex or rcu_read_lock_sched are
not held. The issue I have is that this can also be called from the
dump_stack() code, causing us to enter an infinite loop...------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
ffff880215e8fa70 ffff880215e8fa70 ffffffff812fc8e3 0000000000000000
ffffffff81d3e55b ffff880215e8fac0 ffffffff8104fc88 ffffffff8104fcab
0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
Call Trace:
[] dump_stack+0x67/0x90
[] __warn+0xcb/0xe9
[] ? warn_slowpath_null+0x5/0x1f
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
ffff880215e8f7a0 ffff880215e8f7a0 ffffffff812fc8e3 0000000000000000
ffffffff81d3e55b ffff880215e8f7f0 ffffffff8104fc88 ffffffff8104fcab
0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
Call Trace:
[] dump_stack+0x67/0x90
[] __warn+0xcb/0xe9
[] ? warn_slowpath_null+0x5/0x1f
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
ffff880215e8f4d0 ffff880215e8f4d0 ffffffff812fc8e3 0000000000000000
ffffffff81d3e55b ffff880215e8f520 ffffffff8104fc88 ffffffff8104fcab
0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
Call Trace:
[] dump_stack+0x67/0x90
[] __warn+0xcb/0xe9
[] ? warn_slowpath_null+0x5/0x1f
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
[...]Which gives us rather useless information. Worse yet, there's some race
that causes this, and I seldom trigger it, so I have no idea what
happened.This would not be an issue if that warning was a WARN_ON_ONCE().
Signed-off-by: Steven Rostedt
Signed-off-by: Rusty Russell -
Pull tracing fixes from Steven Rostedt:
"A few updates and fixes:- move the suppressing of the __builtin_return_address >0 warning to
the tracing directory only.- metag recordmcount fix for newer glibc's
- two tracing histogram fixes that were reported by KASAN"
* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix use-after-free in hist_register_trigger()
tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
Makefile: Mute warning for __builtin_return_address(>0) for tracing only
ftrace/recordmcount: Work around for addition of metag magic but not relocations
03 Aug, 2016
20 commits
-
Copy the config fragments from the AOSP common kernel android-4.4
branch. It is becoming possible to run mainline kernels with Android,
but the kernel defconfigs don't work as-is and debugging missing config
options is a pain. Adding the config fragments into the kernel tree,
makes configuring a mainline kernel as simple as:make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config
The following non-upstream config options were removed:
CONFIG_NETFILTER_XT_MATCH_QTAGUID
CONFIG_NETFILTER_XT_MATCH_QUOTA2
CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
CONFIG_PPPOLAC
CONFIG_PPPOPNS
CONFIG_SECURITY_PERF_EVENTS_RESTRICT
CONFIG_USB_CONFIGFS_F_MTP
CONFIG_USB_CONFIGFS_F_PTP
CONFIG_USB_CONFIGFS_F_ACC
CONFIG_USB_CONFIGFS_F_AUDIO_SRC
CONFIG_USB_CONFIGFS_UEVENT
CONFIG_INPUT_KEYCHORD
CONFIG_INPUT_KEYRESETLink: http://lkml.kernel.org/r/1466708235-28593-1-git-send-email-robh@kernel.org
Signed-off-by: Rob Herring
Cc: Amit Pundir
Cc: John Stultz
Cc: Dmitry Shmidt
Cc: Rom Lemarchand
Cc: Greg Kroah-Hartman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early
logging") added support to use channels with no associated files.This is useful when the exact location of relay file is not known or the
the parent directory of relay file is not available, while creating the
channel and the logging has to start right from the boot.But there was no provision to use global mode with buffer-only channels,
which is added by this patch, without modifying the interface where
initially there will be a dummy invocation of create_buf_file callback
through which kernel client can convey the need of a global buffer.For the use case where drivers/kernel clients want a simple interface
for the userspace, which enables them to capture data/logs from relay
file inorder & without any post processing, support of Global buffer
mode is warranted.Modules, like i915, using relay_open() in early init would have to later
register their buffer-only relays, once debugfs is available, by calling
relay_late_setup_files(). Hence relay_late_setup_files() symbol also
needs to be exported.Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com
Signed-off-by: Akash Goel
Cc: Eduard - Gabriel Munteanu
Cc: Tom Zanussi
Cc: Chris Wilson
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I hit the following issue when run trinity in my system. The kernel is
3.4 version, but mainline has the same issue.The root cause is that the segment size is too large so the kerenl
spends too long trying to allocate a page. Other cases will block until
the test case quits. Also, OOM conditions will occur.Call Trace:
__alloc_pages_nodemask+0x14c/0x8f0
alloc_pages_current+0xaf/0x120
kimage_alloc_pages+0x10/0x60
kimage_alloc_control_pages+0x5d/0x270
machine_kexec_prepare+0xe5/0x6c0
? kimage_free_page_list+0x52/0x70
sys_kexec_load+0x141/0x600
? vfs_write+0x100/0x180
system_call_fastpath+0x16/0x1bThe patch changes sanity_check_segment_list() to verify that the usage by
all segments does not exceed half of memory.[akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment]
Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang
Suggested-by: Eric W. Biederman
Cc: Vivek Goyal
Cc: Dave Young
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded. It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz
Signed-off-by: Petr Tesarik
Cc: Juergen Gross
Cc: Josh Triplett
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Eric Biederman
Cc: "H. Peter Anvin"
Cc: Boris Ostrovsky
Cc: "Paul E. McKenney"
Cc: Dave Young
Cc: David Vrabel
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
Signed-off-by: Hidehiro Kawai
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Eric Biederman
Cc: Masami Hiramatsu
Cc: Borislav Petkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
kexec physical addresses are the boot-time view of the system. For
certain ARM systems (such as Keystone 2), the boot view of the system
does not match the kernel's view of the system: the boot view uses a
special alias in the lower 4GB of the physical address space.To cater for these kinds of setups, we need to translate between the
boot view physical addresses and the normal kernel view physical
addresses. This patch extracts the current transation points into
linux/kexec.h, and allows an architecture to override the functions.Due to the translations required, we unfortunately end up with six
translation functions, which are reduced down to four that the
architecture can override.[akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()]
Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King
Cc: Keerthy
Cc: Pratyush Anand
Cc: Vitaly Andrianov
Cc: Eric Biederman
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
physical on 32-bit architectures, so we need a wider type than "unsigned
long" here. Arrange for paddr_vmcoreinfo_note() to return a
phys_addr_t, thereby allowing it to be located above 4GB.This makes no difference for kexec-tools, as they already assume a
64-bit type when reading from this file.Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King
Reviewed-by: Pratyush Anand
Acked-by: Baoquan He
Cc: Keerthy
Cc: Vitaly Andrianov
Cc: Eric Biederman
Cc: Dave Young
Cc: Vivek Goyal
Cc: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Ensure that user memory sizes do not wrap around when validating the
user input, which can lead to the following input validation working
incorrectly.[akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch]
Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk
Signed-off-by: Russell King
Reviewed-by: Pratyush Anand
Acked-by: Baoquan He
Cc: Keerthy
Cc: Vitaly Andrianov
Cc: Eric Biederman
Cc: Dave Young
Cc: Vivek Goyal
Cc: Simon Horman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is a cleanup patch to make kexec more clear to return error number
directly. The variable result is useless, because there is no other
function's return value assignes to it. So remove it.Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com
Signed-off-by: Minfei Huang
Cc: Dave Young
Cc: Baoquan He
Cc: Borislav Petkov
Cc: Xunlei Pang
Cc: Atsushi Kumagai
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Many targets enable CONFIG_DEBUG_STACK_USAGE, and while the information
is useful, it isn't worthy of pr_warn(). Reduce it to pr_info().Link: http://lkml.kernel.org/r/1466982072-29836-1-git-send-email-anton@ozlabs.org
Signed-off-by: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg. It has three options:* ratelimit - ratelimit logging from userspace.
* on - unlimited logging from userspace
* off - logging from userspace gets ignoredThe default setting is to ratelimit the messages written to it.
This changes the kernel default setting of "on" to "ratelimit" and we do
that because we want to keep userspace spamming /dev/kmsg to sane
levels. This is especially moot when a small kernel log buffer wraps
around and messages get lost. So the ratelimiting setting should be a
sane setting where kernel messages should have a bit higher chance of
survival from all the spamming.It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.Furthermore, we can control the logging from a lower priority sysctl
interface - kernel.printk_devkmsg.That interface will succeed only if printk.devkmsg *hasn't* been
supplied on the command line. If it has, then printk.devkmsg is a
one-time setting which remains for the duration of the system lifetime.
This "locking" of the setting is to prevent userspace from changing the
logging on us through sysctl(2).This patch is based on previous patches from Linus and Steven.
[bp@suse.de: fixes]
Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
Signed-off-by: Borislav Petkov
Cc: Dave Young
Cc: Franck Bui
Cc: Greg Kroah-Hartman
Cc: Ingo Molnar
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Uwe Kleine-König
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
asm-generic headers are generic implementations for architecture
specific code and should not be included by common code. Thus use the
asm/ version of sections.h to get at the linker sections.Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Messages' levels and console log level are inspected when the actual
printing occurs, which may provoke console_unlock() and
console_cont_flush() to waste CPU cycles on every message that has
loglevel above the current console_loglevel.Schematically, console_unlock() does the following:
console_unlock()
{
...
for (;;) {
...
raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
msg = log_from_idx(console_idx);if (msg->flags & LOG_NOCONS) {
...
goto skip;
}level = msg->level;
len += msg_print_text(); >> sprintfs
memcpy,
etc.if (nr_ext_console_drivers) {
ext_len = msg_print_ext_header(); >> scnprintf
ext_len += msg_print_ext_body(); >> scnprintfs
etc.
}
...
raw_spin_unlock(&logbuf_lock);call_console_drivers(level, ext_text, ext_len, text, len)
{
if (level >= console_loglevel && >> drop the message
!ignore_loglevel)
return;console->write(...);
}local_irq_restore(flags);
}
...
}The thing here is this deferred `level >= console_loglevel' check. We
are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
that we will eventually drop.This can be huge when we register a new CON_PRINTBUFFER console, for
instance. For every such a console register_console() resets theconsole_seq, console_idx, console_prev
and sets a `exclusive console' pointer to replay the log buffer to that
just-registered console. And there can be a lot of messages to replay,
in the worst case most of which can be dropped after console_loglevel
test.We know messages' levels long before we call msg_print_text() and
friends, so we can just move console_loglevel check out of
call_console_drivers() and format a new message only if we are sure that
it won't be dropped.The patch factors out loglevel check into suppress_message_printing()
function and tests message->level and console_loglevel before formatting
functions in console_unlock() and console_cont_flush() are getting
executed. This improves things not only for exclusive CON_PRINTBUFFER
consoles, but for every console_unlock() that attempts to print a
message of level above the console_loglevel.Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky
Reviewed-by: Petr Mladek
Cc: Tejun Heo
Cc: Jan Kara
Cc: Calvin Owens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Using functions instead of macros can reduce overall code size by
eliminating unnecessary "KERN_SOH" prefixes from format strings.defconfig x86-64:
$ size vmlinux*
text data bss dec hex filename
10193570 4331464 1105920 15630954 ee826a vmlinux.new
10192623 4335560 1105920 15634103 ee8eb7 vmlinux.oldAs the return value are unimportant and unused in the kernel tree, these
new functions return void.Miscellanea:
- change pr_ macros to call new __pr_ functions
- change vprintk_nmi and vprintk_default to add LOGLEVEL_ argument[akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A trivial cosmetic change: interrupt.h header is redundant since commit
6b898c07cb1d ("console: use might_sleep in console_lock").Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
kernel.h header doesn't directly use dynamic debug, instead we can
include it in module.c (which used it via kernel.h). printk.h only uses
it if CONFIG_DYNAMIC_DEBUG is on, changing the inclusion to only happen
in that case.Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com
[luisbg@osg.samsung.com: include dynamic_debug.h in drb_int.h]
Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Luis de Bethencourt
Cc: Rusty Russell
Cc: Hidehiro Kawai
Cc: Borislav Petkov
Cc: Michal Nazarewicz
Cc: Rasmus Villemoes
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change task_work_cancel() to use lockless_dereference(), this is what
the code really wants but we didn't have this helper when it was
written.Also add the fast-path task->task_works == NULL check, in the likely
case this task has no pending works and we can avoid
spin_lock(task->pi_lock).While at it, change other users of ACCESS_ONCE() to use READ_ONCE().
Link: http://lkml.kernel.org/r/20160610150042.GA13868@redhat.com
Signed-off-by: Oleg Nesterov
Cc: Andrea Parri
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This fixes a use-after-free case flagged by KASAN; make sure the test
happens before the potential free in this case.Link: http://lkml.kernel.org/r/48fd74ab61bebd7dca9714386bb47d7c5ccd6a7b.1467247517.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi
Signed-off-by: Steven Rostedt -
While running tools/testing/selftests test suite with KASAN, Dmitry
Vyukov hit the following use-after-free report:==================================================================
BUG: KASAN: use-after-free in hist_unreg_all+0x1a1/0x1d0 at addr
ffff880031632cc0
Read of size 8 by task ftracetest/7413
==================================================================
BUG kmalloc-128 (Not tainted): kasan: bad access detected
------------------------------------------------------------------This fixes the problem, along with the same problem in
hist_enable_unreg_all().Link: http://lkml.kernel.org/r/c3d05b79e42555b6e36a3a99aae0e37315ee5304.1467247517.git.tom.zanussi@linux.intel.com
Cc: Dmitry Vyukov
[Copied Steve's hist_enable_unreg_all() fix to hist_unreg_all()]
Signed-off-by: Tom Zanussi
Signed-off-by: Steven Rostedt -
With the latest gcc compilers, they give a warning if
__builtin_return_address() parameter is greater than 0. That is because if
it is used by a function called by a top level function (or in the case of
the kernel, by assembly), it can try to access stack frames outside the
stack and crash the system.The tracing system uses __builtin_return_address() of up to 2! But it is
well aware of the dangers that it may have, and has even added precautions
to protect against it (see the thunk code in arch/x86/entry/thunk*.S)Linus originally added KBUILD_CFLAGS that would suppress the warning for the
entire kernel, as simply adding KBUILD_CFLAGS to the tracing directory
wouldn't work. The tracing directory plays a bit with the CFLAGS and
requires a little more logic.This adds that special logic to only suppress the warning for the tracing
directory. If it is used anywhere else outside of tracing, the warning will
still be triggered.Link: http://lkml.kernel.org/r/20160728223043.51996267@grimm.local.home
Tested-by: Linus Torvalds
Signed-off-by: Steven Rostedt
02 Aug, 2016
1 commit
-
When the perf interrupt handler exceeds a threshold warning messages
are displayed on console:[12739.31793] perf interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[71340.165065] perf interrupt took too long (5005 > 5000), lowering kernel.perf_event_max_sample_rate to 25000Many customers and users are confused by the message wondering if
something is wrong or they need to take action to fix a problem.
Since a user can not do anything to fix the issue, the message is really
more informational than a warning. Adjust the log level accordingly.Signed-off-by: David Ahern
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1470084569-438-1-git-send-email-dsa@cumulusnetworks.com
Signed-off-by: Ingo Molnar
01 Aug, 2016
1 commit
-
Some arches (powerpc at least) would like to invoke jump_label_init()
much earlier in boot. So check static_key_initialized in order to make
sure this function runs only once.LGTM-by: Ingo (http://marc.info/?l=linux-kernel&m=144049104329961&w=2)
Signed-off-by: Kevin Hao
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Michael Ellerman
31 Jul, 2016
1 commit
-
Pull misc fixes from Thomas Gleixner:
"This update contains:- a fix for stomp-machine so the nmi_watchdog wont trigger on the cpu
waiting for the others to execute the callback- various fixes and updates to objtool including an resync of the
instruction decoder to match the kernel's decoder"* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Un-capitalize "Warning" for out-of-sync instruction decoder
objtool: Resync x86 instruction decoder with the kernel's
objtool: Support new GCC 6 switch jump table pattern
stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE
objtool: Add 'fixdep' to objtool/.gitignore
30 Jul, 2016
1 commit
-
Pull audit updates from Paul Moore:
"Six audit patches for 4.8.There are a couple of style and minor whitespace tweaks for the logs,
as well as a minor fixup to catch errors on user filter rules, however
the major improvements are a fix to the s390 syscall argument masking
code (reviewed by the nice s390 folks), some consolidation around the
exclude filtering (less code, always a win), and a double-fetch fix
for recording the execve arguments"* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
audit: fix a double fetch in audit_log_single_execve_arg()
audit: fix whitespace in CWD record
audit: add fields to exclude filter by reusing user filter
s390: ensure that syscall arguments are properly masked on s390
audit: fix some horrible switch statement style crimes
audit: fixup: log on errors from filter user rules