23 Aug, 2016

4 commits

  • Currently, __note_gp_changes() checks to see if the CPU has slept through
    multiple grace periods. If it has, it resynchronizes that CPU's view
    of the grace-period state, which includes whether or not the current
    grace period needs a quiescent state from this CPU. The fact of this
    need (or lack thereof) needs to be in two places, rdp->cpu_no_qs.b.norm
    and rdp->core_needs_qs. The former tells RCU's context-switch code to
    go get a quiescent state and the latter says that it needs to be reported.
    The current code unconditionally sets the former to true, but correctly
    sets the latter.

    This does not result in failures, but it does unnecessarily increase
    the amount of work done on average at context-switch time. This commit
    therefore correctly sets both fields.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
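
    The resynchronization described in the commit above can be sketched in user space as follows. This is only an illustration: the struct and function names are hypothetical stand-ins, and the two booleans merely model rdp->cpu_no_qs.b.norm and rdp->core_needs_qs. The point is that both fields are derived from the same "does the current grace period need a quiescent state from this CPU" answer, rather than one of them being set unconditionally.

```c
#include <stdbool.h>

/* Hypothetical model of the two per-CPU fields named in the commit. */
struct cpu_view_sketch {
    bool cpu_no_qs_norm; /* models rdp->cpu_no_qs.b.norm */
    bool core_needs_qs;  /* models rdp->core_needs_qs */
};

/*
 * Resynchronize the CPU's view after it slept through grace periods.
 * Both fields get the same value, so the context-switch path is not
 * asked to find a quiescent state that nobody needs reported.
 */
static void resync_gp_view_sketch(struct cpu_view_sketch *v,
                                  bool gp_needs_qs_from_cpu)
{
    v->cpu_no_qs_norm = gp_needs_qs_from_cpu;
    v->core_needs_qs = gp_needs_qs_from_cpu;
}
```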
     
  • The Kconfig currently controlling compilation of tree.c is:

    init/Kconfig:config TREE_RCU
    init/Kconfig: bool

    ...and update.c and sync.c are "obj-y" meaning that none are ever
    built as a module by anyone.

    Since MODULE_ALIAS is a no-op for non-modular code, we can remove
    them from these files.

    We leave moduleparam.h behind since the files instantiate some boot
    time configuration parameters with module_param() still.

    Cc: "Paul E. McKenney"
    Cc: Josh Triplett
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Lai Jiangshan
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Paul E. McKenney

    Paul Gortmaker
     
  • Both timers and hrtimers are maintained on the outgoing CPU until
    CPU_DEAD time, at which point they are migrated to a surviving CPU. If a
    mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
    will splat in native_smp_send_reschedule() when attempting to wake up
    the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:

    [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
    [ 7976.741595] Modules linked in:
    [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
    [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [ 7976.741595] 0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
    [ 7976.741595] 0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
    [ 7976.741595] 0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
    [ 7976.741595] Call Trace:
    [ 7976.741595] [] dump_stack+0x67/0x99
    [ 7976.741595] [] __warn+0xcc/0xf0
    [ 7976.741595] [] warn_slowpath_null+0x18/0x20
    [ 7976.741595] [] native_smp_send_reschedule+0x39/0x40
    [ 7976.741595] [] wake_up_nohz_cpu+0x82/0x190
    [ 7976.741595] [] internal_add_timer+0x7a/0x80
    [ 7976.741595] [] mod_timer+0x187/0x2b0
    [ 7976.741595] [] rcu_torture_reader+0x33d/0x380
    [ 7976.741595] [] ? sched_torture_read_unlock+0x30/0x30
    [ 7976.741595] [] ? rcu_bh_torture_read_lock+0x80/0x80
    [ 7976.741595] [] kthread+0xdf/0x100
    [ 7976.741595] [] ret_from_fork+0x1f/0x40
    [ 7976.741595] [] ? kthread_create_on_node+0x200/0x200

    However, in this case, the wakeup is redundant, because the timer
    migration will reprogram timer hardware as needed. Note that the fact
    that preemption is disabled does not avoid the splat, as the offline
    operation has already passed both the synchronize_sched() and the
    stop_machine() that would be blocked by disabled preemption.

    This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
    to wake up offline CPUs. It also adds a comment stating that the
    caller must tolerate lost wakeups when the target CPU is going offline,
    and suggesting the CPU_DEAD notifier as a recovery mechanism.

    Signed-off-by: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Thomas Gleixner

    Paul E. McKenney
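
    The behavior the commit above gives wake_up_nohz_cpu() can be modeled in user space roughly as below. Everything here is an assumption made for illustration: the online mask, the counter, and the function name are hypothetical, not the kernel's code. The sketch only shows the rule that a wakeup aimed at an offline CPU is skipped, so the caller has to tolerate a lost wakeup.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count for this sketch */

/* Hypothetical stand-ins for the online mask and the IPI path. */
static bool cpu_online_sketch[NR_CPUS_SKETCH] = { true, true, true, true };
static int wakeups_sent_sketch[NR_CPUS_SKETCH];

/* Returns true if a wakeup was sent, false if it was skipped. */
static bool wake_up_cpu_sketch(int cpu)
{
    if (cpu < 0 || cpu >= NR_CPUS_SKETCH)
        return false;
    if (!cpu_online_sketch[cpu])
        return false; /* target is going offline: wakeup is dropped */
    wakeups_sent_sketch[cpu]++;
    return true;
}
```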
     
  • Commit abedf8e2419f ("rcu: Use simple wait queues where possible in
    rcutree") converts Tree RCU's wait queues to simple wait queues,
    but it incorrectly reverts the commit 2aa792e6faf1 ("rcu: Use
    rcu_gp_kthread_wake() to wake up grace period kthreads"). This can
    result in redundant self-wakeups.

    This commit therefore replaces the simple wait-queue wakeups with
    rcu_gp_kthread_wake(), thus avoiding the redundant wakeups.

    Signed-off-by: Jisheng Zhang
    Signed-off-by: Paul E. McKenney

    Jisheng Zhang
     

22 Aug, 2016

1 commit

  • Carrying out the following steps results in a softlockup in the
    RCU callback-offload (rcuo) kthreads:

    1. Connect to ixgbevf, and set the speed to 10Gb/s.
    2. Use ifconfig to bring the NIC up and down repeatedly.

    [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
    [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
    [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
    [ 368.106005] RIP: 0010:[] [] fib_table_lookup+0x14/0x390
    [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286
    [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
    [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
    [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
    [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
    [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
    [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
    [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
    [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 368.106005] Stack:
    [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
    [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
    [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
    [ 368.106005] Call Trace:
    [ 368.106005]
    [ 368.106005]
    [ 368.106005] [] ip_route_input_noref+0x516/0xbd0
    [ 368.106005] [] ? skb_release_data+0xd6/0x110
    [ 368.106005] [] ? kfree_skb+0x3a/0xa0
    [ 368.106005] [] ip_rcv_finish+0x29f/0x350
    [ 368.106005] [] ip_rcv+0x234/0x380
    [ 368.106005] [] __netif_receive_skb_core+0x676/0x870
    [ 368.106005] [] __netif_receive_skb+0x18/0x60
    [ 368.106005] [] process_backlog+0xae/0x180
    [ 368.106005] [] net_rx_action+0x152/0x240
    [ 368.106005] [] __do_softirq+0xef/0x280
    [ 368.106005] [] call_softirq+0x1c/0x30
    [ 368.106005]
    [ 368.106005]
    [ 368.106005] [] do_softirq+0x65/0xa0
    [ 368.106005] [] local_bh_enable+0x94/0xa0
    [ 368.106005] [] rcu_nocb_kthread+0x232/0x370
    [ 368.106005] [] ? wake_up_bit+0x30/0x30
    [ 368.106005] [] ? rcu_start_gp+0x40/0x40
    [ 368.106005] [] kthread+0xcf/0xe0
    [ 368.106005] [] ? kthread_create_on_node+0x140/0x140
    [ 368.106005] [] ret_from_fork+0x58/0x90
    [ 368.106005] [] ? kthread_create_on_node+0x140/0x140

    ==================================cut here==============================

    It turns out that the rcuos callback-offload kthread is busy processing
    a very large quantity of RCU callbacks, and it is not relinquishing the
    CPU while doing so. This commit therefore adds a cond_resched_rcu_qs()
    within the loop to allow other tasks to run.

    Signed-off-by: Ding Tianhong
    [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
    Signed-off-by: Paul E. McKenney

    Ding Tianhong
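
    A user-space analogy for the fix above, assuming a simple singly linked callback list: the loop gives up the CPU after each callback instead of running to completion. The types and names are hypothetical, and sched_yield() is only a stand-in for cond_resched_rcu_qs(), which additionally reports an RCU quiescent state in the kernel.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback node for this sketch. */
struct cb_sketch {
    struct cb_sketch *next;
    void (*func)(struct cb_sketch *);
};

static int invoked_sketch;

/* Example callback that just counts invocations. */
static void count_cb_sketch(struct cb_sketch *cb)
{
    (void)cb;
    invoked_sketch++;
}

/* Invoke every callback on the list, yielding between callbacks. */
static unsigned long invoke_callbacks_sketch(struct cb_sketch *list)
{
    unsigned long n = 0;

    while (list) {
        struct cb_sketch *next = list->next; /* read before invoking */

        list->func(list);
        n++;
        sched_yield(); /* stand-in for cond_resched_rcu_qs() */
        list = next;
    }
    return n;
}
```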
     

08 Aug, 2016

1 commit

    Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portions. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member, to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
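
    To see why manually setting the old field breaks, here is a small sketch of a word that keeps flags in its low bits and an op code in its high bits, as the commit above describes. The 3-bit op width, the helper names, and the flag values used with it are assumptions chosen for illustration, not the kernel's definitions.

```c
#include <limits.h>

/* Assumed layout: top 3 bits hold the op, the rest hold flags. */
#define OP_BITS_SKETCH   3
#define OP_SHIFT_SKETCH  (sizeof(unsigned int) * CHAR_BIT - OP_BITS_SKETCH)
#define FLAG_MASK_SKETCH ((1U << OP_SHIFT_SKETCH) - 1)

/* Pack an op code and flags into one word. */
static unsigned int pack_opf_sketch(unsigned int op, unsigned int flags)
{
    return (op << OP_SHIFT_SKETCH) | (flags & FLAG_MASK_SKETCH);
}

/* Extract the op code from the high bits. */
static unsigned int opf_op_sketch(unsigned int opf)
{
    return opf >> OP_SHIFT_SKETCH;
}

/* Extract the flags from the low bits. */
static unsigned int opf_flags_sketch(unsigned int opf)
{
    return opf & FLAG_MASK_SKETCH;
}
```

    Old-style code that assigns a bare flag value to the whole word silently ends up with op code 0, which is the runtime breakage the rename turns into a compile error.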
     

06 Aug, 2016

1 commit

  • Pull perf updates from Ingo Molnar:
    "Mostly tooling fixes and some late tooling updates, plus two perf
    related printk message fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf tests bpf: Use SyS_epoll_wait alias
    perf tests: objdump output can contain multi byte chunks
    perf record: Add --sample-cpu option
    perf hists: Introduce output_resort_cb method
    perf tools: Move config/Makefile into Makefile.config
    perf tests: Add test for bitmap_scnprintf function
    tools lib: Add bitmap_and function
    tools lib: Add bitmap_scnprintf function
    tools lib: Add bitmap_alloc function
    tools lib traceevent: Ignore generated library files
    perf tools: Fix build failure on perl script context
    perf/core: Change log level for duration warning to KERN_INFO
    perf annotate: Plug filename string leak
    perf annotate: Introduce strerror for handling symbol__disassemble() errors
    perf annotate: Rename symbol__annotate() to symbol__disassemble()
    perf/x86: Modify error message in virtualized environment
    perf target: str_error_r() always returns the buffer it receives
    perf annotate: Use pipe + fork instead of popen
    perf evsel: Introduce constructor for cycles event

    Linus Torvalds
     

05 Aug, 2016

1 commit

  • Pull more powerpc updates from Michael Ellerman:
    "These were delayed for various reasons, so I let them sit in next a
    bit longer, rather than including them in my first pull request.

    Fixes:
    - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt
    - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan
    - Move register_process_table() out of ppc_md from Michael Ellerman

    Use jump_label for [cpu|mmu]_has_feature():
    - Add mmu_early_init_devtree() from Michael Ellerman
    - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman
    - Do hash device tree scanning earlier from Michael Ellerman
    - Do radix device tree scanning earlier from Michael Ellerman
    - Do feature patching before MMU init from Michael Ellerman
    - Check features don't change after patching from Michael Ellerman
    - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V
    - Convert mmu_has_feature() to returning bool from Michael Ellerman
    - Convert cpu_has_feature() to returning bool from Michael Ellerman
    - Define radix_enabled() in one place & use static inline from Michael Ellerman
    - Add early_[cpu|mmu]_has_feature() from Michael Ellerman
    - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V
    - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao
    - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V
    - Remove mfvtb() from Kevin Hao
    - Move cpu_has_feature() to a separate file from Kevin Hao
    - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman
    - Add option to use jump label for cpu_has_feature() from Kevin Hao
    - Add option to use jump label for mmu_has_feature() from Kevin Hao
    - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V
    - Annotate jump label assembly from Michael Ellerman

    TLB flush enhancements from Aneesh Kumar K.V:
    - radix: Implement tlb mmu gather flush efficiently
    - Add helper for finding SLBE LLP encoding
    - Use hugetlb flush functions
    - Drop multiple definition of mm_is_core_local
    - radix: Add tlb flush of THP ptes
    - radix: Rename function and drop unused arg
    - radix/hugetlb: Add helper for finding page size
    - hugetlb: Add flush_hugetlb_tlb_range
    - remove flush_tlb_page_nohash

    Add new ptrace regsets from Anshuman Khandual and Simon Guo:
    - elf: Add powerpc specific core note sections
    - Add the function flush_tmregs_to_thread
    - Enable in transaction NT_PRFPREG ptrace requests
    - Enable in transaction NT_PPC_VMX ptrace requests
    - Enable in transaction NT_PPC_VSX ptrace requests
    - Adapt gpr32_get, gpr32_set functions for transaction
    - Enable support for NT_PPC_CGPR
    - Enable support for NT_PPC_CFPR
    - Enable support for NT_PPC_CVMX
    - Enable support for NT_PPC_CVSX
    - Enable support for TM SPR state
    - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
    - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
    - Enable support for EBB registers
    - Enable support for Performance Monitor registers"

    * tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits)
    powerpc/mm: Move register_process_table() out of ppc_md
    powerpc/perf: Fix incorrect event codes in power9-event-list
    powerpc/32: Fix early access to cpu_spec relocation
    powerpc/ptrace: Enable support for Performance Monitor registers
    powerpc/ptrace: Enable support for EBB registers
    powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
    powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
    powerpc/ptrace: Enable support for TM SPR state
    powerpc/ptrace: Enable support for NT_PPC_CVSX
    powerpc/ptrace: Enable support for NT_PPC_CVMX
    powerpc/ptrace: Enable support for NT_PPC_CFPR
    powerpc/ptrace: Enable support for NT_PPC_CGPR
    powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction
    powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests
    powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests
    powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests
    powerpc/process: Add the function flush_tmregs_to_thread
    elf: Add powerpc specific core note sections
    powerpc/mm: remove flush_tlb_page_nohash
    powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
    ...

    Linus Torvalds
     

04 Aug, 2016

8 commits

  • Pull module updates from Rusty Russell:
    "The only interesting thing here is Jessica's patch to add
    ro_after_init support to modules. The rest are all trivia"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    extable.h: add stddef.h so "NULL" definition is not implicit
    modules: add ro_after_init support
    jump_label: disable preemption around __module_text_address().
    exceptions: fork exception table content from module.h into extable.h
    modules: Add kernel parameter to blacklist modules
    module: Do a WARN_ON_ONCE() for assert module mutex not held
    Documentation/module-signing.txt: Note need for version info if reusing a key
    module: Invalidate signatures on force-loaded modules
    module: Issue warnings when tainting kernel
    module: fix redundant test.
    module: fix noreturn attribute for __module_put_and_exit()

    Linus Torvalds
     
  • The current jump_label.h includes bug.h for things such as WARN_ON().
    This makes the header problematic for inclusion by kernel.h or any
    headers that kernel.h includes, since bug.h includes kernel.h (circular
    dependency). The inclusion of atomic.h is similarly problematic. Thus,
    this should make jump_label.h 'includable' from most places.

    Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.com
    Signed-off-by: Jason Baron
    Cc: "David S. Miller"
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Heiko Carstens
    Cc: Joe Perches
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     
  • The use of config_enabled() against config options is ambiguous. In
    practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
    author might have used it for the meaning of IS_ENABLED(). Using
    IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the intention
    clearer.

    This commit replaces config_enabled() with IS_ENABLED() where possible.
    This commit is only touching bool config options.

    I noticed two cases where config_enabled() is used against a tristate
    option:

    - config_enabled(CONFIG_HWMON)
    [ drivers/net/wireless/ath/ath10k/thermal.c ]

    - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
    [ drivers/gpu/drm/gma500/opregion.c ]

    I did not touch them because they should be converted to IS_BUILTIN()
    in order to keep the logic, but I was not sure it was the authors'
    intention.

    Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Acked-by: Kees Cook
    Cc: Stas Sergeev
    Cc: Matt Redfearn
    Cc: Joshua Kinard
    Cc: Jiri Slaby
    Cc: Bjorn Helgaas
    Cc: Borislav Petkov
    Cc: Markos Chandras
    Cc: "Dmitry V. Levin"
    Cc: yu-cheng yu
    Cc: James Hogan
    Cc: Brian Gerst
    Cc: Johannes Berg
    Cc: Peter Zijlstra
    Cc: Al Viro
    Cc: Will Drewry
    Cc: Nikolay Martynov
    Cc: Huacai Chen
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Daniel Borkmann
    Cc: Leonid Yegoshin
    Cc: Rafal Milecki
    Cc: James Cowgill
    Cc: Greg Kroah-Hartman
    Cc: Ralf Baechle
    Cc: Alex Smith
    Cc: Adam Buchbinder
    Cc: Qais Yousef
    Cc: Jiang Liu
    Cc: Mikko Rapeli
    Cc: Paul Gortmaker
    Cc: Denys Vlasenko
    Cc: Brian Norris
    Cc: Hidehiro Kawai
    Cc: "Luis R. Rodriguez"
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Dave Hansen
    Cc: "Kirill A. Shutemov"
    Cc: Roland McGrath
    Cc: Paul Burton
    Cc: Kalle Valo
    Cc: Viresh Kumar
    Cc: Tony Wu
    Cc: Huaitong Han
    Cc: Sumit Semwal
    Cc: Alexei Starovoitov
    Cc: Juergen Gross
    Cc: Jason Cooper
    Cc: "David S. Miller"
    Cc: Oleg Nesterov
    Cc: Andrea Gelmini
    Cc: David Woodhouse
    Cc: Marc Zyngier
    Cc: Rabin Vincent
    Cc: "Maciej W. Rozycki"
    Cc: David Daney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
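
    The distinction drawn in the commit above can be reproduced outside the kernel with a small preprocessor sketch. This is a simplified re-implementation modeled on the placeholder-argument trick, not a copy of include/linux/kconfig.h; the helper macro names and the CONFIG_SKETCH_* options are made up for the example. A built-in option defines CONFIG_FOO to 1, a modular one defines CONFIG_FOO_MODULE to 1.

```c
/* Two made-up options: one built in (=y), one modular (=m). */
#define CONFIG_SKETCH_BUILTIN 1
#define CONFIG_SKETCH_MODULAR_MODULE 1

/* If the option expands to 1, the placeholder inserts an extra "0,". */
#define ARG_PLACEHOLDER_SKETCH_1 0,
#define TAKE_SECOND_SKETCH(ignored, val, ...) val
#define IS_DEFINED3_SKETCH(arg1_or_junk) \
    TAKE_SECOND_SKETCH(arg1_or_junk 1, 0, 0)
#define IS_DEFINED2_SKETCH(val) \
    IS_DEFINED3_SKETCH(ARG_PLACEHOLDER_SKETCH_##val)
#define IS_DEFINED_SKETCH(x) IS_DEFINED2_SKETCH(x)

/* 1 only for =y, 1 only for =m, and 1 for either. */
#define IS_BUILTIN(option) IS_DEFINED_SKETCH(option)
#define IS_MODULE(option)  IS_DEFINED_SKETCH(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Per the commit text, config_enabled() behaved like IS_BUILTIN(). */
#define config_enabled_sketch(option) IS_BUILTIN(option)
```

    For a bool option the built-in and enabled tests agree, which is why the conversion is safe there; for a tristate option built as a module they differ.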
     
  • Add ro_after_init support for modules by adding a new page-aligned section
    in the module layout (after rodata) for ro_after_init data and enabling RO
    protection for that section after module init runs.

    Signed-off-by: Jessica Yu
    Acked-by: Kees Cook
    Signed-off-by: Rusty Russell

    Jessica Yu
     
  • Steven reported a warning caused by not holding module_mutex or
    rcu_read_lock_sched: his backtrace was corrupted but a quick audit
    found this possible cause. It's wrong anyway...

    Reported-by: Steven Rostedt
    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Blacklisting a module in Linux has long been a problem. The current
    procedure is to use rd.blacklist=module_name, however, that doesn't
    cover the case after the initramfs and before a boot prompt (where one
    is supposed to use /etc/modprobe.d/blacklist.conf to blacklist
    runtime loading). Using rd.shell to get an early prompt is hit-or-miss,
    and doesn't cover all situations AFAICT.

    This patch adds this functionality of permanently blacklisting a module
    by its name via the kernel parameter module_blacklist=module_name.

    [v2]: Rusty, use core_param() instead of __setup() which simplifies
    things.

    [v3]: Rusty, undo wreckage from strsep()

    [v4]: Rusty, simpler version of blacklisted()

    Signed-off-by: Prarit Bhargava
    Cc: Jonathan Corbet
    Cc: Rusty Russell
    Cc: linux-doc@vger.kernel.org
    Signed-off-by: Rusty Russell

    Prarit Bhargava
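
    A name check of the kind the commit above needs might look like the following user-space sketch. It assumes the parameter value may hold several names separated by commas, which the commit text itself does not state; the function name and signature are hypothetical. Names are compared by exact length and content, so "foo" does not match "foobar".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if module_name appears in the comma-separated blacklist.
 * A NULL blacklist means the parameter was not given.
 */
static bool blacklisted_sketch(const char *blacklist, const char *module_name)
{
    size_t name_len = strlen(module_name);
    const char *p;
    size_t len;

    if (!blacklist)
        return false;

    for (p = blacklist; *p; p += len) {
        len = strcspn(p, ","); /* length of the current entry */
        if (len == name_len && memcmp(module_name, p, len) == 0)
            return true;
        if (p[len] == ',')
            len++; /* step over the separator */
    }
    return false;
}
```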
     
  • When running with lockdep enabled, I triggered the WARN_ON() in the
    module code that asserts when module_mutex or rcu_read_lock_sched are
    not held. The issue I have is that this can also be called from the
    dump_stack() code, causing us to enter an infinite loop...

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
    Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    ffff880215e8fa70 ffff880215e8fa70 ffffffff812fc8e3 0000000000000000
    ffffffff81d3e55b ffff880215e8fac0 ffffffff8104fc88 ffffffff8104fcab
    0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
    Call Trace:
    [] dump_stack+0x67/0x90
    [] __warn+0xcb/0xe9
    [] ? warn_slowpath_null+0x5/0x1f
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
    Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    ffff880215e8f7a0 ffff880215e8f7a0 ffffffff812fc8e3 0000000000000000
    ffffffff81d3e55b ffff880215e8f7f0 ffffffff8104fc88 ffffffff8104fcab
    0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
    Call Trace:
    [] dump_stack+0x67/0x90
    [] __warn+0xcb/0xe9
    [] ? warn_slowpath_null+0x5/0x1f
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
    Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    ffff880215e8f4d0 ffff880215e8f4d0 ffffffff812fc8e3 0000000000000000
    ffffffff81d3e55b ffff880215e8f520 ffffffff8104fc88 ffffffff8104fcab
    0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
    Call Trace:
    [] dump_stack+0x67/0x90
    [] __warn+0xcb/0xe9
    [] ? warn_slowpath_null+0x5/0x1f
    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
    [...]

    Which gives us rather useless information. Worse yet, there's some race
    that causes this, and I seldom trigger it, so I have no idea what
    happened.

    This would not be an issue if that warning was a WARN_ON_ONCE().

    Signed-off-by: Steven Rostedt
    Signed-off-by: Rusty Russell

    Steven Rostedt
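
    The recursion described above, and why a once-only warning stops it, can be shown with a small user-space sketch. The macro and function names are hypothetical, and the report function simply re-enters the checked path to imitate dump_stack() reaching the same assertion. The per-site flag is set before the report runs, so the nested check finds it already set.

```c
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed_sketch;

static void check_locks_sketch(bool locks_held);

/* Print the warning, then re-enter the checked path as dump_stack() did. */
static void report_sketch(const char *what)
{
    warnings_printed_sketch++;
    fprintf(stderr, "WARNING: %s\n", what);
    check_locks_sketch(false);
}

/* Warn at most once per call site; mark the site before reporting. */
#define WARN_ON_ONCE_SKETCH(cond)                 \
    do {                                          \
        static bool warned_;                      \
        if ((cond) && !warned_) {                 \
            warned_ = true;                       \
            report_sketch(#cond);                 \
        }                                         \
    } while (0)

/* Stand-in for the assertion that the needed locks are held. */
static void check_locks_sketch(bool locks_held)
{
    WARN_ON_ONCE_SKETCH(!locks_held);
}
```

    With a plain, repeatable warning in place of the once-only one, the call from report_sketch() would warn again and never return.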
     
  • Pull tracing fixes from Steven Rostedt:
    "A few updates and fixes:

    - move the suppressing of the __builtin_return_address >0 warning to
    the tracing directory only.

    - metag recordmcount fix for newer glibc's

    - two tracing histogram fixes that were reported by KASAN"

    * tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix use-after-free in hist_register_trigger()
    tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
    Makefile: Mute warning for __builtin_return_address(>0) for tracing only
    ftrace/recordmcount: Work around for addition of metag magic but not relocations

    Linus Torvalds
     

03 Aug, 2016

20 commits

  • Copy the config fragments from the AOSP common kernel android-4.4
    branch. It is becoming possible to run mainline kernels with Android,
    but the kernel defconfigs don't work as-is and debugging missing config
    options is a pain. Adding the config fragments into the kernel tree
    makes configuring a mainline kernel as simple as:

    make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config

    The following non-upstream config options were removed:

    CONFIG_NETFILTER_XT_MATCH_QTAGUID
    CONFIG_NETFILTER_XT_MATCH_QUOTA2
    CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
    CONFIG_PPPOLAC
    CONFIG_PPPOPNS
    CONFIG_SECURITY_PERF_EVENTS_RESTRICT
    CONFIG_USB_CONFIGFS_F_MTP
    CONFIG_USB_CONFIGFS_F_PTP
    CONFIG_USB_CONFIGFS_F_ACC
    CONFIG_USB_CONFIGFS_F_AUDIO_SRC
    CONFIG_USB_CONFIGFS_UEVENT
    CONFIG_INPUT_KEYCHORD
    CONFIG_INPUT_KEYRESET

    Link: http://lkml.kernel.org/r/1466708235-28593-1-git-send-email-robh@kernel.org
    Signed-off-by: Rob Herring
    Cc: Amit Pundir
    Cc: John Stultz
    Cc: Dmitry Shmidt
    Cc: Rom Lemarchand
    Cc: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Herring
     
  • Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early
    logging") added support to use channels with no associated files.

    This is useful when the exact location of the relay file is not known or
    the parent directory of the relay file is not available while creating
    the channel, and the logging has to start right from boot.

    But there was no provision to use global mode with buffer-only channels,
    which is added by this patch, without modifying the interface where
    initially there will be a dummy invocation of create_buf_file callback
    through which kernel client can convey the need of a global buffer.

    For the use case where drivers/kernel clients want a simple interface
    for the userspace, which enables them to capture data/logs from the relay
    file in order and without any post-processing, support of global buffer
    mode is warranted.

    Modules, like i915, using relay_open() in early init would have to later
    register their buffer-only relays, once debugfs is available, by calling
    relay_late_setup_files(). Hence relay_late_setup_files() symbol also
    needs to be exported.

    Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com
    Signed-off-by: Akash Goel
    Cc: Eduard - Gabriel Munteanu
    Cc: Tom Zanussi
    Cc: Chris Wilson
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akash Goel
     
  • I hit the following issue when running trinity on my system. The kernel
    is version 3.4, but mainline has the same issue.

    The root cause is that the segment size is too large, so the kernel
    spends too long trying to allocate a page. Other cases will block until
    the test case quits. Also, OOM conditions will occur.

    Call Trace:
    __alloc_pages_nodemask+0x14c/0x8f0
    alloc_pages_current+0xaf/0x120
    kimage_alloc_pages+0x10/0x60
    kimage_alloc_control_pages+0x5d/0x270
    machine_kexec_prepare+0xe5/0x6c0
    ? kimage_free_page_list+0x52/0x70
    sys_kexec_load+0x141/0x600
    ? vfs_write+0x100/0x180
    system_call_fastpath+0x16/0x1b

    The patch changes sanity_check_segment_list() to verify that the usage by
    all segments does not exceed half of memory.

    [akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment]
    Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Suggested-by: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
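
    The check described in the commit above, that all segments together must not use more than half of memory, can be sketched as follows. The struct, the 4096-byte page size, the function name, and the -EINVAL return value are assumptions for illustration; this is not the body of sanity_check_segment_list(). The running total is compared against the remaining budget so that the sum itself cannot overflow.

```c
#include <errno.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096UL /* assumed page size */

/* Hypothetical segment descriptor: only the size matters here. */
struct segment_sketch {
    unsigned long memsz; /* bytes requested for this segment */
};

/* Return 0 if the segments fit in half of RAM, -EINVAL otherwise. */
static int check_segment_total_sketch(const struct segment_sketch *segs,
                                      size_t nr,
                                      unsigned long total_ram_pages)
{
    unsigned long limit = total_ram_pages / 2;
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < nr; i++) {
        /* Round up to whole pages without risking overflow. */
        unsigned long pages = segs[i].memsz / PAGE_SIZE_SKETCH +
                              (segs[i].memsz % PAGE_SIZE_SKETCH != 0);

        if (pages > limit - total)
            return -EINVAL; /* would exceed half of memory */
        total += pages;
    }
    return 0;
}
```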
     
  • Provide a wrapper function to be used by kernel code to check whether a
    crash kernel is loaded. It returns the same value that can be seen in
    /sys/kernel/kexec_crash_loaded by userspace programs.

    I'm exporting the function, because it will be used by Xen, and it is
    possible to compile Xen modules separately to enable the use of PV
    drivers with unmodified bare-metal kernels.

    Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz
    Signed-off-by: Petr Tesarik
    Cc: Juergen Gross
    Cc: Josh Triplett
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Eric Biederman
    Cc: "H. Peter Anvin"
    Cc: Boris Ostrovsky
    Cc: "Paul E. McKenney"
    Cc: Dave Young
    Cc: David Vrabel
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Tesarik
     
  • crash_kexec_post_notifiers is a boot option which controls whether the
    1st kernel calls panic notifiers or not before booting the 2nd kernel.
    However, there is no need to limit it to being modifiable only at boot
    time. So, use core_param instead of early_param.

    Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
    Signed-off-by: Hidehiro Kawai
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Eric Biederman
    Cc: Masami Hiramatsu
    Cc: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidehiro Kawai
     
  • kexec physical addresses are the boot-time view of the system. For
    certain ARM systems (such as Keystone 2), the boot view of the system
    does not match the kernel's view of the system: the boot view uses a
    special alias in the lower 4GB of the physical address space.

    To cater for these kinds of setups, we need to translate between the
    boot view physical addresses and the normal kernel view physical
    addresses. This patch extracts the current translation points into
    linux/kexec.h, and allows an architecture to override the functions.

    Due to the translations required, we unfortunately end up with six
    translation functions, which are reduced down to four that the
    architecture can override.

    [akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()]
    Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Cc: Keerthy
    Cc: Pratyush Anand
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
    physical on 32-bit architectures, so we need a wider type than "unsigned
    long" here. Arrange for paddr_vmcoreinfo_note() to return a
    phys_addr_t, thereby allowing it to be located above 4GB.

    This makes no difference for kexec-tools, as they already assume a
    64-bit type when reading from this file.

    Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Reviewed-by: Pratyush Anand
    Acked-by: Baoquan He
    Cc: Keerthy
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • Ensure that user memory sizes do not wrap around when validating the
    user input, which can lead to the following input validation working
    incorrectly.

    [akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch]
    Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Reviewed-by: Pratyush Anand
    Acked-by: Baoquan He
    Cc: Keerthy
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
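
    A minimal sketch of the wrap-around test the commit above refers to, assuming a segment is described by a start address and a size in unsigned long. The function name is hypothetical. Unsigned addition wraps modulo 2^N in C, so an end address smaller than the start address means the size overflowed the address range.

```c
#include <stdbool.h>

/* Return true if [mstart, mstart + memsz) does not wrap around. */
static bool segment_range_ok_sketch(unsigned long mstart,
                                    unsigned long memsz)
{
    unsigned long mend = mstart + memsz; /* may wrap; well defined */

    return mend >= mstart;
}
```

    Later range checks that compare the end address against a limit are only meaningful once this test has passed.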
     
  • This is a cleanup patch that makes kexec return the error number
    directly. The variable result is useless, because no other function's
    return value is assigned to it. So remove it.

    Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com
    Signed-off-by: Minfei Huang
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Xunlei Pang
    Cc: Atsushi Kumagai
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     
  • Many targets enable CONFIG_DEBUG_STACK_USAGE, and while the information
    is useful, it isn't worthy of pr_warn(). Reduce it to pr_info().

    Link: http://lkml.kernel.org/r/1466982072-29836-1-git-send-email-anton@ozlabs.org
    Signed-off-by: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • Add a "printk.devkmsg" kernel command line parameter which controls how
    userspace writes into /dev/kmsg. It has three options:

    * ratelimit - ratelimit logging from userspace.
    * on - unlimited logging from userspace
    * off - logging from userspace gets ignored

    The default setting is to ratelimit the messages written to it.

    This changes the kernel default setting of "on" to "ratelimit" and we do
    that because we want to keep userspace spamming /dev/kmsg to sane
    levels. This is especially moot when a small kernel log buffer wraps
    around and messages get lost. So the ratelimiting setting should be a
    sane setting where kernel messages should have a bit higher chance of
    survival from all the spamming.

    It additionally does not limit logging to /dev/kmsg while the system is
    booting if we haven't disabled it on the command line.

    Furthermore, we can control the logging from a lower priority sysctl
    interface - kernel.printk_devkmsg.

    That interface will succeed only if printk.devkmsg *hasn't* been
    supplied on the command line. If it has, then printk.devkmsg is a
    one-time setting which remains for the duration of the system lifetime.
    This "locking" of the setting is to prevent userspace from changing the
    logging on us through sysctl(2).
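
The precedence between the two interfaces can be sketched in user-space C as below. This is only a model of the behaviour described above, not the kernel implementation; all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum devkmsg_mode { DEVKMSG_RATELIMIT, DEVKMSG_ON, DEVKMSG_OFF };

struct devkmsg_state {
	enum devkmsg_mode mode;	/* default would be DEVKMSG_RATELIMIT */
	bool locked;		/* set once the command line chose a mode */
};

/* Map one of the three option strings to a mode; -1 if unknown. */
static int parse_mode(const char *s, enum devkmsg_mode *out)
{
	if (strcmp(s, "ratelimit") == 0)
		*out = DEVKMSG_RATELIMIT;
	else if (strcmp(s, "on") == 0)
		*out = DEVKMSG_ON;
	else if (strcmp(s, "off") == 0)
		*out = DEVKMSG_OFF;
	else
		return -1;
	return 0;
}

/* Command-line path: applies the mode and locks it for good. */
static int devkmsg_set_from_cmdline(struct devkmsg_state *st, const char *s)
{
	if (parse_mode(s, &st->mode) != 0)
		return -1;
	st->locked = true;
	return 0;
}

/* sysctl path: lower priority, refused once the setting is locked. */
static int devkmsg_set_from_sysctl(struct devkmsg_state *st, const char *s)
{
	enum devkmsg_mode mode;

	if (st->locked || parse_mode(s, &mode) != 0)
		return -1;
	st->mode = mode;
	return 0;
}
```

In this model a sysctl write succeeds until the command-line setter has run, after which every further sysctl write fails and the mode stays fixed.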

    This patch is based on previous patches from Linus and Steven.

    [bp@suse.de: fixes]
    Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
    Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
    Signed-off-by: Borislav Petkov
    Cc: Dave Young
    Cc: Franck Bui
    Cc: Greg Kroah-Hartman
    Cc: Ingo Molnar
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Uwe Kleine-König
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • asm-generic headers are generic implementations for architecture
    specific code and should not be included by common code. Thus use the
    asm/ version of sections.h to get at the linker sections.

    Link: http://lkml.kernel.org/r/1468285008-7331-1-git-send-email-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Messages' levels and console log level are inspected when the actual
    printing occurs, which may provoke console_unlock() and
    console_cont_flush() to waste CPU cycles on every message that has
    loglevel above the current console_loglevel.

    Schematically, console_unlock() does the following:

    console_unlock()
    {
        ...
        for (;;) {
            ...
            raw_spin_lock_irqsave(&logbuf_lock, flags);
    skip:
            msg = log_from_idx(console_idx);

            if (msg->flags & LOG_NOCONS) {
                ...
                goto skip;
            }

            level = msg->level;
            len += msg_print_text();                >> sprintfs
                                                       memcpy,
                                                       etc.

            if (nr_ext_console_drivers) {
                ext_len = msg_print_ext_header();   >> scnprintf
                ext_len += msg_print_ext_body();    >> scnprintfs
                                                       etc.
            }
            ...
            raw_spin_unlock(&logbuf_lock);

            call_console_drivers(level, ext_text, ext_len, text, len)
            {
                if (level >= console_loglevel &&    >> drop the message
                        !ignore_loglevel)
                    return;

                console->write(...);
            }

            local_irq_restore(flags);
        }
        ...
    }

    The thing here is this deferred `level >= console_loglevel' check. We
    are wasting CPU cycles on sprintfs/memcpy/etc. preparing the messages
    that we will eventually drop.

    This can be huge when we register a new CON_PRINTBUFFER console, for
    instance. For every such console, register_console() resets

    console_seq, console_idx, console_prev

    and sets an `exclusive console' pointer to replay the log buffer to that
    just-registered console. And there can be a lot of messages to replay,
    in the worst case most of which can be dropped after console_loglevel
    test.

    We know messages' levels long before we call msg_print_text() and
    friends, so we can just move console_loglevel check out of
    call_console_drivers() and format a new message only if we are sure that
    it won't be dropped.

    The patch factors out loglevel check into suppress_message_printing()
    function and tests message->level and console_loglevel before formatting
    functions in console_unlock() and console_cont_flush() are getting
    executed. This improves things not only for exclusive CON_PRINTBUFFER
    consoles, but for every console_unlock() that attempts to print a
    message of level above the console_loglevel.
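
A minimal user-space sketch of the reordering follows. The predicate mirrors the `level >= console_loglevel && !ignore_loglevel' test quoted above; everything else (the message struct, the flush loop, the chosen loglevel value) is a simplified assumption and not the kernel code.

```c
#include <stdbool.h>
#include <stdio.h>

static int console_loglevel = 4;	/* assumed value for this sketch */
static bool ignore_loglevel = false;

/* The factored-out check: true if the message would be dropped. */
static bool suppress_message_printing(int level)
{
	return level >= console_loglevel && !ignore_loglevel;
}

/* Hypothetical, simplified log record. */
struct msg {
	int level;
	const char *text;
};

/* Returns how many messages were actually formatted. */
static int flush_messages(const struct msg *msgs, int n)
{
	char buf[128];
	int formatted = 0;

	for (int i = 0; i < n; i++) {
		/* Test the level first, so no formatting work is spent
		 * on a message that would be dropped anyway. */
		if (suppress_message_printing(msgs[i].level))
			continue;

		snprintf(buf, sizeof(buf), "<%d>%s", msgs[i].level,
			 msgs[i].text);
		/* a console->write(buf) call would go here */
		formatted++;
	}
	return formatted;
}
```

The saving grows with the number of suppressed messages, which is why replaying a full log buffer to a newly registered console benefits most.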

    Link: http://lkml.kernel.org/r/20160627135012.8229-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Reviewed-by: Petr Mladek
    Cc: Tejun Heo
    Cc: Jan Kara
    Cc: Calvin Owens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Using functions instead of macros can reduce overall code size by
    eliminating unnecessary "KERN_SOH" prefixes from format strings.

    defconfig x86-64:

    $ size vmlinux*
        text    data     bss      dec    hex filename
    10193570 4331464 1105920 15630954 ee826a vmlinux.new
    10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old

    As the return values are unimportant and unused in the kernel tree, these
    new functions return void.

    Miscellanea:

    - change pr_ macros to call new __pr_ functions
    - change vprintk_nmi and vprintk_default to add LOGLEVEL_ argument
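
The difference between the two styles can be sketched in user-space C. This is an illustration under assumptions, not the patch itself: it writes into a caller-supplied buffer instead of the kernel log, and the names pr_info_macro and pr_info_func are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define KERN_SOH	"\001"
#define KERN_INFO	KERN_SOH "6"

/* Macro style: the prefix is pasted into the format string literal at
 * every call site, so every string in the image carries a copy of it. */
#define pr_info_macro(buf, size, ...) \
	snprintf(buf, size, KERN_INFO __VA_ARGS__)

/* Function style: call sites pass the bare format string and the prefix
 * is stored only once, here. Returns void, as the commit describes. */
static void pr_info_func(char *buf, size_t size, const char *fmt, ...)
{
	static const char prefix[] = KERN_INFO;
	size_t plen = strlen(prefix);
	va_list args;

	if (size <= plen) {
		if (size)
			buf[0] = '\0';
		return;
	}
	memcpy(buf, prefix, plen);
	va_start(args, fmt);
	vsnprintf(buf + plen, size - plen, fmt, args);
	va_end(args);
}
```

Both variants produce the same bytes in the buffer; only where the prefix is stored differs.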

    [akpm@linux-foundation.org: fix LOGLEVEL_INFO, per Joe]
    Link: http://lkml.kernel.org/r/e16cc34479dfefcae37c98b481e6646f0f69efc3.1466718827.git.joe@perches.com
    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • A trivial cosmetic change: interrupt.h header is redundant since commit
    6b898c07cb1d ("console: use might_sleep in console_lock").

    Link: http://lkml.kernel.org/r/20160620132847.21930-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • The kernel.h header doesn't directly use dynamic debug, so include
    dynamic_debug.h in module.c instead (which used it via kernel.h).
    printk.h only uses it if CONFIG_DYNAMIC_DEBUG is on, so change the
    inclusion to only happen in that case.

    Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com
    [luisbg@osg.samsung.com: include dynamic_debug.h in drb_int.h]
    Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com
    Signed-off-by: Luis de Bethencourt
    Cc: Rusty Russell
    Cc: Hidehiro Kawai
    Cc: Borislav Petkov
    Cc: Michal Nazarewicz
    Cc: Rasmus Villemoes
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luis de Bethencourt
     
  • Change task_work_cancel() to use lockless_dereference(). This is what
    the code really wants, but we didn't have this helper when it was
    written.

    Also add the fast-path task->task_works == NULL check: in the likely
    case that this task has no pending works, we can avoid
    spin_lock(task->pi_lock).
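
The fast path can be sketched in single-threaded user-space C as below. This is a simplified model: a counter stands in for taking pi_lock, plain pointer accesses stand in for the kernel's lockless helpers, and all names are hypothetical.

```c
#include <stddef.h>

struct work;
typedef void (*work_func_t)(struct work *);

struct work {
	struct work *next;
	work_func_t func;
};

struct task {
	struct work *task_works;	/* singly linked list of pending works */
	int lock_count;			/* counts simulated lock acquisitions */
};

/* Example callback used to identify a work item. */
static void noop_work(struct work *w)
{
	(void)w;
}

/* Remove and return the first queued work whose callback is 'func'. */
static struct work *cancel_work(struct task *t, work_func_t func)
{
	struct work **pprev = &t->task_works;
	struct work *w;

	/* Fast path: nothing queued, so do not touch the lock at all. */
	if (!t->task_works)
		return NULL;

	t->lock_count++;		/* stands in for taking the lock */
	while ((w = *pprev) != NULL) {
		if (w->func == func) {
			*pprev = w->next;	/* unlink the match */
			break;
		}
		pprev = &w->next;
	}
	/* the lock would be released here */
	return w;
}
```

In the common case of an empty list the function returns before the simulated lock is ever taken.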

    While at it, change other users of ACCESS_ONCE() to use READ_ONCE().

    Link: http://lkml.kernel.org/r/20160610150042.GA13868@redhat.com
    Signed-off-by: Oleg Nesterov
    Cc: Andrea Parri
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • This fixes a use-after-free case flagged by KASAN; make sure the test
    happens before the potential free in this case.

    Link: http://lkml.kernel.org/r/48fd74ab61bebd7dca9714386bb47d7c5ccd6a7b.1467247517.git.tom.zanussi@linux.intel.com

    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Tom Zanussi
     
  • While running the tools/testing/selftests test suite with KASAN, Dmitry
    Vyukov hit the following use-after-free report:

    ==================================================================
    BUG: KASAN: use-after-free in hist_unreg_all+0x1a1/0x1d0 at addr
    ffff880031632cc0
    Read of size 8 by task ftracetest/7413
    ==================================================================
    BUG kmalloc-128 (Not tainted): kasan: bad access detected
    ------------------------------------------------------------------

    This fixes the problem, along with the same problem in
    hist_enable_unreg_all().
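
The commit message does not show the fix itself. A common remedy for this class of bug, freeing list entries while walking the list, is to read the next pointer before the current entry is freed. The sketch below shows that general pattern in user-space C; it is an assumption about the shape of the problem, not the actual patch, and all names are hypothetical.

```c
#include <stdlib.h>

struct node {
	struct node *next;
	int id;
};

/* Prepend a new node; returns the new head (or the old one on failure). */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->id = id;
	n->next = head;
	return n;
}

/* Free every node; returns how many were freed. */
static int free_all(struct node *head)
{
	struct node *n = head, *next;
	int freed = 0;

	while (n) {
		next = n->next;	/* read the link BEFORE freeing the node */
		free(n);
		freed++;
		n = next;	/* never dereference 'n' after free() */
	}
	return freed;
}
```

A loop that advanced with `n = n->next' after `free(n)' would read freed memory, which is the access pattern KASAN reports.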

    Link: http://lkml.kernel.org/r/c3d05b79e42555b6e36a3a99aae0e37315ee5304.1467247517.git.tom.zanussi@linux.intel.com

    Cc: Dmitry Vyukov
    [Copied Steve's hist_enable_unreg_all() fix to hist_unreg_all()]
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The latest gcc compilers give a warning if the
    __builtin_return_address() parameter is greater than 0. That is because if
    it is used by a function called by a top level function (or in the case of
    the kernel, by assembly), it can try to access stack frames outside the
    stack and crash the system.

    The tracing system uses __builtin_return_address() of up to 2! But it is
    well aware of the dangers that it may have, and has even added precautions
    to protect against it (see the thunk code in arch/x86/entry/thunk*.S)

    Linus originally added KBUILD_CFLAGS that would suppress the warning for the
    entire kernel, as simply adding KBUILD_CFLAGS to the tracing directory
    wouldn't work. The tracing directory plays a bit with the CFLAGS and
    requires a little more logic.

    This adds that special logic to only suppress the warning for the tracing
    directory. If it is used anywhere else outside of tracing, the warning will
    still be triggered.

    Link: http://lkml.kernel.org/r/20160728223043.51996267@grimm.local.home

    Tested-by: Linus Torvalds
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

02 Aug, 2016

1 commit

  • When the perf interrupt handler exceeds a threshold, warning messages
    are displayed on the console:

    [12739.31793] perf interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
    [71340.165065] perf interrupt took too long (5005 > 5000), lowering kernel.perf_event_max_sample_rate to 25000

    Many customers and users are confused by the message, wondering if
    something is wrong or whether they need to take action to fix a problem.
    Since a user cannot do anything to fix the issue, the message is really
    more informational than a warning. Adjust the log level accordingly.

    Signed-off-by: David Ahern
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1470084569-438-1-git-send-email-dsa@cumulusnetworks.com
    Signed-off-by: Ingo Molnar

    David Ahern
     

01 Aug, 2016

1 commit


31 Jul, 2016

1 commit

  • Pull misc fixes from Thomas Gleixner:
    "This update contains:

    - a fix for stomp-machine so the nmi_watchdog won't trigger on the cpu
      waiting for the others to execute the callback

    - various fixes and updates to objtool including a resync of the
      instruction decoder to match the kernel's decoder"

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    objtool: Un-capitalize "Warning" for out-of-sync instruction decoder
    objtool: Resync x86 instruction decoder with the kernel's
    objtool: Support new GCC 6 switch jump table pattern
    stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE
    objtool: Add 'fixdep' to objtool/.gitignore

    Linus Torvalds
     

30 Jul, 2016

1 commit

  • Pull audit updates from Paul Moore:
    "Six audit patches for 4.8.

    There are a couple of style and minor whitespace tweaks for the logs,
    as well as a minor fixup to catch errors on user filter rules, however
    the major improvements are a fix to the s390 syscall argument masking
    code (reviewed by the nice s390 folks), some consolidation around the
    exclude filtering (less code, always a win), and a double-fetch fix
    for recording the execve arguments"

    * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
    audit: fix a double fetch in audit_log_single_execve_arg()
    audit: fix whitespace in CWD record
    audit: add fields to exclude filter by reusing user filter
    s390: ensure that syscall arguments are properly masked on s390
    audit: fix some horrible switch statement style crimes
    audit: fixup: log on errors from filter user rules

    Linus Torvalds