30 Nov, 2016

1 commit

  • Commit 1c3c90930392 broke PAE40. Macro pfn_pte(pfn, prot) creates paddr
    from pfn, but the shifted result was getting truncated to 32 bits since
    we lost the proper cast to 64 bits (for PAE40).

    Instead of reverting that commit, use a better helper which is 32/64-bit
    safe, just like the ARM implementation.

    Fixes: 1c3c90930392 ("ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS")
    Cc: #4.4+
    Signed-off-by: Yuriy Kolerov
    [vgupta: massaged changelog]
    Signed-off-by: Vineet Gupta

    Yuriy Kolerov
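
The truncation described in this commit can be reproduced outside the kernel. What follows is a minimal sketch, not the ARC code: `PAGE_SHIFT_SKETCH` and both function names are hypothetical, and a 32-bit pfn type with 8 KiB pages is assumed purely for illustration.

```c
#include <stdint.h>

/* Hypothetical page shift for illustration (8 KiB pages). */
#define PAGE_SHIFT_SKETCH 13

/* Buggy form: the shift is evaluated in 32 bits, so physical address
 * bits above bit 31 are lost before the result is widened. */
static uint64_t pfn_to_paddr_truncating(uint32_t pfn)
{
	return (uint32_t)(pfn << PAGE_SHIFT_SKETCH);
}

/* Safe form: widen to 64 bits first, then shift. */
static uint64_t pfn_to_paddr_wide(uint32_t pfn)
{
	return (uint64_t)pfn << PAGE_SHIFT_SKETCH;
}
```

For a pfn whose physical address lies at or above 4 GiB, the first helper drops the high bits while the second keeps them; for low addresses the two agree.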
     

29 Nov, 2016

2 commits


28 Nov, 2016

1 commit


27 Nov, 2016

3 commits

  • Pull ARM fix from Russell King:
    "This resolves the ksyms issues by reverting the commit which
    introduced the breakage"

    There was what I consider to be a better fix, but it's late in the rc
    game, so I'll take the revert.

    * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
    Revert "arm: move exports to definitions"

    Linus Torvalds
     
  • Pull KVM fixes from Radim Krčmář:
    "Four fixes for bugs found by syzkaller on x86, all for stable"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: x86: check for pic and ioapic presence before use
    KVM: x86: fix out-of-bounds accesses of rtc_eoi map
    KVM: x86: drop error recovery in em_jmp_far and em_ret_far
    KVM: x86: fix out-of-bounds access in lapic

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    "Fixes marked for stable:
    - Set missing wakeup bit in LPCR on POWER9
    - Fix the early OPAL console wrappers
    - Fixup kernel read only mapping

    Fixes for code merged this cycle:
    - Fix missing CRCs, add more asm-prototypes.h declarations"

    * tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/mm: Fixup kernel read only mapping
    powerpc/boot: Fix the early OPAL console wrappers
    powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
    powerpc: Set missing wakeup bit in LPCR on POWER9

    Linus Torvalds
     

26 Nov, 2016

1 commit

  • Pull parisc fixes from Helge Deller:
    "On parisc we were still seeing occasional random segmentation faults
    and memory corruption on SMP machines. Dave Anglin then looked again
    at the TLB related code and found two issues in the PCI DMA and
    generic TLB flush functions.

    Then, in our startup code we had some timing of the cache and TLB
    functions to calculate a threshold when to use a complete TLB/cache
    flush or just to flush a specific range. This code produced a race
    with newly started CPUs and thus led to occasional kernel crashes
    (due to stale TLB/cache entries). The patch by Dave fixes this issue
    by flushing the local caches before starting secondary CPUs and by
    removing the race.

    The last problem fixed by this series is that we quite often suffered
    from hung tasks and self-detected stalls on the CPUs. It was somehow
    clear that this was related to the (in v4.7) newly introduced cr16
    clocksource and the own implementation of sched_clock(). I replaced
    the open-coded sched_clock() function and switched to the generic
    sched_clock() implementation which seems to have fixed this issue as
    well.

    All patches have been successfully tested on a variety of machines,
    including our debian buildd servers.

    All patches (besides the small pr_cont fix) are tagged for stable
    releases"

    * 'parisc-4.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Also flush data TLB in flush_icache_page_asm
    parisc: Fix race in pci-dma.c
    parisc: Switch to generic sched_clock implementation
    parisc: Fix races in parisc_setup_cache_timing()
    parisc: Fix printk continuations in system detection

    Linus Torvalds
     

25 Nov, 2016

10 commits

  • This is the second issue I noticed in reviewing the parisc TLB code.

    The fic instruction may use either the instruction or data TLB in
    flushing the instruction cache. Thus, on machines with a split TLB, we
    should also flush the data TLB after setting up the temporary alias
    registers.

    Although this has no functional impact, I changed the pdtlb and pitlb
    instructions to consistently use the index register %r0. These
    instructions do not support integer displacements.

    Tested on rp3440 and c8000.

    Signed-off-by: John David Anglin
    Cc: # v3.16+
    Signed-off-by: Helge Deller

    John David Anglin
     
  • We are still troubled by occasional random segmentation faults and
    memory corruption on SMP machines. This causes quite a few
    package builds to fail on the Debian buildd machines for parisc. When
    gcc-6 failed to build three times in a row, I looked again at the TLB
    related code. I found a couple of issues. This is the first.

    In general, we need to ensure page table updates and corresponding TLB
    purges are atomic. The attached patch fixes an instance in pci-dma.c
    where the page table update was not guarded by the TLB lock.

    Tested on rp3440 and c8000. So far, no further random segmentation
    faults have been observed.

    Signed-off-by: John David Anglin
    Cc: # v3.16+
    Signed-off-by: Helge Deller

    John David Anglin
     
  • Drop the open-coded sched_clock() function and replace it with the provided
    GENERIC_SCHED_CLOCK implementation. We have seen quite a few hung tasks in
    the past, which seem to be fixed by this patch.

    Signed-off-by: Helge Deller
    Cc: # v4.7+
    Signed-off-by: Helge Deller

    Helge Deller
     
  • Helge reported to me the following startup crash:

    [ 0.000000] Linux version 4.8.0-1-parisc64-smp (debian-kernel@lists.debian.org) (gcc version 5.4.1 20161019 (GCC) ) #1 SMP Debian 4.8.7-1 (2016-11-13)
    [ 0.000000] The 64-bit Kernel has started...
    [ 0.000000] Kernel default page size is 4 KB. Huge pages enabled with 1 MB physical and 2 MB virtual size.
    [ 0.000000] Determining PDC firmware type: System Map.
    [ 0.000000] model 9000/785/J5000
    [ 0.000000] Total Memory: 2048 MB
    [ 0.000000] Memory: 2018528K/2097152K available (9272K kernel code, 3053K rwdata, 1319K rodata, 1024K init, 840K bss, 78624K reserved, 0K cma-reserved)
    [ 0.000000] virtual kernel memory layout:
    [ 0.000000] vmalloc : 0x0000000000008000 - 0x000000003f000000 (1007 MB)
    [ 0.000000] memory : 0x0000000040000000 - 0x00000000c0000000 (2048 MB)
    [ 0.000000] .init : 0x0000000040100000 - 0x0000000040200000 (1024 kB)
    [ 0.000000] .data : 0x0000000040b0e000 - 0x0000000040f533e0 (4372 kB)
    [ 0.000000] .text : 0x0000000040200000 - 0x0000000040b0e000 (9272 kB)
    [ 0.768910] Brought up 1 CPUs
    [ 0.992465] NET: Registered protocol family 16
    [ 2.429981] Releasing cpu 1 now, hpa=fffffffffffa2000
    [ 2.635751] CPU(s): 2 out of 2 PA8500 (PCX-W) at 440.000000 MHz online
    [ 2.726692] Setting cache flush threshold to 1024 kB
    [ 2.729932] Not-handled unaligned insn 0x43ffff80
    [ 2.798114] Setting TLB flush threshold to 140 kB
    [ 2.928039] Unaligned handler failed, ret = -1
    [ 3.000419] _______________________________
    [ 3.000419] < Your System ate a SPARC! Gah! >
    [ 3.000419] -------------------------------
    [ 3.000419] \ ^__^
    [ 3.000419] (__)\ )\/\
    [ 3.000419] U ||----w |
    [ 3.000419] || ||
    [ 9.340055] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-1-parisc64-smp #1 Debian 4.8.7-1
    [ 9.448082] task: 00000000bfd48060 task.stack: 00000000bfd50000
    [ 9.528040]
    [ 10.760029] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004025d154 000000004025d158
    [ 10.868052] IIR: 43ffff80 ISR: 0000000000340000 IOR: 000001ff54150960
    [ 10.960029] CPU: 1 CR30: 00000000bfd50000 CR31: 0000000011111111
    [ 11.052057] ORIG_R28: 000000004021e3b4
    [ 11.100045] IAOQ[0]: irq_exit+0x94/0x120
    [ 11.152062] IAOQ[1]: irq_exit+0x98/0x120
    [ 11.208031] RP(r2): irq_exit+0xb8/0x120
    [ 11.256074] Backtrace:
    [ 11.288067] [] cpu_startup_entry+0x1e4/0x598
    [ 11.368058] [] smp_callin+0x2c0/0x2f0
    [ 11.436308] [] update_curr+0x18c/0x2d0
    [ 11.508055] [] dequeue_entity+0x2c0/0x1030
    [ 11.584040] [] set_next_entity+0x80/0xd30
    [ 11.660069] [] pick_next_task_fair+0x614/0x720
    [ 11.740085] [] __schedule+0x394/0xa60
    [ 11.808054] [] schedule+0x88/0x118
    [ 11.876039] [] rescuer_thread+0x4d4/0x5b0
    [ 11.948090] [] kthread+0x1ec/0x248
    [ 12.016053] [] end_fault_vector+0x20/0xc0
    [ 12.092239] [] _switch_to_ret+0x0/0xf40
    [ 12.164044]
    [ 12.184036] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-1-parisc64-smp #1 Debian 4.8.7-1
    [ 12.244040] Backtrace:
    [ 12.244040] [] show_stack+0x68/0x80
    [ 12.244040] [] dump_stack+0xec/0x168
    [ 12.244040] [] die_if_kernel+0x25c/0x430
    [ 12.244040] [] handle_unaligned+0xb48/0xb50
    [ 12.244040]
    [ 12.632066] ---[ end trace 9ca05a7215c7bbb2 ]---
    [ 12.692036] Kernel panic - not syncing: Attempted to kill the idle task!

    We have the insn 0x43ffff80 in IIR but from IAOQ we should have:
    4025d150: 0f f3 20 df ldd,s r19(r31),r31
    4025d154: 0f 9f 00 9c ldw r31(ret0),ret0
    4025d158: bf 80 20 58 cmpb,*<> r0,ret0,4025d18c

    Cpu0 has just completed running parisc_setup_cache_timing:

    [ 2.429981] Releasing cpu 1 now, hpa=fffffffffffa2000
    [ 2.635751] CPU(s): 2 out of 2 PA8500 (PCX-W) at 440.000000 MHz online
    [ 2.726692] Setting cache flush threshold to 1024 kB
    [ 2.729932] Not-handled unaligned insn 0x43ffff80
    [ 2.798114] Setting TLB flush threshold to 140 kB
    [ 2.928039] Unaligned handler failed, ret = -1

    From the backtrace, cpu1 is in smp_callin:

    void __init smp_callin(void)
    {
            int slave_id = cpu_now_booting;

            smp_cpu_init(slave_id);
            preempt_disable();

            flush_cache_all_local(); /* start with known state */
            flush_tlb_all_local(NULL);

            local_irq_enable(); /* Interrupts have been off until now */

            cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);

    So, it has just flushed its caches and the TLB. It would seem either the
    flushes in parisc_setup_cache_timing or smp_callin have corrupted kernel
    memory.

    The attached patch reworks parisc_setup_cache_timing to remove the races
    in setting the cache and TLB flush thresholds. It also corrects the
    number of bytes flushed in the TLB calculation.

    The patch flushes the cache and TLB on cpu0 before starting the
    secondary processors so that they are started from a known state.

    Tested with a few reboots on c8000.

    Signed-off-by: John David Anglin
    Cc: # v3.18+
    Signed-off-by: Helge Deller

    John David Anglin
     
  • Since commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing
    continuation lines") the output from __do_page_fault on MIPS has been
    pretty unreadable due to the lack of KERN_CONT markers. Use pr_cont
    to provide the appropriate markers & restore the expected output.

    Signed-off-by: Matt Redfearn
    Cc: Paul Gortmaker
    Cc: Kirill A. Shutemov
    Cc: Andrew Morton
    Cc: linux-mips@linux-mips.org
    Cc: linux-kernel@vger.kernel.org
    Patchwork: https://patchwork.linux-mips.org/patch/14544/
    Signed-off-by: Ralf Baechle

    Matt Redfearn
     
  • With commit e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO") we
    started using the ppp value 0b110 to map kernel readonly. But that
    facility was only added as part of ISA 2.04. For earlier ISA versions,
    the only supported ppp bit value for a readonly mapping is 0b011. (This
    implies both user and kernel get mapped using the same ppp bit value for
    readonly mapping.)
    Update the code such that for earlier architecture versions we use ppp
    value 0b011 for readonly mapping. We don't differentiate between power5+
    and power5 here and apply the new ppp bits only from power6 (ISA 2.05).
    This keeps the changes minimal.

    This fixes the issue with PS3 spu usage reported at
    https://lkml.kernel.org/r/rep.1421449714.geoff@infradead.org

    Fixes: e58e87adc8bf9 ("powerpc/mm: Update _PAGE_KERNEL_RO")
    Cc: stable@vger.kernel.org # v4.7+
    Tested-by: Geoff Levand
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Michael Ellerman

    Aneesh Kumar K.V
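
The selection rule in this commit message can be written down as a small sketch. The two ppp values come from the text above; the macro names, the function name and the boolean parameter are hypothetical and do not reflect how the kernel actually detects the CPU feature.

```c
#include <stdbool.h>

#define PPP_RO_SHARED 0x3u /* 0b011: read-only for both user and kernel */
#define PPP_RO_KERNEL 0x6u /* 0b110: kernel read-only, newer CPUs only */

/* Pick the ppp value for a kernel read-only mapping.
 * The caller says whether the CPU is power6 (ISA 2.05) or later. */
static unsigned int kernel_ro_ppp(bool cpu_is_power6_or_later)
{
	return cpu_is_power6_or_later ? PPP_RO_KERNEL : PPP_RO_SHARED;
}
```

On older CPUs the fallback means user and kernel read-only mappings share one encoding, as the message notes.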
     
  • Split irqchip allows pic and ioapic routes to be used without them being
    created, which results in NULL access. Check for NULL and avoid it.
    (The setup is too racy for a nicer solution.)

    Found by syzkaller:

    general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Workqueue: events irqfd_inject
    task: ffff88006a06c7c0 task.stack: ffff880068638000
    RIP: 0010:[...] [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221
    RSP: 0000:ffff88006863ea20 EFLAGS: 00010006
    RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000
    RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e
    RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000
    R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0
    R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001
    FS: 0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0
    Stack:
    ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000
    ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082
    0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000
    Call Trace:
    [...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
    [...] __raw_spin_lock include/linux/spinlock_api_smp.h:144
    [...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151
    [...] spin_lock include/linux/spinlock.h:302
    [...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379
    [...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52
    [...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101
    [...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60
    [...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096
    [...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230
    [...] kthread+0x328/0x3e0 kernel/kthread.c:209
    [...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

    Reported-by: Dmitry Vyukov
    Cc: stable@vger.kernel.org
    Fixes: 49df6397edfc ("KVM: x86: Split the APIC from the rest of IRQCHIP.")
    Signed-off-by: Radim Krčmář

    Radim Krčmář
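
A minimal sketch of the kind of guard this commit describes, assuming a VM structure that holds an optional pointer to its ioapic. All type and function names here are hypothetical stand-ins, not the KVM ones.

```c
#include <stddef.h>

#define IOAPIC_PINS_SKETCH 24 /* illustrative pin count */

struct ioapic_sketch { int irq_level[IOAPIC_PINS_SKETCH]; };
struct vm_sketch { struct ioapic_sketch *ioapic; /* NULL if never created */ };

/* Returns 0 on success, -1 if the ioapic is absent or the pin is invalid. */
static int set_ioapic_irq_sketch(struct vm_sketch *vm, unsigned int pin, int level)
{
	struct ioapic_sketch *ioapic = vm->ioapic;

	if (ioapic == NULL || pin >= IOAPIC_PINS_SKETCH)
		return -1;
	ioapic->irq_level[pin] = level;
	return 0;
}
```

The route handler bails out before touching any lock or state inside the missing device, which is the path the trace above crashed on.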
     
  • KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be
    bigger than the maximal number of VCPUs, resulting in out-of-bounds
    access.

    Found by syzkaller:

    BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...]
    Write of size 1 by task a.out/27101
    CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [...]
    Call Trace:
    [...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905
    [...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495
    [...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86
    [...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360
    [...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222
    [...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235
    [...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670
    [...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668
    [...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999
    [...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099

    Reported-by: Dmitry Vyukov
    Cc: stable@vger.kernel.org
    Fixes: af1bae5497b9 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023")
    Reviewed-by: Paolo Bonzini
    Reviewed-by: David Hildenbrand
    Signed-off-by: Radim Krčmář

    Radim Krčmář
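
The mismatch between "number of VCPUs" and "largest VCPU ID" can be illustrated with a small sketch. The constants and names below are hypothetical and chosen only for illustration; the point is that an array indexed by ID must be sized by the largest ID.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_VCPUS_SKETCH   288  /* illustrative: how many VCPUs may exist */
#define MAX_VCPU_ID_SKETCH 1023 /* illustrative: largest ID a VCPU may have */

struct eoi_map_sketch {
	/* Indexed by VCPU ID, so sized by the largest ID, not the VCPU count. */
	uint8_t pending[MAX_VCPU_ID_SKETCH + 1];
};

/* Mark a VCPU as pending; rejects IDs that would fall outside the map. */
static bool mark_pending_sketch(struct eoi_map_sketch *map, unsigned int vcpu_id)
{
	if (vcpu_id > MAX_VCPU_ID_SKETCH)
		return false;
	map->pending[vcpu_id] = 1;
	return true;
}
```

Had the array been declared with `MAX_VCPUS_SKETCH` entries, an ID such as 1000 would write past its end.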
     
  • em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
    bit mode, but syzkaller proved otherwise (and SDM agrees).
    Code segment was restored upon failure, but it was left uninitialized
    outside of long mode, which could lead to a leak of host kernel stack.
    We could have fixed that by always saving and restoring the CS, but we
    take a simpler approach and just break any guest that manages to fail
    as the error recovery is error-prone and modern CPUs don't need the
    emulator for this.

    Found by syzkaller:

    WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [...]
    Call Trace:
    [...] __dump_stack lib/dump_stack.c:15
    [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
    [...] panic+0x1b7/0x3a3 kernel/panic.c:179
    [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
    [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
    [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
    [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
    [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
    [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
    [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
    [...] complete_emulated_io arch/x86/kvm/x86.c:6870
    [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
    [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
    [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
    [...] vfs_ioctl fs/ioctl.c:43
    [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
    [...] SYSC_ioctl fs/ioctl.c:694
    [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
    [...] entry_SYSCALL_64_fastpath+0x1f/0xc2

    Reported-by: Dmitry Vyukov
    Cc: stable@vger.kernel.org
    Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
    Signed-off-by: Radim Krčmář

    Radim Krčmář
     
  • Cluster xAPIC delivery incorrectly assumed that dest_id
    Cc: stable@vger.kernel.org
    Fixes: e45115b62f9a ("KVM: x86: use physical LAPIC array for logical x2APIC")
    Reviewed-by: Paolo Bonzini
    Signed-off-by: Radim Krčmář

    Radim Krčmář
     

24 Nov, 2016

4 commits

  • Since MIPSr6 the Wired register is split into 2 fields, with the upper
    16 bits of the register indicating a limit on the value that the wired
    entry count in the bottom 16 bits of the register can take. This means
    that simply reading the wired register doesn't get us a valid TLB entry
    index any longer, and we instead need to retrieve only the lower 16 bits
    of the register. Introduce a new num_wired_entries() function which does
    this on MIPSr6 or higher and simply returns the value of the wired
    register on older architecture revisions, and make use of it when
    reading the number of wired entries.

    Since commit e710d6668309 ("MIPS: tlb-r4k: If there are wired entries,
    don't use TLBINVF") we have been using a non-zero number of wired
    entries to determine whether we should avoid use of the tlbinvf
    instruction (which would invalidate wired entries) and instead loop over
    TLB entries in local_flush_tlb_all(). This loop begins with the number
    of wired entries, or before this patch some large bogus TLB index on
    MIPSr6 systems. Thus since the aforementioned commit some MIPSr6 systems
    with FTLBs have been prone to leaving stale address translations in the
    FTLB & crashing in various weird & wonderful ways when we later observe
    the wrong memory.

    Signed-off-by: Paul Burton
    Cc: Matt Redfearn
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/14557/
    Signed-off-by: Ralf Baechle

    Paul Burton
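
The field split described in this commit can be sketched as follows. This is only an illustration: the function takes the raw register value and an architecture flag as parameters, whereas the kernel helper reads the register itself, and all names carrying the `_sketch` suffix are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 16 bits of the Wired register hold the wired entry count on MIPSr6. */
#define WIRED_COUNT_MASK_SKETCH 0xffffu

static unsigned int num_wired_entries_sketch(uint32_t wired_reg, bool is_r6_or_later)
{
	if (is_r6_or_later)
		return wired_reg & WIRED_COUNT_MASK_SKETCH; /* drop the limit field */
	return wired_reg; /* older revisions: the whole register is the count */
}
```

With a limit of 0x3f in the upper half and two wired entries, the raw value 0x003f0002 yields 2 on MIPSr6 instead of a large bogus index.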
     
  • When configured with CONFIG_PPC_EARLY_DEBUG_OPAL=y the kernel expects
    the OPAL entry and base addresses to be passed in r8 and r9
    respectively. Currently the wrapper does not attempt to restore these
    values before entering the decompressed kernel which causes the kernel
    to branch into whatever happens to be in r9 when doing a write to the
    OPAL console in early boot.

    This patch adds a platform_ops hook that can be used to branch into the
    new kernel. The OPAL console driver patches this at runtime so that if
    the console is used it will be restored just prior to entering the
    kernel.

    Fixes: 656ad58ef19e ("powerpc/boot: Add OPAL console to epapr wrappers")
    Cc: stable@vger.kernel.org # v4.8+
    Signed-off-by: Oliver O'Halloran
    Signed-off-by: Michael Ellerman

    Oliver O'Halloran
     
  • For large values of "mult" and long uptimes, the intermediate
    result of "cycles * mult" can overflow 64 bits. For example,
    the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
    we have mult = 853, and after 208.5 days, we overflow 64 bits.

    Since clocksource_cyc2ns() is intended to be used for relative
    cycle counts, not absolute cycle counts, performance is more
    important than accepting a wider range of cycle values. So,
    just use mult_frac() directly in tile's sched_clock().

    Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow
    in sched_clock") by Salman Qazi results in essentially the same
    generated code for x86 as this change does for tile. In fact,
    a follow-on change by Salman introduced mult_frac() and switched
    to using it, so the C code was largely identical at that point too.

    Peter Zijlstra then added mul_u64_u32_shr() and switched x86
    to use it. This is, in principle, better; by optimizing the
    64x64->64 multiplies to be 32x32->64 multiplies we can potentially
    save some time. However, the compiler pipelines the 64x64->64
    multiplies pretty well, and the conditional branch in the generic
    mul_u64_u32_shr() causes some bubbles in execution, with the
    result that it's pretty much a wash. If tilegx provided its own
    implementation of mul_u64_u32_shr() without the conditional branch,
    we could potentially save 3 cycles, but that seems like a small gain
    for a fair amount of additional build scaffolding; no other platform
    currently provides a mul_u64_u32_shr() override, and tile doesn't
    currently have a header to put the override in.

    Additionally, gcc currently has an optimization bug that prevents
    it from recognizing the opportunity to use a 32x32->64 multiply,
    and so the result would be no better than the existing mult_frac()
    until such time as the compiler is fixed.

    For now, just using mult_frac() seems like the right answer.

    Cc: stable@kernel.org [v3.4+]
    Signed-off-by: Chris Metcalf

    Chris Metcalf
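
The overflow and its fix can be checked with a short user-space sketch. The figures (1.2 GHz, mult = 853) come from the message above; a shift of 10 is an assumption chosen so that 853/1024 approximates 1/1.2 ns per cycle, and both function names are hypothetical. The second function follows the quotient/remainder idea behind mult_frac() rather than reproducing the kernel macro.

```c
#include <stdint.h>

/* Naive conversion: cycles * mult is computed in 64 bits and can wrap. */
static uint64_t cyc2ns_naive(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Split conversion: scale the quotient and the remainder separately,
 * so no intermediate value exceeds roughly cycles * mult / 2^shift. */
static uint64_t cyc2ns_split(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	uint64_t denom = 1ULL << shift;
	uint64_t quot = cycles / denom;
	uint64_t rem = cycles % denom;

	return quot * mult + ((rem * mult) >> shift);
}
```

For small cycle counts both functions agree. At 209 days of uptime on a 1.2 GHz clock (21669120000000000 cycles) the naive product exceeds 64 bits and wraps, while the split form still returns the expected nanosecond count.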
     
  • Pull perf fixes from Ingo Molnar:
    "Six fixes for bugs that were found via fuzzing, and a trivial
    hw-enablement patch for AMD Family-17h CPU PMUs"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/intel/uncore: Allow only a single PMU/box within an events group
    perf/x86/intel: Cure bogus unwind from PEBS entries
    perf/x86: Restore TASK_SIZE check on frame pointer
    perf/core: Fix address filter parser
    perf/x86: Add perf support for AMD family-17h processors
    perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB client uncore IMC
    perf/core: Do not set cpuctx->cgrp for unscheduled cgroups

    Linus Torvalds
     

23 Nov, 2016

3 commits

  • This reverts commit 4dd1837d7589f468ed109556513f476e7a7f9121.

    Moving the exports for assembly code into the assembly files breaks
    KSYM trimming, but also breaks modversions.

    While fixing the KSYM trimming is trivial, fixing modversions brings
    us to a technically worse position than we had prior to the above
    change:

    - We end up with the prototype definitions divorced from everything
      else, which means that adding or removing assembly level ksyms
      becomes more fragile:
      * if adding a new assembly ksyms export, a missed prototype in
        asm-prototypes.h results in a successful build if no module in
        the selected configuration makes use of the symbol.
      * when removing a ksyms export, asm-prototypes.h will get forgotten;
        with armksyms.c, you'll get a build error if you forget to touch
        the file.

    - We end up with the same amount of include files and prototypes,
    they're just in a header file instead of a .c file with their
    exports.

    As for lines of code, we don't get much of a size reduction:
    (original commit)
    47 files changed, 131 insertions(+), 208 deletions(-)
    (fix for ksyms trimming)
    7 files changed, 18 insertions(+), 5 deletions(-)
    (two fixes for modversions)
    1 file changed, 34 insertions(+)
    3 files changed, 7 insertions(+), 2 deletions(-)
    which results in a net total of only 25 lines deleted.

    As there does not seem to be much benefit from this change of approach,
    revert the change.

    Signed-off-by: Russell King

    Russell King
     
  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes:
    - two fixes to make (very) old Intel CPUs boot reliably
    - fix the intel-mid driver and rename it
    - two KASAN false positive fixes
    - an FPU fix
    - two sysfb fixes
    - two build fixes related to new toolchain versions"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
    x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
    x86/platform/intel-mid: Register watchdog device after SCU
    x86/fpu: Fix invalid FPU ptrace state after execve()
    x86/boot: Fail the boot if !M486 and CPUID is missing
    x86/traps: Ignore high word of regs->cs in early_fixup_exception()
    x86/dumpstack: Prevent KASAN false positive warnings
    x86/unwind: Prevent KASAN false positive warnings in guess unwinder
    x86/boot: Avoid warning for zero-filling .bss
    x86/sysfb: Fix lfb_size calculation
    x86/sysfb: Add support for 64bit EFI lfb_base

    Linus Torvalds
     
  • Signed-off-by: Helge Deller

    Helge Deller
     

22 Nov, 2016

6 commits

  • Group validation expects all events to be of the same PMU; however
    is_uncore_pmu() is too wide, it matches _all_ uncore events, even
    across PMUs.

    This triggers failure when we group different events from different
    uncore PMUs, like:

    perf stat -vv -e '{uncore_cbox_0/config=0x0334/,uncore_qpi_0/event=1/}' -a sleep 1

    Fix is_uncore_pmu() by only matching events to the box at hand.

    Note that generic code, run after this step, will disallow this
    mixture of PMU events.

    Reported-by: Jiri Olsa
    Tested-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/20161118125354.GQ3117@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
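
The narrowing described in this commit, from "is this any uncore event" to "does this event belong to the box at hand", can be sketched as a pointer comparison. The structures and the function name below are hypothetical simplifications and are not the perf data structures.

```c
#include <stdbool.h>

struct pmu_sketch { int id; };
struct event_sketch { const struct pmu_sketch *pmu; };
struct box_sketch { const struct pmu_sketch *pmu; };

/* An event matches a box only if both refer to the very same PMU. */
static bool event_matches_box(const struct event_sketch *event,
			      const struct box_sketch *box)
{
	return event->pmu == box->pmu;
}
```

Under this check a cbox event is no longer counted as a member of a qpi box's group, which is the mixture shown in the perf stat command above.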
     
  • Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event
    unwinds sometimes do 'weird' things. In particular, we seemed to be
    ending up unwinding from random places on the NMI stack.

    While it was somewhat expected that the event record BP,SP would not
    match the interrupt BP,SP in that the interrupt is strictly later than
    the record event, it was overlooked that it could be on an already
    overwritten stack.

    Therefore, don't copy the recorded BP,SP over the interrupted BP,SP
    when we need stack unwinds.

    Note that it's still possible the unwind doesn't fully match the actual
    event, as it's entirely possible to have done an (I)RET between record
    and interrupt, but on average it should still point in the general
    direction of where the event came from. Also, it's the best we can do,
    considering.

    The particular scenario that triggered the bogus NMI stack unwind was
    a PEBS event with a very short period: upon enabling the event at the
    tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly
    triggers a record (while still on the NMI stack) which in turn
    triggers the next PMI. This then causes back-to-back NMIs and we'll
    try and unwind the stack-frame from the last NMI, which obviously is
    now overwritten by our own.

    Analyzed-by: Josh Poimboeuf
    Reported-by: Vince Weaver
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: davej@codemonkey.org.uk
    Cc: dvyukov@google.com
    Cc: stable@vger.kernel.org
    Fixes: ca037701a025 ("perf, x86: Add PEBS infrastructure")
    Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The following commit:

    75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")

    ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
    access_ok() check.

    Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
    whereas the access_ok() uses whatever the current address limit of the task is.

    We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
    then see vmalloc faults when we access what looks like pointers into vmalloc
    space:

    [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290
    [] CPU: 3 PID: 3685731 Comm: sh Tainted: G W 4.6.0-5_fbk1_223_gdbf0f40 #1
    [] Call Trace:
    [] [] dump_stack+0x4d/0x6c
    [] [] __warn+0xd3/0xf0
    [] [] warn_slowpath_null+0x1d/0x20
    [] [] vmalloc_fault+0x289/0x290
    [] [] __do_page_fault+0x330/0x490
    [] [] do_page_fault+0xc/0x10
    [] [] page_fault+0x22/0x30
    [] [] ? perf_callchain_user+0x100/0x2a0
    [] [] get_perf_callchain+0x17f/0x190
    [] [] perf_callchain+0x67/0x80
    [] [] perf_prepare_sample+0x2a0/0x370
    [] [] perf_event_output+0x20/0x60
    [] [] ? perf_event_update_userpage+0xc7/0x130
    [] [] __perf_event_overflow+0x181/0x1d0
    [] [] perf_event_overflow+0x14/0x20
    [] [] intel_pmu_handle_irq+0x1d3/0x490
    [] [] ? copy_user_enhanced_fast_string+0x7/0x10
    [] [] ? vunmap_page_range+0x1a1/0x2f0
    [] [] ? unmap_kernel_range_noflush+0x11/0x20
    [] [] ? ghes_copy_tofrom_phys+0x116/0x1f0
    [] [] ? x2apic_send_IPI_self+0x1d/0x20
    [] [] perf_event_nmi_handler+0x2d/0x50
    [] [] nmi_handle+0x61/0x110
    [] [] default_do_nmi+0x44/0x110
    [] [] do_nmi+0xdb/0x150
    [] [] end_repeat_nmi+0x1a/0x1e
    [] [] ? copy_user_enhanced_fast_string+0x7/0x10
    [] [] ? copy_user_enhanced_fast_string+0x7/0x10
    [] [] ? copy_user_enhanced_fast_string+0x7/0x10
    [] <> [] ? __probe_kernel_read+0x3e/0xa0

    Fix this by moving the valid_user_frame() check to before the uaccess
    that loads the return address and the pointer to the next frame.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Fixes: 75925e1ad7f5 ("perf/x86: Optimize stack walk user accesses")
    Signed-off-by: Ingo Molnar

    Johannes Weiner
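
The difference between checking against a fixed user limit and checking against whatever the current address limit happens to be can be sketched with a plain range test. The function name and the limit value used in the usage note are hypothetical; the sketch only shows a check that does not depend on any per-task state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [fp, fp + size) lies entirely below the fixed user limit.
 * Written to avoid overflow when fp is close to UINT64_MAX. */
static bool frame_in_user_range(uint64_t fp, size_t size, uint64_t task_size)
{
	return size <= task_size && fp <= task_size - size;
}
```

With an assumed limit of 0x00007ffffffff000, a frame pointer in vmalloc-like space such as 0xffffc90000000000 is rejected before any load is attempted, regardless of whether the interrupted code had switched to KERNEL_DS.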
     
  • After patch 4efca4ed0 ("kbuild: modversions for EXPORT_SYMBOL() for asm"),
    asm exports can get modversions CRCs generated if they have C definitions
    in asm-prototypes.h. This patch adds missing definitions for 32 and 64 bit
    allmodconfig builds.

    Fixes: 9445aa1a3062 ("ppc: move exports to definitions")
    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michael Ellerman

    Nicholas Piggin
     
  • There is a new bit, LPCR_PECE_HVEE (Hypervisor Virtualization Exit
    Enable), which controls wakeup from STOP states on Hypervisor
    Virtualization Interrupts (which happen to also be all external
    interrupts in host or bare metal mode).

    It needs to be set or we will miss wakeups.

    Fixes: 9baaef0a22c8 ("powerpc/irq: Add support for HV virtualization interrupts")
    Cc: stable@vger.kernel.org # v4.8+
    Signed-off-by: Benjamin Herrenschmidt
    [mpe: Rename it to HVEE to match the name in the ISA]
    Signed-off-by: Michael Ellerman

    Benjamin Herrenschmidt
     
  • Pull sparc fixes from David Miller:

    1) With modern networking cards we can run out of 32-bit DMA space, so
    support 64-bit DMA addressing when possible on sparc64. From Dave
    Tushar.

    2) Some signal frame validation checks are inverted on sparc32, fix
    from Andreas Larsson.

    3) Lockdep tables can get too large in some circumstances on sparc64,
    add a way to adjust the size a bit. From Babu Moger.

    4) Fix NUMA node probing on some sun4v systems, from Thomas Tai.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc: drop duplicate header scatterlist.h
    lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined
    config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc
    sunbmac: Fix compiler warning
    sunqe: Fix compiler warnings
    sparc64: Enable 64-bit DMA
    sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
    sparc64: Bind PCIe devices to use IOMMU v2 service
    sparc64: Initialize iommu_map_table and iommu_pool
    sparc64: Add ATU (new IOMMU) support
    sparc64: Add FORCE_MAX_ZONEORDER and default to 13
    sparc64: fix compile warning section mismatch in find_node()
    sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
    sparc64: Fix find_node warning if numa node cannot be found

    Linus Torvalds
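
    Item 2 of the sparc pull above mentions inverted frame validation checks. The general shape of such a bug can be sketched as follows. This is an illustration only: `frame_pointer_is_invalid()`, `EXAMPLE_USER_LIMIT`, the 16-byte alignment rule and both `sigreturn_*()` functions are assumptions made for the example and are not taken from the sparc32 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound of the user address space for this example. */
#define EXAMPLE_USER_LIMIT 0xf0000000u

/* Returns true when the frame pointer must be REJECTED. */
bool frame_pointer_is_invalid(uint32_t fp, uint32_t len)
{
    if (fp & 15u)
        return true;    /* misaligned (illustrative 16-byte rule) */
    if (len > EXAMPLE_USER_LIMIT || fp > EXAMPLE_USER_LIMIT - len)
        return true;    /* frame would extend past the example limit */
    return false;
}

/* Inverted use: bails out on VALID pointers and accepts invalid ones. */
int sigreturn_inverted(uint32_t fp, uint32_t len)
{
    if (!frame_pointer_is_invalid(fp, len))
        return -1;
    return 0;
}

/* Intended use: bail out only when the predicate reports an invalid pointer. */
int sigreturn_fixed(uint32_t fp, uint32_t len)
{
    if (frame_pointer_is_invalid(fp, len))
        return -1;
    return 0;
}

/* Small usage example: an aligned in-range frame passes, a misaligned one fails. */
void sigreturn_example(void)
{
    assert(sigreturn_fixed(0x1000u, 64u) == 0);
    assert(sigreturn_fixed(0x1004u, 64u) == -1);
}
```

    A predicate whose name states the negative case ("invalid") is easy to negate by accident at the call site, which is why the two variants differ by a single `!`.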
     

21 Nov, 2016

7 commits

  • Rename the watchdog platform library file to explicitly show that it is used
    only on Intel Merrifield platforms.

    Signed-off-by: Andy Shevchenko
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161118172723.179761-1-andriy.shevchenko@linux.intel.com
    Signed-off-by: Ingo Molnar

    Andy Shevchenko
     
  • Since the bootloader may load the compressed x86 kernel at any address,
    it should always be built as PIE, not just when CONFIG_RELOCATABLE=y.

    Otherwise, the linker in binutils 2.27 will optimize the GOT load into an
    absolute address when building the compressed x86 kernel as a non-PIE
    executable.

    Signed-off-by: H.J. Lu
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    [ Small wording changes. ]
    Signed-off-by: Ingo Molnar

    H.J. Lu
     
  • The watchdog device in Intel Tangier relies on the SCU being present. It uses
    the SCU IPC channel to send commands and receive responses. If the watchdog
    driver is initialized before the SCU and a command is sent, the result is
    always an error like the following:

    intel_mid_wdt: Error stopping watchdog: 0xffffffed

    Register the watchdog device when the SCU is ready to avoid the described issue.

    Signed-off-by: Andy Shevchenko
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161118165224.175514-1-andriy.shevchenko@linux.intel.com
    [ Small cleanups. ]
    Signed-off-by: Ingo Molnar

    Andy Shevchenko
     
  • Robert O'Callahan reported that after an execve PTRACE_GETREGSET
    NT_X86_XSTATE continues to return the pre-exec register values
    until the exec'ed task modifies FPU state.

    The test code is at:

    https://bugzilla.redhat.com/attachment.cgi?id=1164286.

    What is happening is fpu__clear() does not properly clear fpstate.
    Fix it by doing just that.

    Reported-by: Robert O'Callahan
    Signed-off-by: Yu-cheng Yu
    Cc:
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: David Hansen
    Cc: Fenghua Yu
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Ravi V. Shankar
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1479402695-6553-1-git-send-email-yu-cheng.yu@intel.com
    Signed-off-by: Ingo Molnar

    Yu-cheng Yu
     
  • Linux will have all kinds of sporadic problems on systems that don't
    have the CPUID instruction unless CONFIG_M486=y. In particular,
    sync_core() will explode.

    I believe that these kernels had a better chance of working before
    commit 05fb3c199bb0 ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS
    even if we don't have CPUID"). That commit inadvertently fixed a
    serious bug: we used to fail to detect the FPU if CPUID wasn't
    present. Because we also used to forget to set X86_FEATURE_ALWAYS, we
    ended up with no cpu feature bits set at all. This meant that
    alternative patching didn't do anything and, if paravirt was disabled,
    we could plausibly finish the entire boot process without calling
    sync_core().

    Rather than trying to work around these issues, just have the kernel
    fail loudly if it's running on a CPU that doesn't have CPUID and
    CONFIG_M486 is not set.

    Reported-by: Matthew Whitehead
    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/70eac6639f23df8be5fe03fa1984aedd5d40077a.1479598603.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • On the 80486 DX, it seems that some exceptions may leave garbage in
    the high bits of CS. This causes sporadic failures in which
    early_fixup_exception() refuses to fix up an exception.

    As far as I can tell, this has been buggy for a long time, but the
    problem seems to have been exacerbated by commits:

    1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
    e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines")

    This appears to have been broken for as long as we've had early
    exception handling.

    [ Note to stable maintainers: This patch is needed all the way back to 3.4,
    but it will only apply to 4.6 and up, as it depends on commit:

    0e861fbb5bda ("x86/head: Move early exception panic code into early_fixup_exception()")

    If you want to backport to kernels before 4.6, please don't backport the
    prerequisites (there was a big chain of them that rewrote a lot of the
    early exception machinery); instead, ask me and I can send you a one-liner
    that will apply. ]

    Reported-by: Matthew Whitehead
    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Fixes: 4c5023a3fa2e ("x86-32: Handle exception table entries during early boot")
    Link: http://lkml.kernel.org/r/cb32c69920e58a1a58e7b5cad975038a69c0ce7d.1479609510.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
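
    The failure mode in the commit above (garbage in the high bits of the saved CS defeating an equality test) can be illustrated with a small comparison sketch. The selector value `EXAMPLE_KERNEL_CS` and the function names are assumptions made for this example; the sketch shows the masking idea only and is not the actual kernel change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative selector value; the real constant depends on the kernel's GDT layout. */
#define EXAMPLE_KERNEL_CS 0x60u

/* Compares the whole 32-bit saved slot, so junk in the upper half breaks the match. */
bool cs_is_kernel_strict(uint32_t saved_cs)
{
    return saved_cs == EXAMPLE_KERNEL_CS;
}

/* A segment selector is 16 bits wide, so only the low word is compared. */
bool cs_is_kernel_masked(uint32_t saved_cs)
{
    return (saved_cs & 0xffffu) == EXAMPLE_KERNEL_CS;
}

/* Usage example: a saved CS slot with garbage in the high 16 bits. */
void cs_compare_example(void)
{
    uint32_t saved_cs = 0xdead0000u | EXAMPLE_KERNEL_CS;

    assert(!cs_is_kernel_strict(saved_cs));
    assert(cs_is_kernel_masked(saved_cs));
}
```

    Both functions agree whenever the upper half of the slot is zero, which would explain why the problem stays hidden on CPUs that do not leave garbage there.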
     
  • Pull ARM fixes from Russell King:
    "A few more ARM fixes:

    - the assembly backtrace code suffers problems with the new printk()
    implementation which assumes that kernel messages without KERN_CONT
    should have newlines inserted between them. Fix this.
    - fix a section naming error - ".init.text" rather than ".text.init"
    - preallocate DMA debug memory at core_initcall() time rather than
    fs_initcall(), as we have some core drivers that need to use DMA
    mapping - and that triggers a kernel warning from the DMA debug
    code.
    - fix XIP kernels after the ro_after_init changes made this data
    permanently read-only"

    * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
    ARM: Fix XIP kernels
    ARM: 8628/1: dma-mapping: preallocate DMA-debug hash tables in core_initcall
    ARM: 8624/1: proc-v7m.S: fix init section name
    ARM: fix backtrace

    Linus Torvalds
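
    The DMA debug item in the ARM pull above depends on initcall ordering: core_initcall() runs before fs_initcall(). A minimal simulation of that ordering follows. The level numbers mirror the kernel's initcall levels (core is 1, subsys is 4, fs is 5), but everything else is hypothetical, including the assumption that the early DMA user registers at the subsys level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of initcall levels; lower numbers run earlier during boot. */
enum initcall_level { LEVEL_CORE = 1, LEVEL_SUBSYS = 4, LEVEL_FS = 5 };

struct boot_state {
    bool dma_debug_ready;        /* set once the debug tables are allocated */
    bool driver_saw_debug_ready; /* what the early driver observed */
};

typedef void (*initcall_fn)(struct boot_state *);

struct initcall {
    enum initcall_level level;
    initcall_fn fn;
};

/* Stand-in for preallocating the DMA debug tables. */
void dma_debug_prealloc(struct boot_state *s)
{
    s->dma_debug_ready = true;
}

/* Stand-in for a driver that uses DMA mapping during its own init. */
void early_driver_init(struct boot_state *s)
{
    s->driver_saw_debug_ready = s->dma_debug_ready;
}

/* Run the registered calls level by level, in ascending order. */
void run_initcalls(const struct initcall *calls, size_t n, struct boot_state *s)
{
    for (int level = 0; level <= 7; level++)
        for (size_t i = 0; i < n; i++)
            if ((int)calls[i].level == level)
                calls[i].fn(s);
}

/* Returns whether the driver found DMA debug ready for a given debug init level. */
bool driver_sees_dma_debug(enum initcall_level debug_level)
{
    struct boot_state s = { false, false };
    const struct initcall calls[] = {
        { debug_level, dma_debug_prealloc },
        { LEVEL_SUBSYS, early_driver_init },
    };

    run_initcalls(calls, sizeof(calls) / sizeof(calls[0]), &s);
    return s.driver_saw_debug_ready;
}

/* Usage example: fs level is too late for the driver, core level is early enough. */
void initcall_order_example(void)
{
    assert(!driver_sees_dma_debug(LEVEL_FS));
    assert(driver_sees_dma_debug(LEVEL_CORE));
}
```

    In this model the driver only finds the debug tables ready when the preallocation is registered at a lower level than the driver itself.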
     

20 Nov, 2016

2 commits

  • Pull ARM SoC fixes from Olof Johansson:
    "Again a set of smaller fixes across several platforms (OMAP, Marvell,
    Allwinner, i.MX, etc).

    A handful of typo fixes and smaller missing contents from device
    trees, with some tweaks to OMAP mach files to deal with CPU feature
    print misformatting, potential NULL ptr dereference and one setup
    issue with UARTs"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ipmi/bt-bmc: change compatible node to 'aspeed, ast2400-ibt-bmc'
    ARM: dts: STiH410-b2260: Fix typo in spi0 chipselect definition
    ARM: dts: omap5: board-common: fix wrong SMPS6 (VDD-DDR3) voltage
    ARM: omap3: Add missing memory node in SOM-LV
    arm64: dts: marvell: add unique identifiers for Armada A8k SPI controllers
    arm64: dts: marvell: fix clocksource for CP110 slave SPI0
    arm64: dts: marvell: Fix typo in label name on Armada 37xx
    ASoC: omap-abe-twl6040: fix typo in bindings documentation
    dts: omap5: board-common: enable twl6040 headset jack detection
    dts: omap5: board-common: add phandle to reference Palmas gpadc
    ARM: OMAP2+: avoid NULL pointer dereference
    ARM: OMAP2+: PRM: initialize en_uart4_mask and grpsel_uart4_mask
    ARM: dts: omap3: Fix memory node in Torpedo board
    ARM: AM43XX: Select OMAP_INTERCONNECT in Kconfig
    ARM: OMAP3: Fix formatting of features printed
    ARM: dts: imx53-qsb: Fix regulator constraints
    ARM: dts: sun8i: fix the pinmux for UART1

    Linus Torvalds
     
  • Pull KVM fixes from Radim Krčmář:
    "ARM:
    - Fix handling of the 32bit cycle counter
    - Fix cycle counter filtering

    x86:
    - Fix a race leading to double unregistering of user notifiers
    - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead
    - Use SRCU around kvm_lapic_set_vapic_addr
    - Avoid recursive flushing of asynchronous page faults
    - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP
    - Let userspace know that KVM_GET_CLOCK is useful with master clock;
    4.9 changed the return value to better match the guest clock, but
    didn't provide means to let guests take advantage of it"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic
    KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
    KVM: async_pf: avoid recursive flushing of work items
    kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
    KVM: Disable irq while unregistering user notifier
    KVM: x86: do not go through vcpu in __get_kvmclock_ns
    KVM: arm64: Fix the issues when guest PMCCFILTR is configured
    arm64: KVM: pmu: Fix AArch32 cycle counter access

    Linus Torvalds