Eric Lee / smarc-fsl-linux-kernel

23 Jan, 2017

1 commit

095cbe669 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fix from Thomas Gleixner:
"Restore the retrigger callbacks in the IO APIC irq chips. That
addresses a long standing regression which got introduced with the
rewrite of the x86 irq subsystem two years ago and went unnoticed so
far"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Restore IO-APIC irq_chip retrigger callback

Linus Torvalds
2017-01-23 04:47:48 +0800

21 Jan, 2017

1 commit

4c9eff7af Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix for timer setup on VHE machines
- Drop spurious warning when the timer races against the vcpu running
again
- Prevent a vgic deadlock when the initialization fails (for stable)

s390:
- Fix a kernel memory exposure (for stable)

x86:
- Fix exception injection when hypercall instruction cannot be
patched"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: do not expose random data via facility bitmap
KVM: x86: fix fixing of hypercalls
KVM: arm/arm64: vgic: Fix deadlock on error handling
KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems
KVM: arm/arm64: Fix occasional warning from the timer work function

Linus Torvalds
2017-01-21 06:19:34 +0800

20 Jan, 2017

1 commit

81aaeaac4 Merge tag 'pci-v4.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci ... Browse Code »

Pull PCI fixes from Bjorn Helgaas:

- recognize that a PCI-to-PCIe bridge originates a PCIe hierarchy, so
we enumerate that hierarchy correctly

- X-Gene: fix a change merged for v4.10 that broke MSI

- Keystone: avoid reading undefined registers, which can cause
asynchronous external aborts

- Supermicro X8DTH-i/6/iF/6F: ignore broken _CRS that caused us to
change (and break) existing I/O port assignments

* tag 'pci-v4.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/MSI: pci-xgene-msi: Fix CPU hotplug registration handling
PCI: Enumerate switches below PCI-to-PCIe bridges
x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F
PCI: designware: Check for iATU unroll only on platforms that use ATU

Linus Torvalds
2017-01-20 01:59:46 +0800

19 Jan, 2017

1 commit

ca92e6c7e Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull SMP hotplug update from Thomas Gleixner:
"This contains a trivial typo fix and an extension to the core code for
dynamically allocating states in the prepare stage.

The extension is necessary right now because we need a proper way to
unbreak LTTNG, which iscurrently non functional due to the removal of
the notifiers. Surely it's out of tree, but it's widely used by
distros.

The simple solution would have been to reserve a state for LTTNG, but
I'm not fond about unused crap in the kernel and the dynamic range,
which we admittedly should have done right away, allows us to remove
quite some of the hardcoded states, i.e. those which have no ordering
requirements. So doing the right thing now is better than having an
smaller intermediate solution which needs to be reworked anyway"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Provide dynamic range for prepare stage
perf/x86/amd/ibs: Fix typo after cleanup state names in cpu/hotplug

Linus Torvalds
2017-01-19 03:13:41 +0800

18 Jan, 2017

1 commit

020eb3daa x86/ioapic: Restore IO-APIC irq_chip retrigger callback ... Browse Code »

commit d32932d02e18 removed the irq_retrigger callback from the IO-APIC
chip and did not add it to the new IO-APIC-IR irq chip.

Unfortunately the software resend fallback is not enabled on X86, so edge
interrupts which are received during the lazy disabled state of the
interrupt line are not retriggered and therefor lost.

Restore the callbacks.

[ tglx: Massaged changelog ]

Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Ruslan Ruslichenko
Cc: xe-linux-external@cisco.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.com
Signed-off-by: Thomas Gleixner

Ruslan Ruslichenko
2017-01-18 22:37:28 +0800

17 Jan, 2017

2 commits

ce2e852ec KVM: x86: fix fixing of hypercalls ... Browse Code »

emulator_fix_hypercall() replaces hypercall with vmcall instruction,
but it does not handle GP exception properly when writes the new instruction.
It can return X86EMUL_PROPAGATE_FAULT without setting exception information.
This leads to incorrect emulation and triggers
WARN_ON(ctxt->exception.vector > 0x1f) in x86_emulate_insn()
as discovered by syzkaller fuzzer:

WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558
Call Trace:
warn_slowpath_null+0x2c/0x40 kernel/panic.c:582
x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572
x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618
emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline]
handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762
vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625
vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline]
vcpu_run arch/x86/kvm/x86.c:6947 [inline]

Set exception information when write in emulator_fix_hypercall() fails.

Signed-off-by: Dmitry Vyukov
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Wanpeng Li
Cc: kvm@vger.kernel.org
Cc: syzkaller@googlegroups.com
Signed-off-by: Radim Krčmář

Dmitry Vyukov
2017-01-17 22:06:05 +0800
4e71de798 perf/x86/intel: Handle exclusive threadid correctly on CPU hotplug ... Browse Code »

The CPU hotplug function intel_pmu_cpu_starting() sets
cpu_hw_events.excl_thread_id unconditionally to 1 when the shared exclusive
counters data structure is already availabe for the sibling thread.

This works during the boot process because the first sibling gets threadid
0 assigned and the second sibling which shares the data structure gets 1.

But when the first thread of the core is offlined and onlined again it
shares the data structure with the second thread and gets exclusive thread
id 1 assigned as well.

Prevent this by checking the threadid of the already online thread.

[ tglx: Rewrote changelog ]

Signed-off-by: Zhou Chengming
Cc: NuoHan Qiao
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: kan.liang@intel.com
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: qiaonuohan@huawei.com
Cc: davidcc@google.com
Cc: guohanjun@huawei.com
Link: http://lkml.kernel.org/r/1484536871-3131-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Thomas Gleixner
--- ---
arch/x86/events/intel/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

Zhou Chengming
2017-01-17 18:08:36 +0800

16 Jan, 2017

3 commits

83346fbc0 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Ingo Molnar:
"Misc fixes:

- unwinder fixes
- AMD CPU topology enumeration fixes
- microcode loader fixes
- x86 embedded platform fixes
- fix for a bootup crash that may trigger when clearcpuid= is used
with invalid values"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mpx: Use compatible types in comparison to fix sparse error
x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
x86/entry: Fix the end of the stack for newly forked tasks
x86/unwind: Include __schedule() in stack traces
x86/unwind: Disable KASAN checks for non-current tasks
x86/unwind: Silence warnings for non-current tasks
x86/microcode/intel: Use correct buffer size for saving microcode data
x86/microcode/intel: Fix allocation size of struct ucode_patch
x86/microcode/intel: Add a helper which gives the microcode revision
x86/microcode: Use native CPUID to tickle out microcode revision
x86/CPU: Add native CPUID variants returning a single datum
x86/boot: Add missing declaration of string functions
x86/CPU/AMD: Fix Bulldozer topology
x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
x86/cpu: Fix typo in the comment for Anniedale
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option

Linus Torvalds
2017-01-16 04:03:11 +0800
79078c53b Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
driver fixes, plus miscellanous tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Reject non sampling events with precise_ip
perf/x86/intel: Account interrupts for PEBS errors
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
perf/core: Fix sys_perf_event_open() vs. hotplug
perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
perf/x86: Set pmu->module in Intel PMU modules
perf probe: Fix to probe on gcc generated symbols for offline kernel
perf probe: Fix --funcs to show correct symbols for offline module
perf symbols: Robustify reading of build-id from sysfs
perf tools: Install tools/lib/traceevent plugins with install-bin
tools lib traceevent: Fix prev/next_prio for deadline tasks
perf record: Fix --switch-output documentation and comment
perf record: Make __record_options static
tools lib subcmd: Add OPT_STRING_OPTARG_SET option
perf probe: Fix to get correct modname from elf header
samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
samples/bpf sock_example: Avoid getting ethhdr from two includes
perf sched timehist: Show total scheduling time

Linus Torvalds
2017-01-16 03:37:43 +0800
255e6140f Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull EFI fixes from Ingo Molnar:
"A number of regression fixes:

- Fix a boot hang on machines that have somewhat unusual memory map
entries of phys_addr=0x0 num_pages=0, which broke due to a recent
commit. This commit got cherry-picked from the v4.11 queue because
the bug is affecting real machines.

- Fix a boot hang also reported by KASAN, caused by incorrect init
ordering introduced by a recent optimization.

- Fix a recent robustification fix to allocate_new_fdt_and_exit_boot()
that introduced an invalid assumption. Neither bugs were seen in
the wild AFAIK"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Prune invalid memory map entries and fix boot regression
x86/efi: Don't allocate memmap through memblock after mm_init()
efi/libstub/arm*: Pass latest memory map to the kernel

Linus Torvalds
2017-01-16 02:54:39 +0800

14 Jan, 2017

6 commits

0100a3e67 efi/x86: Prune invalid memory map entries and fix boot regression ... Browse Code »

Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
(2.28), include memory map entries with phys_addr=0x0 and num_pages=0.

These machines fail to boot after the following commit,

commit 8e80632fb23f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")

Fix this by removing such bogus entries from the memory map.

Furthermore, currently the log output for this case (with efi=debug)
looks like:

[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)

This is clearly wrong, and also not as informative as it could be. This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries. It also detects the
display of the address range calculation overflow, so the new output is:

[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid)

It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:

[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)

It then removes these entries from the memory map.

Signed-off-by: Peter Jones
Signed-off-by: Ard Biesheuvel
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
Signed-off-by: Matt Fleming
[Matt: Include bugzilla info in commit log]
Cc: # v4.9+
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
Signed-off-by: Ingo Molnar

Peter Jones
2017-01-14 23:48:53 +0800
18e7a45af perf/x86: Reject non sampling events with precise_ip ... Browse Code »

As Peter suggested [1] rejecting non sampling PEBS events,
because they dont make any sense and could cause bugs
in the NMI handler [2].

[1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
[2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org

Signed-off-by: Jiri Olsa
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava
Signed-off-by: Ingo Molnar

Jiri Olsa
2017-01-14 18:06:50 +0800
475113d93 perf/x86/intel: Account interrupts for PEBS errors ... Browse Code »

It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:

taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

This leads to a soft lock up, because the error path of the
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:

NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
...
RIP: 0010:[] [] smp_call_function_single+0xe2/0x140
...
Call Trace:
? trace_hardirqs_on_caller+0xf5/0x1b0
? perf_cgroup_attach+0x70/0x70
perf_install_in_context+0x199/0x1b0
? ctx_resched+0x90/0x90
SYSC_perf_event_open+0x641/0xf90
SyS_perf_event_open+0x9/0x10
do_syscall_64+0x6c/0x1f0
entry_SYSCALL64_slow_path+0x25/0x25

Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.

We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.

Signed-off-by: Jiri Olsa
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar

Jiri Olsa
2017-01-14 18:06:49 +0800
453828625 x86/mpx: Use compatible types in comparison to fix sparse error ... Browse Code »

info->si_addr is of type void __user *, so it should be compared against
something from the same address space.

This fixes the following sparse error:

arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)

Signed-off-by: Tobias Klauser
Cc: Dave Hansen
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Tobias Klauser
2017-01-14 16:32:06 +0800
695085b4b x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc() ... Browse Code »

The Intel Denverton microserver uses a 25 MHz TSC crystal,
so we can derive its exact [*] TSC frequency
using CPUID and some arithmetic, eg.:

TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)

[*] 'exact' is only as good as the crystal, which should be +/- 20ppm

Signed-off-by: Len Brown
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com
Signed-off-by: Ingo Molnar

Len Brown
2017-01-14 16:30:37 +0800
406732c93 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM fixes from Paolo Bonzini:

- fix for module unload vs deferred jump labels (note: there might be
other buggy modules!)

- two NULL pointer dereferences from syzkaller

- also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
made worse during this merge window, "just" kernel memory leak on
releases

- fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix emulation of "MOV SS, null selector"
KVM: x86: fix NULL deref in vcpu_scan_ioapic
KVM: eventfd: fix NULL deref irqbypass consumer
KVM: x86: Introduce segmented_write_std
KVM: x86: flush pending lapic jump label updates on module unload
jump_labels: API for flushing deferred jump label updates

Linus Torvalds
2017-01-14 09:06:24 +0800

12 Jan, 2017

9 commits

33ab91103 KVM: x86: fix emulation of "MOV SS, null selector" ... Browse Code »

This is CVE-2017-2583. On Intel this causes a failed vmentry because
SS's type is neither 3 nor 7 (even though the manual says this check is
only done for usable SS, and the dmesg splat says that SS is unusable!).
On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.

The fix fabricates a data segment descriptor when SS is set to a null
selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
this in turn ensures CPL < 3 because RPL must be equal to CPL.

Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
the bug and deciphering the manuals.

Reported-by: Xiaohan Zhang
Fixes: 79d5b4c3cd809c770d4bf9812635647016c56011
Cc: stable@nongnu.org
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2017-01-12 22:17:13 +0800
546d87e5c KVM: x86: fix NULL deref in vcpu_scan_ioapic ... Browse Code »

Reported by syzkaller:

BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
IP: _raw_spin_lock+0xc/0x30
PGD 3e28eb067
PUD 3f0ac6067
PMD 0
Oops: 0002 [#1] SMP
CPU: 0 PID: 2431 Comm: test Tainted: G OE 4.10.0-rc1+ #3
Call Trace:
? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
? pick_next_task_fair+0xe1/0x4e0
? kvm_arch_vcpu_load+0xea/0x260 [kvm]
kvm_vcpu_ioctl+0x33a/0x600 [kvm]
? hrtimer_try_to_cancel+0x29/0x130
? do_nanosleep+0x97/0xf0
do_vfs_ioctl+0xa1/0x5d0
? __hrtimer_init+0x90/0x90
? do_nanosleep+0x5b/0xf0
SyS_ioctl+0x79/0x90
do_syscall_64+0x6e/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0

The syzkaller folks reported a NULL pointer dereference due to
ENABLE_CAP succeeding even without an irqchip. The Hyper-V
synthetic interrupt controller is activated, resulting in a
wrong request to rescan the ioapic and a NULL pointer dereference.

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#ifndef KVM_CAP_HYPERV_SYNIC
#define KVM_CAP_HYPERV_SYNIC 123
#endif

void* thr(void* arg)
{
struct kvm_enable_cap cap;
cap.flags = 0;
cap.cap = KVM_CAP_HYPERV_SYNIC;
ioctl((long)arg, KVM_ENABLE_CAP, &cap);
return 0;
}

int main()
{
void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
int kvmfd = open("/dev/kvm", 0);
int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
struct kvm_userspace_memory_region memreg;
memreg.slot = 0;
memreg.flags = 0;
memreg.guest_phys_addr = 0;
memreg.memory_size = 0x1000;
memreg.userspace_addr = (unsigned long)host_mem;
host_mem[0] = 0xf4;
ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
struct kvm_sregs sregs;
ioctl(cpufd, KVM_GET_SREGS, &sregs);
sregs.cr0 = 0;
sregs.cr4 = 0;
sregs.efer = 0;
sregs.cs.selector = 0;
sregs.cs.base = 0;
ioctl(cpufd, KVM_SET_SREGS, &sregs);
struct kvm_regs regs = { .rflags = 2 };
ioctl(cpufd, KVM_SET_REGS, ®s);
ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
pthread_t th;
pthread_create(&th, 0, thr, (void*)(long)cpufd);
usleep(rand() % 10000);
ioctl(cpufd, KVM_RUN, 0);
pthread_join(th, 0);
return 0;
}

This patch fixes it by failing ENABLE_CAP if without an irqchip.

Reported-by: Dmitry Vyukov
Fixes: 5c919412fe61 (kvm/x86: Hyper-V synthetic interrupt controller)
Cc: stable@vger.kernel.org # 4.5+
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Dmitry Vyukov
Signed-off-by: Wanpeng Li
Signed-off-by: Paolo Bonzini

Wanpeng Li
2017-01-12 21:52:52 +0800
129a72a0d KVM: x86: Introduce segmented_write_std ... Browse Code »

Introduces segemented_write_std.

Switches from emulated reads/writes to standard read/writes in fxsave,
fxrstor, sgdt, and sidt. This fixes CVE-2017-2584, a longstanding
kernel memory leak.

Since commit 283c95d0e389 ("KVM: x86: emulate FXSAVE and FXRSTOR",
2016-11-09), which is luckily not yet in any final release, this would
also be an exploitable kernel memory *write*!

Reported-by: Dmitry Vyukov
Cc: stable@vger.kernel.org
Fixes: 96051572c819194c37a8367624b285be10297eca
Fixes: 283c95d0e3891b64087706b344a4b545d04a6e62
Suggested-by: Paolo Bonzini
Signed-off-by: Steve Rutherford
Signed-off-by: Paolo Bonzini

Steve Rutherford
2017-01-12 21:34:58 +0800
cef84c302 KVM: x86: flush pending lapic jump label updates on module unload ... Browse Code »

KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
These are implemented with delayed_work structs which can still be
pending when the KVM module is unloaded. We've seen this cause kernel
panics when the kvm_intel module is quickly reloaded.

Use the new static_key_deferred_flush() API to flush pending updates on
module unload.

Signed-off-by: David Matlack
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini

David Matlack
2017-01-12 21:33:17 +0800
ff3f7e247 x86/entry: Fix the end of the stack for newly forked tasks ... Browse Code »

When unwinding a task, the end of the stack is always at the same offset
right below the saved pt_regs, regardless of which syscall was used to
enter the kernel. That convention allows the unwinder to verify that a
stack is sane.

However, newly forked tasks don't always follow that convention, as
reported by the following unwinder warning seen by Dave Jones:

WARNING: kernel stack frame pointer at ffffc90001443f30 in kworker/u8:8:30468 has bad value (null)

The warning was due to the following call chain:

(ftrace handler)
call_usermodehelper_exec_async+0x5/0x140
ret_from_fork+0x22/0x30

The problem is that ret_from_fork() doesn't create a stack frame before
calling other functions. Fix that by carefully using the frame pointer
macros.

In addition to conforming to the end of stack convention, this also
makes related stack traces more sensible by making it clear to the user
that ret_from_fork() was involved.

Reported-by: Dave Jones
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: Dmitry Vyukov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Miroslav Benes
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/8854cdaab980e9700a81e9ebf0d4238e4bbb68ef.1483978430.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar

Josh Poimboeuf
2017-01-12 16:28:29 +0800
2c96b2fe9 x86/unwind: Include __schedule() in stack traces ... Browse Code »

In the following commit:

0100301bfdf5 ("sched/x86: Rewrite the switch_to() code")

... the layout of the 'inactive_task_frame' struct was designed to have
a frame pointer header embedded in it, so that the unwinder could use
the 'bp' and 'ret_addr' fields to report __schedule() on the stack (or
ret_from_fork() for newly forked tasks which haven't actually run yet).

Finish the job by changing get_frame_pointer() to return a pointer to
inactive_task_frame's 'bp' field rather than 'bp' itself. This allows
the unwinder to start one frame higher on the stack, so that it properly
reports __schedule().

Reported-by: Miroslav Benes
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Jones
Cc: Denys Vlasenko
Cc: Dmitry Vyukov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/598e9f7505ed0aba86e8b9590aa528c6c7ae8dcd.1483978430.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar

Josh Poimboeuf
2017-01-12 16:28:28 +0800
84936118b x86/unwind: Disable KASAN checks for non-current tasks ... Browse Code »

There are a handful of callers to save_stack_trace_tsk() and
show_stack() which try to unwind the stack of a task other than current.
In such cases, it's remotely possible that the task is running on one
CPU while the unwinder is reading its stack from another CPU, causing
the unwinder to see stack corruption.

These cases seem to be mostly harmless. The unwinder has checks which
prevent it from following bad pointers beyond the bounds of the stack.
So it's not really a bug as long as the caller understands that
unwinding another task will not always succeed.

In such cases, it's possible that the unwinder may read a KASAN-poisoned
region of the stack. Account for that by using READ_ONCE_NOCHECK() when
reading the stack of another task.

Use READ_ONCE() when reading the stack of the current task, since KASAN
warnings can still be useful for finding bugs in that case.

Reported-by: Dmitry Vyukov
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dave Jones
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Miroslav Benes
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar

Josh Poimboeuf
2017-01-12 16:28:27 +0800
900742d89 x86/unwind: Silence warnings for non-current tasks ... Browse Code »

There are a handful of callers to save_stack_trace_tsk() and
show_stack() which try to unwind the stack of a task other than current.
In such cases, it's remotely possible that the task is running on one
CPU while the unwinder is reading its stack from another CPU, causing
the unwinder to see stack corruption.

These cases seem to be mostly harmless. The unwinder has checks which
prevent it from following bad pointers beyond the bounds of the stack.
So it's not really a bug as long as the caller understands that
unwinding another task will not always succeed.

Since stack "corruption" on another task's stack isn't necessarily a
bug, silence the warnings when unwinding tasks other than current.

Reported-by: Dave Jones
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: Dmitry Vyukov
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Miroslav Benes
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/00d8c50eea3446c1524a2a755397a3966629354c.1483978430.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar

Josh Poimboeuf
2017-01-12 16:28:27 +0800
a6b6e6165 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 ... Browse Code »

Pull crypto fix from Herbert Xu:
"This fixes a regression in aesni that renders it useless if it's
built-in with a modular pcbc configuration"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: aesni - Fix failure when built-in with modular pcbc

Linus Torvalds
2017-01-12 01:28:13 +0800

11 Jan, 2017

3 commits

ad5013d56 perf/x86/intel: Use ULL constant to prevent undefined shift behaviour ... Browse Code »

When x86_pmu.num_counters is 32 the shift of the integer constant 1 is
exceeding 32bit and therefor undefined behaviour.

Fix this by shifting 1ULL instead of 1.

Reported-by: CoverityScan CID#1192105 ("Bad bit shift operation")
Signed-off-by: Colin Ian King
Cc: Andi Kleen
Cc: Peter Zijlstra
Cc: Kan Liang
Cc: Stephane Eranian
Cc: Alexander Shishkin
Link: http://lkml.kernel.org/r/20170111114310.17928-1-colin.king@canonical.com
Signed-off-by: Thomas Gleixner

Colin King
2017-01-11 23:43:30 +0800
89e9f7bcd x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F ... Browse Code »

Martin reported that the Supermicro X8DTH-i/6/iF/6F advertises incorrect
host bridge windows via _CRS:

pci_root PNP0A08:00: host bridge window [io 0xf000-0xffff]
pci_root PNP0A08:01: host bridge window [io 0xf000-0xffff]

Both bridges advertise the 0xf000-0xffff window, which cannot be correct.

Work around this by ignoring _CRS on this system. The downside is that we
may not assign resources correctly to hot-added PCI devices (if they are
possible on this system).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=42606
Reported-by: Martin Burnicki
Signed-off-by: Bjorn Helgaas
CC: stable@vger.kernel.org

Bjorn Helgaas
2017-01-11 23:11:15 +0800
6d6daa209 perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code ... Browse Code »

hswep_uncore_cpu_init() uses a hardcoded physical package id 0 for the boot
cpu. This works as long as the boot CPU is actually on the physical package
0, which is normaly the case after power on / reboot.

But it fails with a NULL pointer dereference when a kdump kernel is started
on a secondary socket which has a different physical package id because the
locigal package translation for physical package 0 does not exist.

Use the logical package id of the boot cpu instead of hard coded 0.

[ tglx: Rewrote changelog once more ]

Fixes: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data")
Signed-off-by: Prarit Bhargava
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Borislav Petkov
Cc: H. Peter Anvin
Cc: Harish Chegondi
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner

Prarit Bhargava
2017-01-11 19:13:21 +0800

10 Jan, 2017

6 commits

2e86222c6 x86/microcode/intel: Use correct buffer size for saving microcode data ... Browse Code »

In generic_load_microcode(), curr_mc_size is the size of the last
allocated buffer and since we have this performance "optimization"
there to vmalloc a new buffer only when the current one is bigger,
curr_mc_size ends up becoming the size of the biggest buffer we've seen
so far.

However, we end up saving the microcode patch which matches our CPU
and its size is not curr_mc_size but the respective mc_size during the
iteration while we're staring at it.

So save that mc_size into a separate variable and use it to store the
previously found microcode buffer.

Without this fix, we could get oops like this:

BUG: unable to handle kernel paging request at ffffc9000e30f000
IP: __memcpy+0x12/0x20
...
Call Trace:
? kmemdup+0x43/0x60
__alloc_microcode_buf+0x44/0x70
save_microcode_patch+0xd4/0x150
generic_load_microcode+0x1b8/0x260
request_microcode_user+0x15/0x20
microcode_write+0x91/0x100
__vfs_write+0x34/0x120
vfs_write+0xc1/0x130
SyS_write+0x56/0xc0
do_syscall_64+0x6c/0x160
entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/4f33cbfd-44f2-9bed-3b66-7446cd14256f@ce.jp.nec.com
Signed-off-by: Thomas Gleixner

Junichi Nomura
2017-01-10 06:11:15 +0800
9fcf5ba2e x86/microcode/intel: Fix allocation size of struct ucode_patch ... Browse Code »

We allocate struct ucode_patch here. @size is the size of microcode data
and used for kmemdup() later in this function.

Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/7a730dc9-ac17-35c4-fe76-dfc94e5ecd95@ce.jp.nec.com
Signed-off-by: Thomas Gleixner

Junichi Nomura
2017-01-10 06:11:14 +0800
4167709bb x86/microcode/intel: Add a helper which gives the microcode revision ... Browse Code »

Since on Intel we're required to do CPUID(1) first, before reading
the microcode revision MSR, let's add a special helper which does the
required steps so that we don't forget to do them next time, when we
want to read the microcode revision.

Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/20170109114147.5082-4-bp@alien8.de
Signed-off-by: Thomas Gleixner

Borislav Petkov
2017-01-10 06:11:14 +0800
f3e2a51f5 x86/microcode: Use native CPUID to tickle out microcode revision ... Browse Code »

Intel supplies the microcode revision value in MSR 0x8b
(IA32_BIOS_SIGN_ID) after CPUID(1) has been executed. Execute it each
time before reading that MSR.

It used to do sync_core() which did do CPUID but

c198b121b1a1 ("x86/asm: Rewrite sync_core() to use IRET-to-self")

changed the sync_core() implementation so we better make the microcode
loading case explicit, as the SDM documents it.

Reported-and-tested-by: Jun'ichi Nomura
Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/20170109114147.5082-3-bp@alien8.de
Signed-off-by: Thomas Gleixner

Borislav Petkov
2017-01-10 06:11:14 +0800
5dedade6d x86/CPU: Add native CPUID variants returning a single datum ... Browse Code »

... similarly to the cpuid_() variants.

Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/20170109114147.5082-2-bp@alien8.de
Signed-off-by: Thomas Gleixner

Borislav Petkov
2017-01-10 06:11:13 +0800
c92f5bdc4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix dumping of nft_quota entries, from Pablo Neira Ayuso.

2) Fix out of bounds access in nf_tables discovered by KASAN, from
Florian Westphal.

3) Fix IRQ enabling in dp83867 driver, from Grygorii Strashko.

4) Fix unicast filtering in be2net driver, from Ivan Vecera.

5) tg3_get_stats64() can race with driver close and ethtool
reconfigurations, fix from Michael Chan.

6) Fix error handling when pass limit is reached in bpf code gen on
x86. From Daniel Borkmann.

7) Don't clobber switch ops and use proper MDIO nested reads and writes
in bcm_sf2 driver, from Florian Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
net: dsa: bcm_sf2: Utilize nested MDIO read/write
net: dsa: bcm_sf2: Do not clobber b53_switch_ops
net: stmmac: fix maxmtu assignment to be within valid range
bpf: change back to orig prog on too many passes
tg3: Fix race condition in tg3_get_stats64().
be2net: fix unicast list filling
be2net: fix accesses to unicast list
netlabel: add CALIPSO to the list of built-in protocols
vti6: fix device register to report IFLA_INFO_KIND
net: phy: dp83867: fix irq generation
amd-xgbe: Fix IRQ processing when running in single IRQ mode
sh_eth: R8A7740 supports packet shecksumming
sh_eth: fix EESIPR values for SH77{34|63}
r8169: fix the typo in the comment
nl80211: fix sched scan netlink socket owner destruction
bridge: netfilter: Fix dropping packets that moving through bridge interface
netfilter: ipt_CLUSTERIP: check duplicate config when initializing
netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
netfilter: nf_tables: fix oob access
netfilter: nft_queue: use raw_smp_processor_id()
...

Linus Torvalds
2017-01-10 03:58:28 +0800

09 Jan, 2017

2 commits

fac69d0ef x86/boot: Add missing declaration of string functions ... Browse Code »

Add the missing declarations of basic string functions to string.h to allow
a clean build.

Fixes: 5be865661516 ("String-handling functions for the new x86 setup code.")
Signed-off-by: Nicholas Mc Guire
Link: http://lkml.kernel.org/r/1483781911-21399-1-git-send-email-hofrat@osadl.org
Signed-off-by: Thomas Gleixner

Nicholas Mc Guire
2017-01-09 18:53:05 +0800
9d5ecb09d bpf: change back to orig prog on too many passes ... Browse Code »

If after too many passes still no image could be emitted, then
swap back to the original program as we do in all other cases
and don't use the one with blinding.

Fixes: 959a75791603 ("bpf, x86: add support for constant blinding")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Daniel Borkmann
2017-01-09 06:00:18 +0800

07 Jan, 2017

3 commits

20b1e22d0 x86/efi: Don't allocate memmap through memblock after mm_init() ... Browse Code »

With the following commit:

4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")

... efi_bgrt_init() calls into the memblock allocator through
efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.

Indeed, KASAN reports a bad read access later on in efi_free_boot_services():

BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
at addr ffff88022de12740
Read of size 4 by task swapper/0/0
page:ffffea0008b78480 count:0 mapcount:-127
mapping: (null) index:0x1 flags: 0x5fff8000000000()
[...]
Call Trace:
dump_stack+0x68/0x9f
kasan_report_error+0x4c8/0x500
kasan_report+0x58/0x60
__asan_load4+0x61/0x80
efi_free_boot_services+0xae/0x24c
start_kernel+0x527/0x562
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x157/0x17a
start_cpu+0x5/0x14

The instruction at the given address is the first read from the memmap's
memory, i.e. the read of md->type in efi_free_boot_services().

Note that the writes earlier in efi_arch_mem_reserve() don't splat because
they're done through early_memremap()ed addresses.

So, after memblock is gone, allocations should be done through the "normal"
page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
of consistency, from efi_fake_memmap() as well.

Note that for the latter, the memmap allocations cease to be page aligned.
This isn't needed though.

Tested-by: Dan Williams
Signed-off-by: Nicolai Stange
Reviewed-by: Ard Biesheuvel
Cc: # v4.9
Cc: Dave Young
Cc: Linus Torvalds
Cc: Matt Fleming
Cc: Mika Penttilä
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-efi@vger.kernel.org
Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.com
Signed-off-by: Ingo Molnar

Nicolai Stange
2017-01-07 15:58:07 +0800
08289086b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM fixes from Radim Krčmář:
"MIPS:
- fix host kernel crashes when receiving a signal with 64-bit
userspace

- flush instruction cache on all vcpus after generating entry code

(both for stable)

x86:
- fix NULL dereference in MMU caused by SMM transitions (for stable)

- correct guest instruction pointer after emulating some VMX errors

- minor cleanup"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: remove duplicated declaration
KVM: MIPS: Flush KVM entry code from icache globally
KVM: MIPS: Don't clobber CP0_Status.UX
KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS
KVM: nVMX: fix instruction skipping during emulated vm-entry

Linus Torvalds
2017-01-07 07:27:17 +0800
2fd8774c7 Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb ... Browse Code »

Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"This has one fix to make i915 work when using Xen SWIOTLB, and a
feature from Geert to aid in debugging of devices that can't do DMA
outside the 32-bit address space.

The feature from Geert is on top of v4.10 merge window commit
(specifically you pulling my previous branch), as his changes were
dependent on the Documentation/ movement patches.

I figured it would just easier than me trying than to cherry-pick the
Documentation patches to satisfy git.

The patches have been soaking since 12/20, albeit I updated the last
patch due to linux-next catching an compiler error and adding an
Tested-and-Reported-by tag"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: Export swiotlb_max_segment to users
swiotlb: Add swiotlb=noforce debug option
swiotlb: Convert swiotlb_force from int to enum
x86, swiotlb: Simplify pci_swiotlb_detect_override()

Linus Torvalds
2017-01-07 02:53:21 +0800

06 Jan, 2017

1 commit

a33d33176 x86/CPU/AMD: Fix Bulldozer topology ... Browse Code »

The following commit:

8196dab4fc15 ("x86/cpu: Get rid of compute_unit_id")

... broke the initial strategy for Bulldozer-based cores' topology,
where we consider each thread of a compute unit a standalone core
and not a HT or SMT thread.

Revert to the firmware-supplied core_id numbering and do not make
them thread siblings as we don't consider them for such even if they
technically are, more or less.

Reported-and-tested-by: Brice Goglin
Tested-by: Yazen Ghannam
Signed-off-by: Borislav Petkov
Cc: # v4.6+
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: 8196dab4fc15 ("x86/cpu: Get rid of compute_unit_id")
Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.de
Signed-off-by: Ingo Molnar

Borislav Petkov
2017-01-06 15:37:41 +0800