Eric Lee / smarc-fsl-linux-kernel

10 Apr, 2016

1 commit

40bca9dba Merge tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management and ACPI fixes from Rafael Wysocki:
"Fixes for some issues discovered after recent changes and for some
that have just been found lately regardless of those changes
(intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
support for some new CPU models (intel_idle, Intel RAPL driver,
turbostat) and documentation updates (intel_pstate, PM core).

Specifics:

- intel_pstate fixes for two issues exposed by the recent switch over
from using timers and for one issue introduced during the 4.4 cycle
plus new comments describing data structures used by the driver
(Rafael Wysocki, Srinivas Pandruvada).

- intel_idle fixes related to CPU offline/online (Richard Cochran).

- intel_idle support (new CPU IDs and state definitions mostly) for
Skylake-X and Kabylake processors (Len Brown).

- PCC mailbox driver fix for an out-of-bounds memory access that may
cause the kernel to panic() (Shanker Donthineni).

- New (missing) CPU ID for one apparently overlooked Haswell model in
the Intel RAPL power capping driver (Srinivas Pandruvada).

- Fix for the PM core's wakeup IRQs framework to make it work after
wakeup settings reconfiguration from sysfs (Grygorii Strashko).

- Runtime PM documentation update to make it describe what needs to
be done during device removal more precisely (Krzysztof Kozlowski).

- Stale comment removal cleanup in the cpufreq-dt driver (Viresh
Kumar).

- turbostat utility fixes and support for Broxton, Skylake-X and
Kabylake processors (Len Brown)"

* tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug
intel_idle: Add KBL support
intel_idle: Add SKX support
intel_idle: Clean up all registered devices on exit.
intel_idle: Propagate hot plug errors.
intel_idle: Don't overreact to a cpuidle registration failure.
intel_idle: Setup the timer broadcast only on successful driver load.
intel_idle: Avoid a double free of the per-CPU data.
intel_idle: Fix dangling registration on error path.
intel_idle: Fix deallocation order on the driver exit path.
intel_idle: Remove redundant initialization calls.
intel_idle: Fix a helper function's return value.
intel_idle: remove useless return from void function.
...

Linus Torvalds
2016-04-10 02:03:48 +0800

09 Apr, 2016

1 commit

73659be76 Merge branches 'pm-core', 'powercap' and 'pm-tools' ... Browse Code »

* pm-core:
PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
PM / runtime: Document steps for device removal

* powercap:
powercap: intel_rapl: Add missing Haswell model

* pm-tools:
tools/power turbostat: work around RC6 counter wrap
tools/power turbostat: initial KBL support
tools/power turbostat: initial SKX support
tools/power turbostat: decode BXT TSC frequency via CPUID
tools/power turbostat: initial BXT support
tools/power turbostat: print IRTL MSRs
tools/power turbostat: SGX state should print only if --debug

Rafael J. Wysocki
2016-04-09 03:46:56 +0800

08 Apr, 2016

1 commit

5a63426e2 tools/power turbostat: print IRTL MSRs ... Browse Code »

Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states. (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.

IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.

Signed-off-by: Len Brown
Signed-off-by: Rafael J. Wysocki

Len Brown
2016-04-08 04:18:32 +0800

07 Apr, 2016

1 commit

c4004b02f x86: remove the kernel code/data/bss resources from /proc/iomem ... Browse Code »

Let's see if anybody even notices. I doubt anybody uses this, and it
does expose addresses that should be randomized, so let's just remove
the code. It's old and traditional, and it used to be cute, but we
should have removed this long ago.

If it turns out anybody notices and this breaks something, we'll have to
revert this, and maybe we'll end up using other approaches instead
(using %pK or similar). But removing unnecessary code is always the
preferred option.

Noted-by: Emrah Demir
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-04-07 04:45:07 +0800

06 Apr, 2016

1 commit

541d8f4d5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM fixes from Paolo Bonzini:
"Miscellaneous bugfixes.

The ARM and s390 fixes are for new regressions from the merge window,
others are usual stable material"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
compiler-gcc: disable -ftracer for __noclone functions
kvm: x86: make lapic hrtimer pinned
s390/mm/kvm: fix mis-merge in gmap handling
kvm: set page dirty only if page has been writable
KVM: x86: reduce default value of halt_poll_ns parameter
KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
KVM: x86: Inject pending interrupt even if pending nmi exist
arm64: KVM: Register CPU notifiers when the kernel runs at HYP
arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting

Linus Torvalds
2016-04-06 07:16:00 +0800

05 Apr, 2016

2 commits

61abdbe0b kvm: x86: make lapic hrtimer pinned ... Browse Code »

When a vCPU runs on a nohz_full core, the hrtimer used by
the lapic emulation code can be migrated to another core.
When this happens, it's possible to observe milisecond
latency when delivering timer IRQs to KVM guests.

The huge latency is mainly due to the fact that
apic_timer_fn() expects to run during a kvm exit. It
sets KVM_REQ_PENDING_TIMER and let it be handled on kvm
entry. However, if the timer fires on a different core,
we have to wait until the next kvm exit for the guest
to see KVM_REQ_PENDING_TIMER set.

This problem became visible after commit 9642d18ee. This
commit changed the timer migration code to always attempt
to migrate timers away from nohz_full cores. While it's
discussable if this is correct/desirable (I don't think
it is), it's clear that the lapic emulation code has
a requirement on firing the hrtimer in the same core
where it was started. This is achieved by making the
hrtimer pinned.

Lastly, note that KVM has code to migrate timers when a
vCPU is scheduled to run in different core. However, this
forced migration may fail. When this happens, we can have
the same problem. If we want 100% correctness, we'll have
to modify apic_timer_fn() to cause a kvm exit when it runs
on a different core than the vCPU. Not sure if this is
possible.

Here's a reproducer for the issue being fixed:

1. Set all cores but core0 to be nohz_full cores
2. Start a guest with a single vCPU
3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()

You'll see that apic_timer_fn() will run in core0 while
kvm_inject_apic_timer_irqs() runs in a different core. If
you get both on core0, try running a program that takes 100%
of the CPU and pin it to core0 to force the vCPU out.

Signed-off-by: Luiz Capitulino
Signed-off-by: Paolo Bonzini

Luiz Capitulino
2016-04-05 20:19:08 +0800
93e2aeaca Merge tag 'for-linus-4.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip ... Browse Code »

Pull xen fixes from David Vrabel:
"Regression and bug fixes for 4.6-rc2:

- safely migrate event channels between CPUs
- fix CPU hotplug
- maintainer changes"

* tag 'for-linus-4.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
MAINTAINERS: xen: Konrad to step down and Juergen to pick up
xen/events: Mask a moving irq
Xen on ARM and ARM64: update MAINTAINERS info
xen/x86: Call cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()
xen/apic: Provide Xen-specific version of cpu_present_to_apicid APIC op

Linus Torvalds
2016-04-05 07:38:36 +0800

03 Apr, 2016

2 commits

4c3b73c6a Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Misc kernel side fixes:

- fix event leak
- fix AMD PMU driver bug
- fix core event handling bug
- fix build bug on certain randconfigs

Plus misc tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/ibs: Fix pmu::stop() nesting
perf/core: Don't leak event in the syscall error path
perf/core: Fix time tracking bug with multiplexing
perf jit: genelf makes assumptions about endian
perf hists: Fix determination of a callchain node's childlessness
perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples
perf tools: Fix build break on powerpc
perf/x86: Move events_sysfs_show() outside CPU_SUP_INTEL
perf bench: Fix detached tarball building due to missing 'perf bench memcpy' headers
perf tests: Fix tarpkg build test error output redirection

Linus Torvalds
2016-04-03 20:22:12 +0800
30cebb6ca Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Thomas Gleixner:
"This lot contains:

- Some fixups for the fallout of the topology consolidation which
unearthed AMD/Intel inconsistencies
- Documentation for the x86 topology management
- Support for AMD advanced power management bits
- Two simple cleanups removing duplicated code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add advanced power management bits
x86/thread_info: Merge two !__ASSEMBLY__ sections
x86/cpufreq: Remove duplicated TDP MSR macro definitions
x86/Documentation: Start documenting x86 topology
x86/cpu: Get rid of compute_unit_id
perf/x86/amd: Cleanup Fam10h NB event constraints
x86/topology: Fix AMD core count

Linus Torvalds
2016-04-03 19:32:28 +0800

02 Apr, 2016

4 commits

1826907c1 Merge tag 'pm+acpi-4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management and ACPI fix from Rafael J. Wysocki:
"Just one fix for a nasty boot failure on some systems based on Intel
Skylake that shipped with broken firmware where enabling
hardware-coordinated P-states management (HWP) causes a faulty
interrupt handler in SMM to be invoked and crash the system (Srinivas
Pandruvada)"

* tag 'pm+acpi-4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / processor: Request native thermal interrupt handling via _OSC

Linus Torvalds
2016-04-02 08:52:10 +0800
8fbd4ade9 Merge branch 'acpi-processor' ... Browse Code »

* acpi-processor:
ACPI / processor: Request native thermal interrupt handling via _OSC

Rafael J. Wysocki
2016-04-02 07:17:36 +0800
858eaaa71 mm/rmap: batched invalidations should use existing api ... Browse Code »

The recently introduced batched invalidations mechanism uses its own
mechanism for shootdown. However, it does wrong accounting of
interrupts (e.g., inc_irq_stat is called for local invalidations),
trace-points (e.g., TLB_REMOTE_SHOOTDOWN for local invalidations) and
may break some platforms as it bypasses the invalidation mechanisms of
Xen and SGI UV.

This patch reuses the existing TLB flushing mechnaisms instead. We use
NULL as mm to indicate a global invalidation is required.

Fixes 72b252aed506b8 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: Nadav Amit
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadav Amit
2016-04-02 06:03:37 +0800
18c98243d x86/mm: TLB_REMOTE_SEND_IPI should count pages ... Browse Code »

TLB_REMOTE_SEND_IPI was recently introduced, but it counts bytes instead
of pages. In addition, it does not report correctly the case in which
flush_tlb_page flushes a page. Fix it to be consistent with other TLB
counters.

Fixes: 5b74283ab251b9d ("x86, mm: trace when an IPI is about to be sent")
Signed-off-by: Nadav Amit
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Dave Hansen
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadav Amit
2016-04-02 06:03:37 +0800

01 Apr, 2016

4 commits

14f476056 kvm: set page dirty only if page has been writable ... Browse Code »

In absence of shadow dirty mask, there is no need to set page dirty
if page has never been writable. This is a tiny optimization but
good to have for people who care much about dirty page tracking.

Signed-off-by: Yu Zhao
Signed-off-by: Paolo Bonzini

Yu Zhao
2016-04-01 18:10:10 +0800
14ebda339 KVM: x86: reduce default value of halt_poll_ns parameter ... Browse Code »

Windows lets applications choose the frequency of the timer tick,
and in Windows 10 the maximum rate was changed from 1024 Hz to
2048 Hz. Unfortunately, because of the way the Windows API
works, most applications who need a higher rate than the default
64 Hz will just do

timeGetDevCaps(&tc, sizeof(tc));
timeBeginPeriod(tc.wPeriodMin);

and pick the maximum rate. This causes very high CPU usage when
playing media or games on Windows 10, even if the guest does not
actually use the CPU very much, because the frequent timer tick
causes halt_poll_ns to kick in.

There is no really good solution, especially because Microsoft
could sooner or later bump the limit to 4096 Hz, but for now
the best we can do is lower a bit the upper limit for
halt_poll_ns. :-(

Reported-by: Jon Panozzo
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2016-04-01 18:10:10 +0800
a2b5c3c0c KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled ... Browse Code »

If SynIC is disabled, there is nothing that userspace can do to
handle these exits; on the other hand, userspace probably will
not know about KVM_EXIT_HYPERV_HCALL and complain about it or
even exit. Just prevent anything bad from happening by handling
the hypercall in KVM and returning an "invalid hypercall" code.

Fixes: 83326e43f27e9a8a501427a0060f8af519a39bb2
Cc: Andrey Smetanin
Reviewed-by: Roman Kagan
Signed-off-by: Paolo Bonzini

Paolo Bonzini
2016-04-01 18:10:09 +0800
321c5658c KVM: x86: Inject pending interrupt even if pending nmi exist ... Browse Code »

Non maskable interrupts (NMI) are preferred to interrupts in current
implementation. If a NMI is pending and NMI is blocked by the result
of nmi_allowed(), pending interrupt is not injected and
enable_irq_window() is not executed, even if interrupts injection is
allowed.

In old kernel (e.g. 2.6.32), schedule() is often called in NMI context.
In this case, interrupts are needed to execute iret that intends end
of NMI. The flag of blocking new NMI is not cleared until the guest
execute the iret, and interrupts are blocked by pending NMI. Due to
this, iret can't be invoked in the guest, and the guest is starved
until block is cleared by some events (e.g. canceling injection).

This patch injects pending interrupts, when it's allowed, even if NMI
is blocked. And, If an interrupts is pending after executing
inject_pending_event(), enable_irq_window() is executed regardless of
NMI pending counter.

Cc: stable@vger.kernel.org
Signed-off-by: Yuki Shibuya
Suggested-by: Paolo Bonzini
Signed-off-by: Paolo Bonzini

Yuki Shibuya
2016-04-01 18:10:09 +0800

31 Mar, 2016

1 commit

85dc60026 perf/x86/amd/ibs: Fix pmu::stop() nesting ... Browse Code »

Patch 5a50f5291701 ("perf/x86/ibs: Fix race with IBS_STARTING state")
closed a big hole while opening another, smaller hole.

Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Fixes: 5a50f5291701 ("perf/x86/ibs: Fix race with IBS_STARTING state")
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-03-31 15:54:08 +0800

29 Mar, 2016

9 commits

dc6416f1d xen/x86: Call cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead() ... Browse Code »

This call has always been missing from xen_play dead() but until
recently this was rather benign. With new cpu hotplug framework
(commit 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up").
however this call is required, otherwise a hot-plugged CPU will not
be properly brough up (by never calling cpuhp_online_idle())

Signed-off-by: Boris Ostrovsky
Signed-off-by: Konrad Rzeszutek Wilk

Boris Ostrovsky
2016-03-29 21:34:10 +0800
8041dcc88 Merge tag 'v4.6-rc1' into for-linus-4.6 ... Browse Code »

Linux 4.6-rc1

* tag 'v4.6-rc1': (12823 commits)
Linux 4.6-rc1
f2fs/crypto: fix xts_tweak initialization
NTB: Remove _addr functions from ntb_hw_amd
orangefs: fix orangefs_superblock locking
orangefs: fix do_readv_writev() handling of error halfway through
orangefs: have ->kill_sb() evict the VFS side of things first
orangefs: sanitize ->llseek()
orangefs-bufmap.h: trim unused junk
orangefs: saner calling conventions for getting a slot
orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
orangefs: get rid of readdir_handle_s
thp: fix typo in khugepaged_scan_pmd()
MAINTAINERS: fill entries for KASAN
mm/filemap: generic_file_read_iter(): check for zero reads unconditionally
kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2
mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
mm, kasan: add GFP flags to KASAN API
mm, kasan: SLAB support
kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right()
...

Konrad Rzeszutek Wilk
2016-03-29 21:33:47 +0800
34a4cceb7 x86/cpu: Add advanced power management bits ... Browse Code »

Bit 11 of CPUID 8000_0007 edx is processor feedback interface.
Bit 12 of CPUID 8000_0007 edx is accumulated power.

Print proper names in proc/cpuinfo

Reported-and-tested-by: Borislav Petkov
Signed-off-by: Huang Rui
Cc: Tony Li
Cc: Fenghua Yu
Cc: Tony Luck
Cc: Peter Zijlstra
Cc: "Rafael J. Wysocki"
Cc: Andy Lutomirski
Cc: Fengguang Wu
Cc: Sherry Hurwitz
Cc: Borislav Petkov
Cc: "Len Brown"
Link: http://lkml.kernel.org/r/1458871720-3209-1-git-send-email-ray.huang@amd.com
Signed-off-by: Thomas Gleixner

Huang Rui
2016-03-29 17:12:11 +0800
5f870a3f7 x86/thread_info: Merge two !__ASSEMBLY__ sections ... Browse Code »

We have

#ifndef __ASSEMBLY__
...
#endif

#ifndef __ASSEMBLY__
...
#endif

Merge the two.

No functionality change.

Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/1459189217-25532-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner

Borislav Petkov
2016-03-29 17:12:10 +0800
4a6772f51 x86/cpufreq: Remove duplicated TDP MSR macro definitions ... Browse Code »

The list of CPU model specific registers contains two copies of TDP
registers, remove the one, which is out of numerical order in the
list.

Fixes: 6a35fc2d6c22 ("cpufreq: intel_pstate: get P1 from TAR when available")
Signed-off-by: Vladimir Zapolskiy
Cc: Len Brown
Cc: "Rafael J. Wysocki"
Cc: Kristen Carlson
Accardi
Cc: Srinivas Pandruvada
Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.com
Signed-off-by: Thomas Gleixner

Vladimir Zapolskiy
2016-03-29 17:12:10 +0800
8196dab4f x86/cpu: Get rid of compute_unit_id ... Browse Code »

It is cpu_core_id anyway.

Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/1458917557-8757-3-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner

Borislav Petkov
2016-03-29 16:45:04 +0800
32b62f446 perf/x86/amd: Cleanup Fam10h NB event constraints ... Browse Code »

Avoid allocating the AMD NB event constraints data structure when not
needed. This gets rid of x86_max_cores usage and avoids allocating
this on AMD Core Perfctr supporting hardware (which has separate MSRs
for NB events).

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Borislav Petkov
Cc: aherrmann@suse.com
Cc: Rui Huang
Cc: Borislav Petkov
Cc: jencce.kernel@gmail.com
Link: http://lkml.kernel.org/r/20160320124629.GY6375@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2016-03-29 16:45:04 +0800
ee6825c80 x86/topology: Fix AMD core count ... Browse Code »

It turns out AMD gets x86_max_cores wrong when there are compute
units.

The issue is that Linux assumes:

nr_logical_cpus = nr_cores * nr_siblings

But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.

Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
properly, according to the new nomenclature.

Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Borislav Petkov
Cc: Andreas Herrmann
Cc: Andy Lutomirski
Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2016-03-29 16:45:04 +0800
fc0c20281 x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem() ... Browse Code »

Update the definition of memcpy_from_pmem() to return 0 or a negative
error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().

Cc: Borislav Petkov
Cc: Tony Luck
Cc: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Andrew Morton
Cc: Linus Torvalds
Acked-by: Ingo Molnar
Reviewed-by: Ross Zwisler
Signed-off-by: Dan Williams

Dan Williams
2016-03-29 08:19:31 +0800

26 Mar, 2016

3 commits

a21211672 ACPI / processor: Request native thermal interrupt handling via _OSC ... Browse Code »

There are several reports of freeze on enabling HWP (Hardware PStates)
feature on Skylake-based systems by the Intel P-states driver. The root
cause is identified as the HWP interrupts causing BIOS code to freeze.

HWP interrupts use the thermal LVT which can be handled by Linux
natively, but on the affected Skylake-based systems SMM will respond
to it by default. This is a problem for several reasons:
- On the affected systems the SMM thermal LVT handler is broken (it
will crash when invoked) and a BIOS update is necessary to fix it.
- With thermal interrupt handled in SMM we lose all of the reporting
features of the arch/x86/kernel/cpu/mcheck/therm_throt driver.
- Some thermal drivers like x86-package-temp depend on the thermal
threshold interrupts signaled via the thermal LVT.
- The HWP interrupts are useful for debugging and tuning
performance (if the kernel can handle them).
The native handling of thermal interrupts needs to be enabled
because of that.

This requires some way to tell SMM that the OS can handle thermal
interrupts. That can be done by using _OSC/_PDC in processor
scope very early during ACPI initialization.

The meaning of _OSC/_PDC bit 12 in processor scope is whether or
not the OS supports native handling of interrupts for Collaborative
Processor Performance Control (CPPC) notifications. Since on
HWP-capable systems CPPC is a firmware interface to HWP, setting
this bit effectively tells the firmware that the OS will handle
thermal interrupts natively going forward.

For details on _OSC/_PDC refer to:
http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html

To implement the _OSC/_PDC handshake as described, introduce a new
function, acpi_early_processor_osc(), that walks the ACPI
namespace looking for ACPI processor objects and invokes _OSC for
them with bit 12 in the capabilities buffer set and terminates the
namespace walk on the first success.

Also modify intel_thermal_interrupt() to clear HWP status bits in
the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents
them from firing continuously).

Signed-off-by: Srinivas Pandruvada
[ rjw: Subject & changelog, function rename ]
Signed-off-by: Rafael J. Wysocki

Srinivas Pandruvada
2016-03-26 09:00:38 +0800
cd11016e5 mm, kasan: stackdepot implementation. Enable stackdepot for SLAB ... Browse Code »

Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks. The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.

IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.

Right now stackdepot support is only enabled in SLAB allocator. Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.

This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.

Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.

[akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabinin@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko
Signed-off-by: Andrey Ryabinin
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Andrey Konovalov
Cc: Dmitry Vyukov
Cc: Steven Rostedt
Cc: Konstantin Serebryany
Cc: Dmitry Chernenkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Potapenko
2016-03-26 07:37:42 +0800
be7635e72 arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections ... Browse Code »

KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.

Move the definition of __irq_entry to so that the
users don't need to pull in . Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.

Signed-off-by: Alexander Potapenko
Acked-by: Steven Rostedt
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Andrey Konovalov
Cc: Dmitry Vyukov
Cc: Andrey Ryabinin
Cc: Konstantin Serebryany
Cc: Dmitry Chernenkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexander Potapenko
2016-03-26 07:37:42 +0800

25 Mar, 2016

5 commits

ed6069be7 xen/apic: Provide Xen-specific version of cpu_present_to_apicid APIC op ... Browse Code »

Currently Xen uses default_cpu_present_to_apicid() which will always
report BAD_APICID for PV guests since x86_bios_cpu_apic_id is initialised
to that value and is never updated.

With commit 1f12e32f4cd5 ("x86/topology: Create logical package id"), this
op is now called by smp_init_package_map() when deciding whether to call
topology_update_package_map() which sets cpu_data(cpu).logical_proc_id.
The latter (as topology_logical_package_id(cpu)) may be used, for example,
by cpu_to_rapl_pmu() as an array index. Since uninitialized
logical_package_id is set to -1, the index will become 64K which is
obviously problematic.

While RAPL code (and any other users of logical_package_id) should be
careful in their assumptions about id's validity, Xen's
cpu_present_to_apicid op should still provide value consistent with its
own xen_apic_read(APIC_ID).

Signed-off-by: Boris Ostrovsky
Signed-off-by: Konrad Rzeszutek Wilk

Boris Ostrovsky
2016-03-25 23:42:53 +0800
a49ac9f83 perf/x86: Move events_sysfs_show() outside CPU_SUP_INTEL ... Browse Code »

randconfig builds can sometimes disable CONFIG_CPU_SUP_INTEL while
enabling the AMD power reporting PMU driver, resulting in this
build failure:

arch/x86/kernel/cpu/perf_event.h:663:31: error: 'events_sysfs_show' undeclared here (not in a function)

To fix it, move events_sysfs_show() outside of #ifdef CONFIG_CPU_SUP_INTEL.

Reported-by: Randy Dunlap
Reported-by: build test robot
Signed-off-by: Huang Rui
Cc: Borislav Petkov
Cc: Borislav Petkov
Cc: Fengguang Wu
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Sherry Hurwitz
Cc: Stephen Rothwell
Cc: Thomas Gleixner
Cc: acme@kernel.org
Cc: kbuild-all@01.org
Cc: linux-next@vger.kernel.org
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1458875905-4278-1-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar

Huang Rui
2016-03-25 16:46:53 +0800
e46b4e2b4 Merge tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing updates from Steven Rostedt:
"Nothing major this round. Mostly small clean ups and fixes.

Some visible changes:

- A new flag was added to distinguish traces done in NMI context.

- Preempt tracer now shows functions where preemption is disabled but
interrupts are still enabled.

Other notes:

- Updates were done to function tracing to allow better performance
with perf.

- Infrastructure code has been added to allow for a new histogram
feature for recording live trace event histograms that can be
configured by simple user commands. The feature itself was just
finished, but needs a round in linux-next before being pulled.

This only includes some infrastructure changes that will be needed"

* tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
tracing: Record and show NMI state
tracing: Fix trace_printk() to print when not using bprintk()
tracing: Remove redundant reset per-CPU buff in irqsoff tracer
x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
tracing: Fix crash from reading trace_pipe with sendfile
tracing: Have preempt(irqs)off trace preempt disabled functions
tracing: Fix return while holding a lock in register_tracer()
ftrace: Use kasprintf() in ftrace_profile_tracefs()
ftrace: Update dynamic ftrace calls only if necessary
ftrace: Make ftrace_hash_rec_enable return update bool
tracing: Fix typoes in code comment and printk in trace_nop.c
tracing, writeback: Replace cgroup path to cgroup ino
tracing: Use flags instead of bool in trigger structure
tracing: Add an unreg_all() callback to trigger commands
tracing: Add needs_rec flag to event triggers
tracing: Add a per-event-trigger 'paused' field
tracing: Add get_syscall_name()
tracing: Add event record param to trigger_ops.func()
tracing: Make event trigger functions available
tracing: Make ftrace_event_field checking functions available
...

Linus Torvalds
2016-03-25 01:52:25 +0800
3fa2fe2ce Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"This tree contains various perf fixes on the kernel side, plus three
hw/event-enablement late additions:

- Intel Memory Bandwidth Monitoring events and handling
- the AMD Accumulated Power Mechanism reporting facility
- more IOMMU events

... and a final round of perf tooling updates/fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
perf llvm: Use strerror_r instead of the thread unsafe strerror one
perf llvm: Use realpath to canonicalize paths
perf tools: Unexport some methods unused outside strbuf.c
perf probe: No need to use formatting strbuf method
perf help: Use asprintf instead of adhoc equivalents
perf tools: Remove unused perf_pathdup, xstrdup functions
perf tools: Do not include stringify.h from the kernel sources
tools include: Copy linux/stringify.h from the kernel
tools lib traceevent: Remove redundant CPU output
perf tools: Remove needless 'extern' from function prototypes
perf tools: Simplify die() mechanism
perf tools: Remove unused DIE_IF macro
perf script: Remove lots of unused arguments
perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
perf machine: Rename perf_event__preprocess_sample to machine__resolve
perf tools: Add cpumode to struct perf_sample
perf tests: Forward the perf_sample in the dwarf unwind test
perf tools: Remove misplaced __maybe_unused
perf list: Fix documentation of :ppp
perf bench numa: Fix assertion for nodes bitfield
...

Linus Torvalds
2016-03-25 01:02:14 +0800
d88f48e12 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Ingo Molnar:
"Misc fixes:

- fix hotplug bugs
- fix irq live lock
- fix various topology handling bugs
- fix APIC ACK ordering
- fix PV iopl handling
- fix speling
- fix/tweak memcpy_mcsafe() return value
- fix fbcon bug
- remove stray prototypes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/msr: Remove unused native_read_tscp()
x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck
x86/oprofile/nmi: Add missing hotplug FROZEN handling
x86/hpet: Use proper mask to modify hotplug action
x86/apic/uv: Fix the hotplug notifier
x86/apb/timer: Use proper mask to modify hotplug action
x86/topology: Use total_cpus not nr_cpu_ids for logical packages
x86/topology: Fix Intel HT disable
x86/topology: Fix logical package mapping
x86/irq: Cure live lock in fixup_irqs()
x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
x86/iopl: Fix iopl capability check on Xen PV
x86/iopl/64: Properly context-switch IOPL on Xen PV
selftests/x86: Add an iopl test
x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
x86/video: Don't assume all FB devices are PCI devices
arch/x86/irq: Purge useless handler declarations from hw_irq.h
x86: Fix misspellings in comments

Linus Torvalds
2016-03-25 00:47:32 +0800

23 Mar, 2016

5 commits

9da77666d x86/msr: Remove unused native_read_tscp() ... Browse Code »

After e76b027 ("x86,vdso: Use LSL unconditionally for vgetcpu")
native_read_tscp() is unused in the kernel. The function can be removed like
native_read_tsc() was.

Signed-off-by: Prarit Bhargava
Acked-by: Andy Lutomirski
Cc: Borislav Petkov
Link: http://lkml.kernel.org/r/1458687968-9106-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner

Prarit Bhargava
2016-03-23 19:34:17 +0800
a24e3d414 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge third patch-bomb from Andrew Morton:

- more ocfs2 changes

- a few hotfixes

- Andy's compat cleanups

- misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

- many rapidio updates: fixes, new drivers.

- kcov: kernel code coverage feature. Like gcov, but not
"prohibitively expensive".

- extable code consolidation for various archs

* emailed patches from Andrew Morton : (81 commits)
ia64/extable: use generic search and sort routines
x86/extable: use generic search and sort routines
s390/extable: use generic search and sort routines
alpha/extable: use generic search and sort routines
kernel/...: convert pr_warning to pr_warn
drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
memremap: add MEMREMAP_WC flag
memremap: don't modify flags
kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
ipc/sem: make semctl setting sempid consistent
ubsan: fix tree-wide -Wmaybe-uninitialized false positives
kfifo: fix sparse complaints
scripts/gdb: account for changes in module data structure
scripts/gdb: add cmdline reader command
scripts/gdb: add version command
kernel: add kcov code coverage
profile: hide unused functions when !CONFIG_PROC_FS
hpwdt: use nmi_panic() when kernel panics in NMI handler
...

Linus Torvalds
2016-03-23 08:09:14 +0800
b91d9c671 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull more KVM updates from Paolo Bonzini:
"Second round of KVM changes for 4.6:

- build fixes for PPC KVM
- miscellaneous bugfixes for ARM KVM
- cleanup of memory barrier and removal of redundant barriers
- x86 fixes: page tracking oops, support for old buggy KVM nested on 4.5
- support for protection keys in guests
- lockdep fix
- another conversion to simple wait queues and raw spinlocks,
backported from PREEMPT_RT"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
KVM: page_track: fix access to NULL slot
KVM: PPC: do not compile in vfio.o unconditionally
kvm, rt: change async pagefault code locking for PREEMPT_RT
KVM/PPC: update the comment of memory barrier in the kvmppc_prepare_to_enter()
KVM/x86: update the comment of memory barrier in the vcpu_enter_guest()
KVM: Replace smp_mb() with smp_load_acquire() in the kvm_flush_remote_tlbs()
KVM/x86: Call smp_wmb() before increasing tlbs_dirty
KVM: Replace smp_mb() with smp_mb_after_atomic() in the kvm_make_all_cpus_request()
KVM/x86: Replace smp_mb() with smp_store_mb/release() in the walk_shadow_page_lockless_begin/end()
KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
KVM, pkeys: expose CPUID/CR4 to guest
KVM, pkeys: add pkeys support for permission_fault
KVM, pkeys: introduce pkru_mask to cache conditions
KVM, pkeys: save/restore PKRU when guest/host switches
x86: pkey: introduce write_pkru() for KVM
KVM, pkeys: add pkeys support for xsave state
KVM, pkeys: disable pkeys for guests in non-paging mode
KVM: x86: remove magic number with enum cpuid_leafs
KVM: MMU: return page fault error code from permission_fault
KVM: fix spin_lock_init order on x86
...

Linus Torvalds
2016-03-23 07:28:22 +0800
29934b0fb x86/extable: use generic search and sort routines ... Browse Code »

Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.

Signed-off-by: Ard Biesheuvel
Acked-by: H. Peter Anvin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ard Biesheuvel
2016-03-23 06:36:02 +0800
5c9a8750a kernel: add kcov code coverage ... Browse Code »

kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing). Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system. A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/). However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible. It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g. scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes. Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch). I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side. The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation". For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov. Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
typical coverage can be just a dozen of basic blocks (e.g. an invalid
input). In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M). Cost of
kcov depends only on number of executed basic blocks/edges. On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure. But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov
Reviewed-by: Kees Cook
Cc: syzkaller
Cc: Vegard Nossum
Cc: Catalin Marinas
Cc: Tavis Ormandy
Cc: Will Deacon
Cc: Quentin Casasnovas
Cc: Kostya Serebryany
Cc: Eric Dumazet
Cc: Alexander Potapenko
Cc: Kees Cook
Cc: Bjorn Helgaas
Cc: Sasha Levin
Cc: David Drysdale
Cc: Ard Biesheuvel
Cc: Andrey Ryabinin
Cc: Kirill A. Shutemov
Cc: Jiri Slaby
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dmitry Vyukov
2016-03-23 06:36:02 +0800