Eric Lee / smarc-fsl-linux-kernel

30 Dec, 2016

1 commit

b91e1302a mm: optimize PageWaiters bit use for unlock_page() ... Browse Code »

In commit 62906027091f ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.

However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.

On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd. The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.

On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result. However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.

So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too. And architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.

This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.

The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.

So this introduces the new architecture primitive

clear_bit_unlock_is_negative_byte();

and adds the trivial implementation for x86. We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better. According to Nick, Power has the same hickup x86 has, for
example, but some other architectures may not even care.

All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test". After this, it's down to
0.66%, so just a quarter of the cost it used to be.

(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).

Acked-by: Nick Piggin
Cc: Dave Hansen
Cc: Bob Peterson
Cc: Steven Whitehouse
Cc: Andrew Lutomirski
Cc: Andreas Gruenbacher
Cc: Peter Zijlstra
Cc: Mel Gorman
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-30 03:03:15 +0800

27 Dec, 2016

2 commits

0dad3a301 x86/mce/AMD: Make the init code more robust ... Browse Code »

If mce_device_init() fails then the mce device pointer is NULL and the
AMD mce code happily dereferences it.

Add a sanity check.

Reported-by: Markus Trippelsdorf
Reported-by: Boris Ostrovsky
Signed-off-by: Thomas Gleixner
Signed-off-by: Linus Torvalds

Thomas Gleixner
2016-12-27 09:30:24 +0800
b4b8664d2 arm64: don't pull uaccess.h into *.S ... Browse Code »

Split asm-only parts of arm64 uaccess.h into a new header and use that
from *.S.

Signed-off-by: Al Viro

Al Viro
2016-12-27 02:05:17 +0800

26 Dec, 2016

4 commits

8ae679c4b powerpc: Fix build warning on 32-bit PPC ... Browse Code »

I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:

AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]

This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created. That was with commit 9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").

The offending line does not make a lot of sense. This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.

Thanks to Nicholas Piggin for suggesting this solution.

Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger
Cc: Nicholas Piggin
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds

Larry Finger
2016-12-26 08:12:20 +0800
3ddc76dfc Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer type cleanups from Thomas Gleixner:
"This series does a tree wide cleanup of types related to
timers/timekeeping.

- Get rid of cycles_t and use a plain u64. The type is not really
helpful and caused more confusion than clarity

- Get rid of the ktime union. The union has become useless as we use
the scalar nanoseconds storage unconditionally now. The 32bit
timespec alike storage got removed due to the Y2038 limitations
some time ago.

That leaves the odd union access around for no reason. Clean it up.

Both changes have been done with coccinelle and a small amount of
manual mopping up"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ktime: Get rid of ktime_equal()
ktime: Cleanup ktime_set() usage
ktime: Get rid of the union
clocksource: Use a plain u64 instead of cycle_t

Linus Torvalds
2016-12-26 06:30:04 +0800
b272f732f Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull SMP hotplug notifier removal from Thomas Gleixner:
"This is the final cleanup of the hotplug notifier infrastructure. The
series has been reintgrated in the last two days because there came a
new driver using the old infrastructure via the SCSI tree.

Summary:

- convert the last leftover drivers utilizing notifiers

- fixup for a completely broken hotplug user

- prevent setup of already used states

- removal of the notifiers

- treewide cleanup of hotplug state names

- consolidation of state space

There is a sphinx based documentation pending, but that needs review
from the documentation folks"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/armada-xp: Consolidate hotplug state space
irqchip/gic: Consolidate hotplug state space
coresight/etm3/4x: Consolidate hotplug state space
cpu/hotplug: Cleanup state names
cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
staging/lustre/libcfs: Convert to hotplug state machine
scsi/bnx2i: Convert to hotplug state machine
scsi/bnx2fc: Convert to hotplug state machine
cpu/hotplug: Prevent overwriting of callbacks
x86/msr: Remove bogus cleanup from the error path
bus: arm-ccn: Prevent hotplug callback leak
perf/x86/intel/cstate: Prevent hotplug callback leak
ARM/imx/mmcd: Fix broken cpu hotplug handling
scsi: qedi: Convert to hotplug state machine

Linus Torvalds
2016-12-26 06:05:56 +0800
8b0e19531 ktime: Cleanup ktime_set() usage ... Browse Code »

ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra

Thomas Gleixner
2016-12-26 00:21:22 +0800

25 Dec, 2016

6 commits

a5a1d1c29 clocksource: Use a plain u64 instead of cycle_t ... Browse Code »

There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: John Stultz

Thomas Gleixner
2016-12-25 18:04:12 +0800
73c1b41e6 cpu/hotplug: Cleanup state names ... Browse Code »

When the state names got added a script was used to add the extra argument
to the calls. The script basically converted the state constant to a
string, but the cleanup to convert these strings into meaningful ones did
not happen.

Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
are used in all the other places already.

Signed-off-by: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2016-12-25 17:47:44 +0800
59fefd089 x86/msr: Remove bogus cleanup from the error path ... Browse Code »

The error cleanup which is invoked when the hotplug state setup failed
tries to remove the failed state, which is broken.

Fixes: 8fba38c937cd ("x86/msr: Convert to hotplug state machine")
Reported-by: kernel test robot
Signed-off-by: Thomas Gleixner
Cc: Sebastian Siewior

Thomas Gleixner
2016-12-25 17:47:41 +0800
834fcd298 perf/x86/intel/cstate: Prevent hotplug callback leak ... Browse Code »

If the pmu registration fails the registered hotplug callbacks are not
removed. Wrong in any case, but fatal in case of a modular driver.

Replace the nonsensical state names with proper ones while at it.

Fixes: 77c34ef1c319 ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine")
Signed-off-by: Thomas Gleixner
Cc: Sebastian Siewior
Cc: Peter Zijlstra
Cc: stable@vger.kernel.org

Thomas Gleixner
2016-12-25 17:47:40 +0800
a051f220d ARM/imx/mmcd: Fix broken cpu hotplug handling ... Browse Code »

The cpu hotplug support of this perf driver is broken in several ways:

1) It adds a instance before setting up the state.

2) The state for the instance is different from the state of the
callback. It's just a randomly chosen state.

3) The instance registration is not error checked so nobody noticed that
the call can never succeed.

4) The state for the multi install callbacks is chosen randomly and
overwrites existing state. This is now prevented by the core code so the
call is guaranteed to fail.

5) The error exit path in the init function leaves the instance registered
and then frees the memory which contains the enqueued hlist node.

6) The remove function is removing the state and not the instance.

Fix it by:

- Setting up the state before adding instances. Use a dynamically allocated
state for it.

- Installing instances after the state has been set up

- Removing the instance in the error path before freeing memory

- Removing the instance not the state in the driver remove callback

While at is use raw_cpu_processor_id(), because cpu_processor_id() cannot
be used in preemptible context, and set the driver data after successful
registration of the pmu.

Signed-off-by: Thomas Gleixner
Acked-by: Shawn Guo
Cc: Peter Zijlstra
Cc: Sebastian Siewior
Cc: Frank Li
Cc: Zhengyu Shen
Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de
Signed-off-by: Thomas Gleixner

Thomas Gleixner
2016-12-25 17:47:40 +0800
7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

24 Dec, 2016

4 commits

6ac3bb167 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 fixes from Ingo Molnar:
"There's a number of fixes:

- a round of fixes for CPUID-less legacy CPUs
- a number of microcode loader fixes
- i8042 detection robustization fixes
- stack dump/unwinder fixes
- x86 SoC platform driver fixes
- a GCC 7 warning fix
- virtualization related fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
Revert "x86/unwind: Detect bad stack return address"
x86/paravirt: Mark unused patch_default label
x86/microcode/AMD: Reload proper initrd start address
x86/platform/intel/quark: Add printf attribute to imr_self_test_result()
x86/platform/intel-mid: Switch MPU3050 driver to IIO
x86/alternatives: Do not use sync_core() to serialize I$
x86/topology: Document cpu_llc_id
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
x86/asm: Rewrite sync_core() to use IRET-to-self
x86/microcode/intel: Replace sync_core() with native_cpuid()
Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
x86/tools: Fix gcc-7 warning in relocs.c
x86/unwind: Dump stack data on warnings
x86/unwind: Adjust last frame check for aligned function stacks
x86/init: Fix a couple of comment typos
x86/init: Remove i8042_detect() from platform ops
Input: i8042 - Trust firmware a bit more when probing on X86
x86/init: Add i8042 state to the platform data
...

Linus Torvalds
2016-12-24 08:54:46 +0800
00198dab3 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
plus on the tooling side there's a number of fixes and some late
updates"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf sched timehist: Fix invalid period calculation
perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
perf sched timehist: Enlarge default 'comm_width'
perf sched timehist: Honour 'comm_width' when aligning the headers
perf/x86: Fix overlap counter scheduling bug
perf/x86/pebs: Fix handling of PEBS buffer overflows
samples/bpf: Move open_raw_sock to separate header
samples/bpf: Remove perf_event_open() declaration
samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
tools lib bpf: Add bpf_prog_{attach,detach}
samples/bpf: Switch over to libbpf
perf diff: Do not overwrite valid build id
perf annotate: Don't throw error for zero length symbols
perf bench futex: Fix lock-pi help string
perf trace: Check if MAP_32BIT is defined (again)
samples/bpf: Make perf_event_read() static
uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
samples/bpf: Make samples more libbpf-centric
tools lib bpf: Add flags to bpf_create_map()
tools lib bpf: use __u32 from linux/types.h
...

Linus Torvalds
2016-12-24 08:49:12 +0800
c280f7736 Revert "x86/unwind: Detect bad stack return address" ... Browse Code »

Revert the following commit:

b6959a362177 ("x86/unwind: Detect bad stack return address")

... because Andrey Konovalov reported an unwinder warning:

WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467

The unwind was initiated from an interrupt which occurred while running in the
generated code for a kprobe. The unwinder printed the warning because it
expected regs->ip to point to a valid text address, but instead it pointed to
the generated code.

Eventually we may want come up with a way to identify generated kprobe
code so the unwinder can know that it's a valid return address. Until
then, just remove the warning.

Reported-by: Andrey Konovalov
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar

Josh Poimboeuf
2016-12-24 03:32:30 +0800
42e0372c0 Merge tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc ... Browse Code »

Pull more ARC updates from Vineet Gupta:

- Fix for aliasing VIPT dcache in old ARC700 cores

- micro-optimization in ARC700 ProtV handler

- Enable SG_CHAIN [Vladimir]

- ARC HS38 core intc default to prio 1

* tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache
ARC: mm: No need to save cache version in @cpuinfo
ARC: enable SG chaining
ARCv2: intc: default all interrupts to priority 1
ARCv2: entry: document intr disable in hard isr
ARC: ARCompact entry: elide re-reading ECR in ProtV handler

Linus Torvalds
2016-12-24 02:22:47 +0800

23 Dec, 2016

5 commits

9be962d52 Merge tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull more ACPI updates from Rafael Wysocki:
"Here are new versions of two ACPICA changes that were deferred
previously due to a problem they had introduced, two cleanups on top
of them and the removal of a useless warning message from the ACPI
core.

Specifics:

- Move some Linux-specific functionality to upstream ACPICA and
update the in-kernel users of it accordingly (Lv Zheng)

- Drop a useless warning (triggered by the lack of an optional
object) from the ACPI namespace scanning code (Zhang Rui)"

* tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory()
ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users
ACPICA: Tables: Allow FADT to be customized with virtual address
ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel
ACPI: do not warn if _BQC does not exist

Linus Torvalds
2016-12-23 02:19:32 +0800
eb254f323 Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 cache allocation interface from Thomas Gleixner:
"This provides support for Intel's Cache Allocation Technology, a cache
partitioning mechanism.

The interface is odd, but the hardware interface of that CAT stuff is
odd as well.

We tried hard to come up with an abstraction, but that only allows
rather simple partitioning, but no way of sharing and dealing with the
per package nature of this mechanism.

In the end we decided to expose the allocation bitmaps directly so all
combinations of the hardware can be utilized.

There are two ways of associating a cache partition:

- Task

A task can be added to a resource group. It uses the cache
partition associated to the group.

- CPU

All tasks which are not member of a resource group use the group to
which the CPU they are running on is associated with.

That allows for simple CPU based partitioning schemes.

The main expected user sare:

- Virtualization so a VM can only trash only the associated part of
the cash w/o disturbing others

- Real-Time systems to seperate RT and general workloads.

- Latency sensitive enterprise workloads

- In theory this also can be used to protect against cache side
channel attacks"

[ Intel RDT is "Resource Director Technology". The interface really is
rather odd and very specific, which delayed this pull request while I
was thinking about it. The pull request itself came in early during
the merge window, I just delayed it until things had calmed down and I
had more time.

But people tell me they'll use this, and the good news is that it is
_so_ specific that it's rather independent of anything else, and no
user is going to depend on the interface since it's pretty rare. So if
push comes to shove, we can just remove the interface and nothing will
break ]

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/intel_rdt: Implement show_options() for resctrlfs
x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled
x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount
x86/intel_rdt: Fix setting of closid when adding CPUs to a group
x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee
x86/intel_rdt: Reset per cpu closids on unmount
x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A
x86/intel_rdt: Prevent deadlock against hotplug lock
x86/intel_rdt: Protect info directory from removal
x86/intel_rdt: Add info files to Documentation
x86/intel_rdt: Export the minimum number of set mask bits in sysfs
x86/intel_rdt: Propagate error in rdt_mount() properly
x86/intel_rdt: Add a missing #include
MAINTAINERS: Add maintainer for Intel RDT resource allocation
x86/intel_rdt: Add scheduler hook
x86/intel_rdt: Add schemata file
x86/intel_rdt: Add tasks files
x86/intel_rdt: Add cpus file
x86/intel_rdt: Add mkdir to resctrl file system
x86/intel_rdt: Add "info" files to resctrl file system
...

Linus Torvalds
2016-12-23 01:25:45 +0800
1134c2b5c perf/x86: Fix overlap counter scheduling bug ... Browse Code »

Jiri reported the overlap scheduling exceeding its max stack.

Looking at the constraint that triggered this, it turns out the
overlap marker isn't needed.

The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
the counter mask of such an event is not a subset of any other counter
mask of a constraint with an equal or higher weight".

Esp. that latter part is of interest here I think, our overlapping mask
is 0x0e, that has 3 bits set and is the highest weight mask in on the
PMU, therefore it will be placed last. Can we still create a scenario
where we would need to rewind that?

The scenario for AMD Fam15h is we're having masks like:

0x3F -- 111111
0x38 -- 111000
0x07 -- 000111

0x09 -- 001001

And we mark 0x09 as overlapping, because it is not a direct subset of
0x38 or 0x07 and has less weight than either of those. This means we'll
first try and place the 0x09 event, then try and place 0x38/0x07 events.
Now imagine we have:

3 * 0x07 + 0x09

and the initial pick for the 0x09 event is counter 0, then we'll fail to
place all 0x07 events. So we'll pop back, try counter 4 for the 0x09
event, and then re-try all 0x07 events, which will now work.

The masks on the PMU in question are:

0x01 - 0001
0x03 - 0011
0x0e - 1110
0x0c - 1100

But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
0x1) are of heavier weight, it should all work out.

Reported-by: Jiri Olsa
Tested-by: Jiri Olsa
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Liang Kan
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Robert Richter
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-12-23 00:45:43 +0800
daa864b8f perf/x86/pebs: Fix handling of PEBS buffer overflows ... Browse Code »

This patch solves a race condition between PEBS and the PMU handler.

In case multiple PEBS events are sampled at the same time,
it is possible to have GLOBAL_STATUS bit 62 set indicating
PEBS buffer overflow and also seeing at most 3 PEBS counters
having their bits set in the status register. This is a sign
that there was at least one PEBS record pending at the time
of the PMU interrupt. PEBS counters must only be processed
via the drain_pebs() calls, and not via the regular sample
processing loop coming after that the function, otherwise
phony regular samples may be generated in the sampling buffer
not marked with the EXACT tag.

Another possibility is to have one PEBS event and at least
one non-PEBS event whic hoverflows while PEBS has armed. In this
case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
status bit for the PEBS counter will be on Skylake.

To avoid this problem, we systematically ignore the PEBS-enabled
counters from the GLOBAL_STATUS mask and we always process PEBS
events via drain_pebs().

The problem manifested itself by having non-exact samples when
sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
not have the EXACT flag set.

Note that this problem is only present on Skylake processor.
This fix is harmless on older processors.

Reported-by: Peter Zijlstra
Signed-off-by: Stephane Eranian
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Thomas Gleixner
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar

Stephane Eranian
2016-12-23 00:45:36 +0800
cef4402d7 x86/paravirt: Mark unused patch_default label ... Browse Code »

A bugfix commit:

45dbea5f55c0 ("x86/paravirt: Fix native_patch()")

... introduced a harmless warning:

arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch':
arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label]

Fix it by annotating the label as __maybe_unused.

Reported-by: Arnd Bergmann
Reported-by: Piotr Gregor
Signed-off-by: Peter Zijlstra (Intel)
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Fixes: 45dbea5f55c0 ("x86/paravirt: Fix native_patch()")
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-12-23 00:43:35 +0800

22 Dec, 2016

2 commits

c8e008e2a Merge branches 'acpica' and 'acpi-scan' ... Browse Code »

* acpica:
ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory()
ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users
ACPICA: Tables: Allow FADT to be customized with virtual address
ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel

* acpi-scan:
ACPI: do not warn if _BQC does not exist

Rafael J. Wysocki
2016-12-22 21:34:24 +0800
0c961c551 Merge branch 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux ... Browse Code »

Pull parisc updates from Helge Deller:

- add Kernel address space layout randomization support

- re-enable interrupts earlier now that we have a working IRQ stack

- optimize the timer interrupt function to better cope with missed
timer irqs

- fix error return code in parisc perf code (by Dan Carpenter)

- fix PAT debug code

* 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Optimize timer interrupt function
parisc: perf: return -EFAULT on error
parisc: Enhance CPU detection code on PAT machines
parisc: Re-enable interrupts early
parisc: Enable KASLR

Linus Torvalds
2016-12-22 02:47:13 +0800

21 Dec, 2016

13 commits

8877ebdd3 x86/microcode/AMD: Reload proper initrd start address ... Browse Code »

When we switch to virtual addresses and, especially after
reserve_initrd()->relocate_initrd() have run, we have the updated initrd
address in initrd_start. Use initrd_start then instead of the address
which has been passed to us through boot params. (That still gets used
when we're running the very early routines on the BSP).

Reported-and-tested-by: Boris Ostrovsky
Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/20161220144012.lc4cwrg6dphqbyqu@pd.tnic
Signed-off-by: Thomas Gleixner

Borislav Petkov
2016-12-21 17:50:04 +0800
8d3523fb3 ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory() ... Browse Code »

Since all users are cleaned up, remove the 2 deprecated APIs due to no
users.
As a Linux variable rather than an ACPICA variable, acpi_gbl_permanent_mmap
is renamed to acpi_permanent_mmap to have a consistent coding style across
entire Linux ACPI subsystem.

Signed-off-by: Lv Zheng
Signed-off-by: Rafael J. Wysocki

Lv Zheng
2016-12-21 09:36:38 +0800
6b11d1d67 ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users ... Browse Code »

This patch removes the users of the deprectated APIs:
acpi_get_table_with_size()
early_acpi_os_unmap_memory()
The following APIs should be used instead of:
acpi_get_table()
acpi_put_table()

The deprecated APIs are invented to be a replacement of acpi_get_table()
during the early stage so that the early mapped pointer will not be stored
in ACPICA core and thus the late stage acpi_get_table() won't return a
wrong pointer. The mapping size is returned just because it is required by
early_acpi_os_unmap_memory() to unmap the pointer during early stage.

But as the mapping size equals to the acpi_table_header.length
(see acpi_tb_init_table_descriptor() and acpi_tb_validate_table()), when
such a convenient result is returned, driver code will start to use it
instead of accessing acpi_table_header to obtain the length.

Thus this patch cleans up the drivers by replacing returned table size with
acpi_table_header.length, and should be a no-op.

Reported-by: Dan Williams
Signed-off-by: Lv Zheng
Signed-off-by: Rafael J. Wysocki

Lv Zheng
2016-12-21 09:36:38 +0800
ba6d973f7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes and cleanups from David Miller:

1) Use rb_entry() instead of hardcoded container_of(), from Geliang
Tang.

2) Use correct memory barriers in stammac driver, from Pavel Machek.

3) Fix assoc bind address handling in SCTP, from Xin Long.

4) Make the length check for UFO handling consistent between
__ip_append_data() and ip_finish_output(), from Zheng Li.

5) HSI driver compatible strings were busted fro hix5hd2, from Dongpo
Li.

6) Handle devm_ioremap() errors properly in cavium driver, from Arvind
Yadav.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
RDS: use rb_entry()
net_sched: sch_netem: use rb_entry()
net_sched: sch_fq: use rb_entry()
net/mlx5: use rb_entry()
ethernet: sfc: Add Kconfig entry for vendor Solarflare
sctp: not copying duplicate addrs to the assoc's bind address list
sctp: reduce indent level in sctp_copy_local_addr_list
ARM: dts: hix5hd2: don't change the existing compatible string
net: hix5hd2_gmac: fix compatible strings name
openvswitch: Add a missing break statement.
net: netcp: ethss: fix 10gbe host port tx pri map configuration
net: netcp: ethss: fix errors in ethtool ops
fsl/fman: enable compilation on ARM64
fsl/fman: A007273 only applies to PPC SoCs
powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
fsl/fman: fix 1G support for QSGMII interfaces
dt: bindings: net: use boolean dt properties for eee broken modes
net: phy: use boolean dt properties for eee broken modes
net: phy: fix sign type error in genphy_config_eee_advert
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
...

Linus Torvalds
2016-12-21 07:48:34 +0800
3eb86259e Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge final set of updates from Andrew Morton:

- a series to make IMA play better across kexec

- a handful of random fixes

* emailed patches from Andrew Morton :
printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
ratelimit: fix WARN_ON_RATELIMIT return value
kcov: make kcov work properly with KASLR enabled
arm64: setup: introduce kaslr_offset()
mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
ima: platform-independent hash value
ima: define a canonical binary_runtime_measurements list format
ima: support restoring multiple template formats
ima: store the builtin/custom template definitions in a list
ima: on soft reboot, save the measurement list
powerpc: ima: send the kexec buffer to the next kernel
ima: maintain memory size needed for serializing the measurement list
ima: permit duplicate measurement list entries
ima: on soft reboot, restore the measurement list
powerpc: ima: get the kexec buffer passed by the previous kernel

Linus Torvalds
2016-12-21 07:24:32 +0800
d5379e5ed Merge tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze ... Browse Code »

Pull arch/microblaze updates from Michal Simek:

- wire-up new syscalls

- add new codes and fpga families

- fix a return value

* tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Add new fpga families
microblaze: Add missing release version code v9.6 and v10
microblaze: Add missing syscalls
microblaze: Fix return value from xilinx_timer_init

Linus Torvalds
2016-12-21 07:16:00 +0800
ec92b88c3 Merge tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa ... Browse Code »

Pull Xtensa updates from Max Filippov:

- enable HAVE_DMA_CONTIGUOUS, configure shared DMA pool reservation in
kc705 DTS

- update xtensa DMA-related Documentation/features entries

- clean up arch/xtensa/kernel/setup.c: move S32C1I self-test out of it,
remove unused declarations, fix screen_info definition

* tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: update DMA-related Documentation/features entries
xtensa: configure shared DMA pool reservation in kc705 DTS
xtensa: enable HAVE_DMA_CONTIGUOUS
xtensa: move S32C1I self-test to a separate file
xtensa: fix screen_info, clean up unused declarations in setup.c

Linus Torvalds
2016-12-21 06:48:53 +0800
160494d38 parisc: Optimize timer interrupt function ... Browse Code »

Restructure the timer interrupt function to better cope with missed timer irqs.
Optimize the calculation when the next interrupt should happen and skip irqs if
they would happen too shortly after exit of the irq function.

The update_process_times() call is done anyway at every timer irq, so we can
safely drop the prof_counter and prof_multiplier variables from the per_cpu
structure.

Signed-off-by: Helge Deller

Helge Deller
2016-12-21 04:39:40 +0800
48fed73ab ARM: dts: hix5hd2: don't change the existing compatible string ... Browse Code »

The SoC hix5hd2 compatible string has the suffix "-gmac" and
we should not change it.
We should only add the generic compatible string "hisi-gmac-v1".

Fixes: 0855950ba580 ("ARM: dts: hix5hd2: add gmac generic compatible and clock names")
Signed-off-by: Dongpo Li
Signed-off-by: David S. Miller

Dongpo Li
2016-12-21 03:12:29 +0800
ae6021d4f powerpc: fsl/fman: remove fsl,fman from of_device_ids[] ... Browse Code »

The fsl/fman drivers will use of_platform_populate() on all
supported platforms. Call of_platform_populate() to probe the
FMan sub-nodes.

Signed-off-by: Igal Liberman
Signed-off-by: Madalin Bucur
Acked-by: Scott Wood
Signed-off-by: David S. Miller

Madalin Bucur
2016-12-21 02:55:34 +0800
7ede8665f arm64: setup: introduce kaslr_offset() ... Browse Code »

Introduce kaslr_offset() similar to x86_64 to fix kcov.

[ Updated by Will Deacon ]

Link: http://lkml.kernel.org/r/1481417456-28826-2-git-send-email-alex.popov@linux.com
Signed-off-by: Alexander Popov
Cc: Catalin Marinas
Cc: Ard Biesheuvel
Cc: Mark Rutland
Cc: Rob Herring
Cc: Kefeng Wang
Cc: AKASHI Takahiro
Cc: Jon Masters
Cc: David Daney
Cc: Ganapatrao Kulkarni
Cc: Dmitry Vyukov
Cc: Nicolai Stange
Cc: James Morse
Cc: Andrey Ryabinin
Cc: Andrey Konovalov
Cc: Alexander Popov
Cc: syzkaller
Signed-off-by: Andrew Morton
Signed-off-by: Will Deacon
Signed-off-by: Linus Torvalds

Alexander Popov
2016-12-21 01:48:46 +0800
ab6b1d1fc powerpc: ima: send the kexec buffer to the next kernel ... Browse Code »

The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.

This is the architecture-specific part of setting up the IMA kexec
buffer for the next kernel. It will be used in the next patch.

Link: http://lkml.kernel.org/r/1480554346-29071-6-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann
Signed-off-by: Mimi Zohar
Acked-by: "Eric W. Biederman"
Cc: Andreas Steffen
Cc: Dmitry Kasatkin
Cc: Josh Sklar
Cc: Dave Young
Cc: Vivek Goyal
Cc: Baoquan He
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Stewart Smith
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thiago Jung Bauermann
2016-12-21 01:48:44 +0800
467d27824 powerpc: ima: get the kexec buffer passed by the previous kernel ... Browse Code »

Patch series "ima: carry the measurement list across kexec", v8.

The TPM PCRs are only reset on a hard reboot. In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement
list of the running kernel must be saved and then restored on the
subsequent boot, possibly of a different architecture.

The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list. This patch
set serializes the measurement list in this format and restores it.

Up to now, the binary_runtime_measurements was defined as architecture
native format. The assumption being that userspace could and would
handle any architecture conversions. With the ability of carrying the
measurement list across kexec, possibly from one architecture to a
different one, the per boot architecture information is lost and with it
the ability of recalculating the template digest hash. To resolve this
problem, without breaking the existing ABI, this patch set introduces
the boot command line option "ima_canonical_fmt", which is arbitrarily
defined as little endian.

The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg. TPM 2.0
support for larger digests).

A simplified method of Thiago Bauermann's "kexec buffer handover" patch
series for carrying the IMA measurement list across kexec is included in
this patch set. The simplified method requires all file measurements be
taken prior to executing the kexec load, as subsequent measurements will
not be carried across the kexec and restored.

This patch (of 10):

The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.
The second kernel can check whether the previous kernel sent the buffer
and retrieve it.

This is the architecture-specific part which enables IMA to receive the
measurement list passed by the previous kernel. It will be used in the
next patch.

The change in machine_kexec_64.c is to factor out the logic of removing
an FDT memory reservation so that it can be used by remove_ima_buffer.

Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann
Signed-off-by: Mimi Zohar
Acked-by: "Eric W. Biederman"
Cc: Andreas Steffen
Cc: Dmitry Kasatkin
Cc: Josh Sklar
Cc: Dave Young
Cc: Vivek Goyal
Cc: Baoquan He
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Stewart Smith
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thiago Jung Bauermann
2016-12-21 01:48:40 +0800

20 Dec, 2016

3 commits

9120cf4fd x86/platform/intel/quark: Add printf attribute to imr_self_test_result() ... Browse Code »

__printf() attributes help detecting issues in printf() format strings at
compile time.

Even though imr_selftest.c is only compiled with
CONFIG_DEBUG_IMR_SELFTEST=y, GCC complains about a missing format
attribute when compiling allmodconfig with -Wmissing-format-attribute.

Silence this warning by adding the attribute.

Signed-off-by: Nicolas Iooss
Acked-by: Bryan O'Donoghue
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20161219132144.4108-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Ingo Molnar

Nicolas Iooss
2016-12-20 16:37:24 +0800
634b847b6 x86/platform/intel-mid: Switch MPU3050 driver to IIO ... Browse Code »

The Intel Mid goes in and creates a I2C device for the
MPU3050 if the input driver for MPU-3050 is activated.

As of commit:

3904b28efb2c ("iio: gyro: Add driver for the MPU-3050 gyroscope")

.. there is a proper and fully featured IIO driver for this
device, so deprecate the use of the incomplete input driver
by augmenting the device population code to react to the
presence of the IIO driver's Kconfig symbol instead.

Signed-off-by: Linus Walleij
Acked-by: Andy Shevchenko
Cc: Dmitry Torokhov
Cc: Jonathan Cameron
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1481722794-4348-1-git-send-email-linus.walleij@linaro.org
Signed-off-by: Ingo Molnar

Linus Walleij
2016-12-20 16:37:15 +0800
34bfab0ea x86/alternatives: Do not use sync_core() to serialize I$ ... Browse Code »

We use sync_core() in the alternatives code to stop speculative
execution of prefetched instructions because we are potentially changing
them and don't want to execute stale bytes.

What it does on most machines is call CPUID which is a serializing
instruction. And that's expensive.

However, the instruction cache is serialized when we're on the local CPU
and are changing the data through the same virtual address. So then, we
don't need the serializing CPUID but a simple control flow change. Last
being accomplished with a CALL/RET which the noinline causes.

Suggested-by: Linus Torvalds
Signed-off-by: Borislav Petkov
Reviewed-by: Andy Lutomirski
Cc: Andrew Cooper
Cc: Andy Lutomirski
Cc: Brian Gerst
Cc: Henrique de Moraes Holschuh
Cc: Matthew Whitehead
Cc: One Thousand Gnomes
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20161203150258.vwr5zzco7ctgc4pe@pd.tnic
Signed-off-by: Ingo Molnar

Borislav Petkov
2016-12-20 16:36:42 +0800