30 Dec, 2016

1 commit

  • In commit 62906027091f ("mm: add PageWaiters indicating tasks are
    waiting for a page bit") Nick Piggin made our page locking no longer
    unconditionally touch the hashed page waitqueue, which not only helps
    performance in general, but is particularly helpful on NUMA machines
    where the hashed wait queues can bounce around a lot.

    However, the "clear lock bit atomically and then test the waiters bit"
    sequence turns out to be much more expensive than it needs to be,
    because you get a nasty stall when trying to access the same word that
    just got updated atomically.

    On architectures where locking is done with LL/SC, this would be trivial
    to fix with a new primitive that clears one bit and tests another
    atomically, but that ends up not working on x86, where the only atomic
    operations that return the result end up being cmpxchg and xadd. The
    atomic bit operations return the old value of the same bit we changed,
    not the value of an unrelated bit.

    On x86, we could put the lock bit in the high bit of the byte, and use
    "xadd" with that bit (where the overflow ends up not touching other
    bits), and look at the other bits of the result. However, an even
    simpler model is to just use a regular atomic "and" to clear the lock
    bit, and then the sign bit in eflags will indicate the resulting state
    of the unrelated bit #7.

    So by moving the PageWaiters bit up to bit #7, we can atomically clear
    the lock bit and test the waiters bit on x86 too. And on architectures
    with LL/SC (which is all the usual RISC suspects), the particular bit
    doesn't matter, so they are fine with this approach too.

    This avoids the extra access to the same atomic word, and thus avoids
    the costly stall at page unlock time.

    The only downside is that the interface ends up being a bit odd and
    specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
    love the resulting name of the new primitive, but I'd rather make the
    name be descriptive and very clear about the limitation imposed by
    trying to work across all relevant architectures than make it be some
    generic thing that doesn't make the odd semantics explicit.

    So this introduces the new architecture primitive

    clear_bit_unlock_is_negative_byte();

    and adds the trivial implementation for x86. We have a generic
    non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
    combination) which can be overridden by any architecture that can do
    better. According to Nick, Power has the same hiccup x86 has, for
    example, but some other architectures may not even care.
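
    As a rough user-space illustration (not the kernel implementation), the
    two shapes can be modelled with C11 atomics. The function names, the
    choice of bit 0 for the lock, and the use of <stdatomic.h> are assumptions
    made for this sketch; only the idea of keeping the waiters flag in bit #7
    of the same byte comes from the text above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT    0   /* hypothetical position of the lock bit */
#define WAITERS_BIT 7   /* waiters flag lives in the sign bit of the byte */

/* Fallback shape: clear the lock bit, then read the same byte again. */
static bool unlock_then_test(_Atomic uint8_t *flags)
{
    atomic_fetch_and_explicit(flags, (uint8_t)~(1u << LOCK_BIT),
                              memory_order_release);
    return (atomic_load_explicit(flags, memory_order_relaxed) &
            (1u << WAITERS_BIT)) != 0;
}

/*
 * Combined shape: a single atomic read-modify-write. Clearing the lock bit
 * does not change bit 7, so the waiters flag can be taken from the value
 * the atomic operation itself returns, without a second access.
 */
static bool unlock_is_negative_byte(_Atomic uint8_t *flags)
{
    uint8_t old = atomic_fetch_and_explicit(flags,
                                            (uint8_t)~(1u << LOCK_BIT),
                                            memory_order_release);
    return (old & (1u << WAITERS_BIT)) != 0;
}
```

    Both functions return the same answer when no other thread changes the
    byte in between; the second one simply avoids touching the word twice.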

    All these optimizations mean that my page locking stress-test (which is
    just executing a lot of small short-lived shell scripts: "make test" in
    the git source tree) no longer makes our page locking look horribly bad.
    Before all these optimizations, just the unlock_page() costs were just
    over 3% of all CPU overhead on "make test". After this, it's down to
    0.66%, so just a quarter of the cost it used to be.

    (The difference on NUMA is bigger, but there this micro-optimization is
    likely less noticeable, since the big issue on NUMA was not the accesses
    to 'struct page', but the waitqueue accesses that were already removed
    by Nick's earlier commit).

    Acked-by: Nick Piggin
    Cc: Dave Hansen
    Cc: Bob Peterson
    Cc: Steven Whitehouse
    Cc: Andrew Lutomirski
    Cc: Andreas Gruenbacher
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

27 Dec, 2016

2 commits


26 Dec, 2016

4 commits

  • I am getting the following warning when I build kernel 4.9-git on my
    PowerBook G4 with a 32-bit PPC processor:

    AS arch/powerpc/kernel/misc_32.o
    arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]

    This problem is evident after commit 989cea5c14be ("kbuild: prevent
    lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
    error that has been in the code since 2005 when this source file was
    created. That was with commit 9994a33865f4 ("powerpc: Introduce
    entry_{32,64}.S, misc_{32,64}.S, systbl.S").

    The offending line does not make a lot of sense. This error does not
    seem to cause any errors in the executable, thus I am not recommending
    that this fix be applied to any stable versions.

    Thanks to Nicholas Piggin for suggesting this solution.

    Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
    Signed-off-by: Larry Finger
    Cc: Nicholas Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: linuxppc-dev@lists.ozlabs.org
    Signed-off-by: Linus Torvalds

    Larry Finger
     
  • Pull timer type cleanups from Thomas Gleixner:
    "This series does a tree wide cleanup of types related to
    timers/timekeeping.

    - Get rid of cycles_t and use a plain u64. The type is not really
    helpful and caused more confusion than clarity

    - Get rid of the ktime union. The union has become useless as we use
    the scalar nanoseconds storage unconditionally now. The 32-bit
    timespec-like storage got removed due to the Y2038 limitations
    some time ago.

    That leaves the odd union access around for no reason. Clean it up.

    Both changes have been done with coccinelle and a small amount of
    manual mopping up"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ktime: Get rid of ktime_equal()
    ktime: Cleanup ktime_set() usage
    ktime: Get rid of the union
    clocksource: Use a plain u64 instead of cycle_t

    Linus Torvalds
     
  • Pull SMP hotplug notifier removal from Thomas Gleixner:
    "This is the final cleanup of the hotplug notifier infrastructure. The
    series has been reintegrated in the last two days because there came a
    new driver using the old infrastructure via the SCSI tree.

    Summary:

    - convert the last leftover drivers utilizing notifiers

    - fixup for a completely broken hotplug user

    - prevent setup of already used states

    - removal of the notifiers

    - treewide cleanup of hotplug state names

    - consolidation of state space

    There is a sphinx based documentation pending, but that needs review
    from the documentation folks"

    * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/armada-xp: Consolidate hotplug state space
    irqchip/gic: Consolidate hotplug state space
    coresight/etm3/4x: Consolidate hotplug state space
    cpu/hotplug: Cleanup state names
    cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
    staging/lustre/libcfs: Convert to hotplug state machine
    scsi/bnx2i: Convert to hotplug state machine
    scsi/bnx2fc: Convert to hotplug state machine
    cpu/hotplug: Prevent overwriting of callbacks
    x86/msr: Remove bogus cleanup from the error path
    bus: arm-ccn: Prevent hotplug callback leak
    perf/x86/intel/cstate: Prevent hotplug callback leak
    ARM/imx/mmcd: Fix broken cpu hotplug handling
    scsi: qedi: Convert to hotplug state machine

    Linus Torvalds
     
  • ktime_set(S,N) was required for the timespec storage type and is still
    useful for situations where a Seconds and Nanoseconds part of a time value
    needs to be converted. For anything where the Seconds argument is 0, this
    is pointless and can be replaced with a simple assignment.
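
    A minimal stand-alone model of why the zero-seconds case collapses to an
    assignment, assuming the time value is a plain signed 64-bit nanosecond
    count. The "_model" names are hypothetical and are not the kernel
    definitions; any overflow handling is left out.

```c
#include <stdint.h>

#define NSEC_PER_SEC_MODEL 1000000000LL

/* Scalar nanosecond storage, modelled as a signed 64-bit integer. */
typedef int64_t ktime_model_t;

/* Combine a seconds part and a nanoseconds part into one scalar. */
static ktime_model_t ktime_set_model(int64_t secs, unsigned long nsecs)
{
    return secs * NSEC_PER_SEC_MODEL + (int64_t)nsecs;
}

static ktime_model_t half_millisecond(void)
{
    /* Before: t = ktime_set_model(0, 500000); after: plain assignment. */
    ktime_model_t t = 500000;
    return t;
}
```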

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra

    Thomas Gleixner
     

25 Dec, 2016

6 commits

  • There is no point in having an extra type for extra confusion. u64 is
    unambiguous.

    Conversion was done with the following coccinelle script:

    @rem@
    @@
    -typedef u64 cycle_t;

    @fix@
    typedef cycle_t;
    @@
    -cycle_t
    +u64

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: John Stultz

    Thomas Gleixner
     
  • When the state names got added a script was used to add the extra argument
    to the calls. The script basically converted the state constant to a
    string, but the cleanup to convert these strings into meaningful ones did
    not happen.

    Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
    are used in all the other places already.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The error cleanup which is invoked when the hotplug state setup failed
    tries to remove the failed state, which is broken.

    Fixes: 8fba38c937cd ("x86/msr: Convert to hotplug state machine")
    Reported-by: kernel test robot
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Siewior

    Thomas Gleixner
     
  • If the pmu registration fails the registered hotplug callbacks are not
    removed. Wrong in any case, but fatal in case of a modular driver.

    Replace the nonsensical state names with proper ones while at it.

    Fixes: 77c34ef1c319 ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine")
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Siewior
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org

    Thomas Gleixner
     
  • The cpu hotplug support of this perf driver is broken in several ways:

    1) It adds an instance before setting up the state.

    2) The state for the instance is different from the state of the
    callback. It's just a randomly chosen state.

    3) The instance registration is not error checked so nobody noticed that
    the call can never succeed.

    4) The state for the multi install callbacks is chosen randomly and
    overwrites existing state. This is now prevented by the core code so the
    call is guaranteed to fail.

    5) The error exit path in the init function leaves the instance registered
    and then frees the memory which contains the enqueued hlist node.

    6) The remove function is removing the state and not the instance.

    Fix it by:

    - Setting up the state before adding instances. Use a dynamically allocated
    state for it.

    - Installing instances after the state has been set up

    - Removing the instance in the error path before freeing memory

    - Removing the instance not the state in the driver remove callback

    While at it, use raw_cpu_processor_id(), because cpu_processor_id() cannot
    be used in preemptible context, and set the driver data after successful
    registration of the pmu.
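
    The ordering described above can be sketched with toy stand-ins. Every
    function and variable below is hypothetical and only records what is
    currently registered; none of it is the kernel hotplug API or the
    driver's code.

```c
#include <stdbool.h>

/* Toy bookkeeping standing in for the hotplug core. */
static bool state_set_up;
static bool instance_added;

static int setup_state(void)
{
    state_set_up = true;
    return 0;
}

static int add_instance(void)
{
    if (!state_set_up)
        return -1;      /* adding an instance before the state exists fails */
    instance_added = true;
    return 0;
}

static void remove_instance(void)
{
    instance_added = false;
}

/* Driver init: set up the state once, before any instance is added. */
static int driver_init_model(void)
{
    return setup_state();
}

/* Device probe: add the instance, and undo that if a later step fails. */
static int probe_model(bool pmu_register_fails)
{
    int ret = add_instance();

    if (ret)
        return ret;
    if (pmu_register_fails) {
        remove_instance();  /* unregister before any memory is freed */
        return -1;
    }
    return 0;
}
```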

    Signed-off-by: Thomas Gleixner
    Acked-by: Shawn Guo
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Cc: Frank Li
    Cc: Zhengyu Shen
    Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • This was entirely automated, using the script by Al:

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
    sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
    $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

    to do the replacement at the end of the merge window.

    Requested-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 Dec, 2016

4 commits

  • Pull x86 fixes from Ingo Molnar:
    "There's a number of fixes:

    - a round of fixes for CPUID-less legacy CPUs
    - a number of microcode loader fixes
    - i8042 detection robustization fixes
    - stack dump/unwinder fixes
    - x86 SoC platform driver fixes
    - a GCC 7 warning fix
    - virtualization related fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    Revert "x86/unwind: Detect bad stack return address"
    x86/paravirt: Mark unused patch_default label
    x86/microcode/AMD: Reload proper initrd start address
    x86/platform/intel/quark: Add printf attribute to imr_self_test_result()
    x86/platform/intel-mid: Switch MPU3050 driver to IIO
    x86/alternatives: Do not use sync_core() to serialize I$
    x86/topology: Document cpu_llc_id
    x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    x86/asm: Rewrite sync_core() to use IRET-to-self
    x86/microcode/intel: Replace sync_core() with native_cpuid()
    Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
    x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
    x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
    x86/tools: Fix gcc-7 warning in relocs.c
    x86/unwind: Dump stack data on warnings
    x86/unwind: Adjust last frame check for aligned function stacks
    x86/init: Fix a couple of comment typos
    x86/init: Remove i8042_detect() from platform ops
    Input: i8042 - Trust firmware a bit more when probing on X86
    x86/init: Add i8042 state to the platform data
    ...

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
    plus on the tooling side there's a number of fixes and some late
    updates"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    perf sched timehist: Fix invalid period calculation
    perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
    perf sched timehist: Enlarge default 'comm_width'
    perf sched timehist: Honour 'comm_width' when aligning the headers
    perf/x86: Fix overlap counter scheduling bug
    perf/x86/pebs: Fix handling of PEBS buffer overflows
    samples/bpf: Move open_raw_sock to separate header
    samples/bpf: Remove perf_event_open() declaration
    samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
    tools lib bpf: Add bpf_prog_{attach,detach}
    samples/bpf: Switch over to libbpf
    perf diff: Do not overwrite valid build id
    perf annotate: Don't throw error for zero length symbols
    perf bench futex: Fix lock-pi help string
    perf trace: Check if MAP_32BIT is defined (again)
    samples/bpf: Make perf_event_read() static
    uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
    samples/bpf: Make samples more libbpf-centric
    tools lib bpf: Add flags to bpf_create_map()
    tools lib bpf: use __u32 from linux/types.h
    ...

    Linus Torvalds
     
  • Revert the following commit:

    b6959a362177 ("x86/unwind: Detect bad stack return address")

    ... because Andrey Konovalov reported an unwinder warning:

    WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467

    The unwind was initiated from an interrupt which occurred while running in the
    generated code for a kprobe. The unwinder printed the warning because it
    expected regs->ip to point to a valid text address, but instead it pointed to
    the generated code.

    Eventually we may want to come up with a way to identify generated kprobe
    code so the unwinder can know that it's a valid return address. Until
    then, just remove the warning.

    Reported-by: Andrey Konovalov
    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • Pull more ARC updates from Vineet Gupta:

    - Fix for aliasing VIPT dcache in old ARC700 cores

    - micro-optimization in ARC700 ProtV handler

    - Enable SG_CHAIN [Vladimir]

    - ARC HS38 core intc default to prio 1

    * tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache
    ARC: mm: No need to save cache version in @cpuinfo
    ARC: enable SG chaining
    ARCv2: intc: default all interrupts to priority 1
    ARCv2: entry: document intr disable in hard isr
    ARC: ARCompact entry: elide re-reading ECR in ProtV handler

    Linus Torvalds
     

23 Dec, 2016

5 commits

  • Pull more ACPI updates from Rafael Wysocki:
    "Here are new versions of two ACPICA changes that were deferred
    previously due to a problem they had introduced, two cleanups on top
    of them and the removal of a useless warning message from the ACPI
    core.

    Specifics:

    - Move some Linux-specific functionality to upstream ACPICA and
    update the in-kernel users of it accordingly (Lv Zheng)

    - Drop a useless warning (triggered by the lack of an optional
    object) from the ACPI namespace scanning code (Zhang Rui)"

    * tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory()
    ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users
    ACPICA: Tables: Allow FADT to be customized with virtual address
    ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel
    ACPI: do not warn if _BQC does not exist

    Linus Torvalds
     
  • Pull x86 cache allocation interface from Thomas Gleixner:
    "This provides support for Intel's Cache Allocation Technology, a cache
    partitioning mechanism.

    The interface is odd, but the hardware interface of that CAT stuff is
    odd as well.

    We tried hard to come up with an abstraction, but that only allows
    rather simple partitioning, but no way of sharing and dealing with the
    per package nature of this mechanism.

    In the end we decided to expose the allocation bitmaps directly so all
    combinations of the hardware can be utilized.

    There are two ways of associating a cache partition:

    - Task

    A task can be added to a resource group. It uses the cache
    partition associated to the group.

    - CPU

    All tasks which are not members of a resource group use the group
    with which the CPU they are running on is associated.

    That allows for simple CPU based partitioning schemes.

    The main expected users are:

    - Virtualization so a VM can only trash the associated part of
    the cache w/o disturbing others

    - Real-Time systems to separate RT and general workloads.

    - Latency sensitive enterprise workloads

    - In theory this also can be used to protect against cache side
    channel attacks"

    [ Intel RDT is "Resource Director Technology". The interface really is
    rather odd and very specific, which delayed this pull request while I
    was thinking about it. The pull request itself came in early during
    the merge window, I just delayed it until things had calmed down and I
    had more time.

    But people tell me they'll use this, and the good news is that it is
    _so_ specific that it's rather independent of anything else, and no
    user is going to depend on the interface since it's pretty rare. So if
    push comes to shove, we can just remove the interface and nothing will
    break ]

    * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    x86/intel_rdt: Implement show_options() for resctrlfs
    x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled
    x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount
    x86/intel_rdt: Fix setting of closid when adding CPUs to a group
    x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee
    x86/intel_rdt: Reset per cpu closids on unmount
    x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A
    x86/intel_rdt: Prevent deadlock against hotplug lock
    x86/intel_rdt: Protect info directory from removal
    x86/intel_rdt: Add info files to Documentation
    x86/intel_rdt: Export the minimum number of set mask bits in sysfs
    x86/intel_rdt: Propagate error in rdt_mount() properly
    x86/intel_rdt: Add a missing #include
    MAINTAINERS: Add maintainer for Intel RDT resource allocation
    x86/intel_rdt: Add scheduler hook
    x86/intel_rdt: Add schemata file
    x86/intel_rdt: Add tasks files
    x86/intel_rdt: Add cpus file
    x86/intel_rdt: Add mkdir to resctrl file system
    x86/intel_rdt: Add "info" files to resctrl file system
    ...

    Linus Torvalds
     
  • Jiri reported the overlap scheduling exceeding its max stack.

    Looking at the constraint that triggered this, it turns out the
    overlap marker isn't needed.

    The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
    the counter mask of such an event is not a subset of any other counter
    mask of a constraint with an equal or higher weight".

    Esp. that latter part is of interest here I think, our overlapping mask
    is 0x0e, that has 3 bits set and is the highest weight mask on the
    PMU, therefore it will be placed last. Can we still create a scenario
    where we would need to rewind that?

    The scenario for AMD Fam15h is we're having masks like:

    0x3F -- 111111
    0x38 -- 111000
    0x07 -- 000111

    0x09 -- 001001

    And we mark 0x09 as overlapping, because it is not a direct subset of
    0x38 or 0x07 and has less weight than either of those. This means we'll
    first try and place the 0x09 event, then try and place 0x38/0x07 events.
    Now imagine we have:

    3 * 0x07 + 0x09

    and the initial pick for the 0x09 event is counter 0, then we'll fail to
    place all 0x07 events. So we'll pop back, try counter 3 for the 0x09
    event, and then re-try all 0x07 events, which will now work.
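
    The rewind in that scenario can be reproduced with a tiny backtracking
    placer. This is an illustrative sketch only, not the perf scheduler: the
    function name and the recursive formulation are assumptions. Each mask
    lists the counters an event may use, counted as zero-based bit positions,
    so 0x09 allows counters 0 and 3.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assign a distinct counter to every event from index i onwards.
 * masks[i] is the set of counters event i may use, "used" is the set
 * already taken, and out[i] receives the counter chosen for event i.
 */
static bool place(const unsigned *masks, size_t n, size_t i,
                  unsigned used, int *out)
{
    if (i == n)
        return true;

    for (int c = 0; c < 32; c++) {
        unsigned bit = 1u << c;

        if ((masks[i] & bit) && !(used & bit)) {
            out[i] = c;
            /* If the remaining events cannot be placed, rewind and retry. */
            if (place(masks, n, i + 1, used | bit, out))
                return true;
        }
    }
    return false;
}
```

    With the events ordered as { 0x09, 0x07, 0x07, 0x07 }, the first attempt
    puts the 0x09 event on counter 0, which leaves only two counters for the
    three 0x07 events; the placer then rewinds and settles on the other
    counter allowed by 0x09.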

    The masks on the PMU in question are:

    0x01 - 0001
    0x03 - 0011
    0x0e - 1110
    0x0c - 1100

    But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
    0x1) are of heavier weight, it should all work out.

    Reported-by: Jiri Olsa
    Tested-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Liang Kan
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This patch solves a race condition between PEBS and the PMU handler.

    In case multiple PEBS events are sampled at the same time,
    it is possible to have GLOBAL_STATUS bit 62 set indicating
    PEBS buffer overflow and also seeing at most 3 PEBS counters
    having their bits set in the status register. This is a sign
    that there was at least one PEBS record pending at the time
    of the PMU interrupt. PEBS counters must only be processed
    via the drain_pebs() calls, and not via the regular sample
    processing loop coming after that function, otherwise
    phony regular samples may be generated in the sampling buffer
    not marked with the EXACT tag.

    Another possibility is to have one PEBS event and at least
    one non-PEBS event which overflows while PEBS has armed. In this
    case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
    status bit for the PEBS counter will be on Skylake.

    To avoid this problem, we systematically ignore the PEBS-enabled
    counters from the GLOBAL_STATUS mask and we always process PEBS
    events via drain_pebs().
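
    The masking step amounts to a couple of bit operations. The sketch below
    is a simplified model with hypothetical names, not the driver code; only
    bit 62 as the PEBS buffer overflow flag is taken from the description
    above.

```c
#include <stdint.h>

#define PEBS_OVF_BIT 62 /* PEBS buffer overflow flag in GLOBAL_STATUS */

/*
 * Return the overflow bits the regular sample loop should handle.
 * Counters with PEBS enabled are always dropped from the mask, whether or
 * not bit 62 is set, because they are drained separately.
 */
static uint64_t regular_overflow_bits(uint64_t status, uint64_t pebs_enabled)
{
    status &= ~pebs_enabled;            /* ignore PEBS-enabled counters */
    status &= ~(1ULL << PEBS_OVF_BIT);  /* bit 62 is handled by draining */
    return status;
}
```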

    The problem manifested itself by having non-exact samples when
    sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
    not have the EXACT flag set.

    Note that this problem is only present on Skylake processors.
    This fix is harmless on older processors.

    Reported-by: Peter Zijlstra
    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • A bugfix commit:

    45dbea5f55c0 ("x86/paravirt: Fix native_patch()")

    ... introduced a harmless warning:

    arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch':
    arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label]

    Fix it by annotating the label as __maybe_unused.
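
    A small stand-alone example of the same technique, assuming GCC or Clang,
    where the kernel's __maybe_unused expands to __attribute__((unused)). The
    function and the HYPOTHETICAL_PATCH_SITES switch are invented for
    illustration and do not mirror native_patch().

```c
/*
 * When HYPOTHETICAL_PATCH_SITES is not defined, nothing jumps to the label,
 * and the compiler would otherwise report it as unused.
 */
static int patch_len(int type)
{
    int ret;

#ifdef HYPOTHETICAL_PATCH_SITES
    if (type == 1)
        goto patch_default;
#endif
    if (type < 0)
        return -1;

patch_default: __attribute__((unused)); /* semicolon ends the label statement */
    ret = type * 2;
    return ret;
}
```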

    Reported-by: Arnd Bergmann
    Reported-by: Piotr Gregor
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 45dbea5f55c0 ("x86/paravirt: Fix native_patch()")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

22 Dec, 2016

2 commits

  • * acpica:
    ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory()
    ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users
    ACPICA: Tables: Allow FADT to be customized with virtual address
    ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel

    * acpi-scan:
    ACPI: do not warn if _BQC does not exist

    Rafael J. Wysocki
     
  • Pull parisc updates from Helge Deller:

    - add Kernel address space layout randomization support

    - re-enable interrupts earlier now that we have a working IRQ stack

    - optimize the timer interrupt function to better cope with missed
    timer irqs

    - fix error return code in parisc perf code (by Dan Carpenter)

    - fix PAT debug code

    * 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Optimize timer interrupt function
    parisc: perf: return -EFAULT on error
    parisc: Enhance CPU detection code on PAT machines
    parisc: Re-enable interrupts early
    parisc: Enable KASLR

    Linus Torvalds
     

21 Dec, 2016

13 commits

  • When we switch to virtual addresses and, especially after
    reserve_initrd()->relocate_initrd() have run, we have the updated initrd
    address in initrd_start. Use initrd_start then instead of the address
    which has been passed to us through boot params. (That still gets used
    when we're running the very early routines on the BSP).

    Reported-and-tested-by: Boris Ostrovsky
    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/20161220144012.lc4cwrg6dphqbyqu@pd.tnic
    Signed-off-by: Thomas Gleixner

    Borislav Petkov
     
    Since all users have been cleaned up, remove the two deprecated APIs.
    Because acpi_gbl_permanent_mmap is a Linux variable rather than an ACPICA
    variable, it is renamed to acpi_permanent_mmap to keep the coding style
    consistent across the entire Linux ACPI subsystem.

    Signed-off-by: Lv Zheng
    Signed-off-by: Rafael J. Wysocki

    Lv Zheng
     
    This patch removes the users of the deprecated APIs:
    acpi_get_table_with_size()
    early_acpi_os_unmap_memory()
    The following APIs should be used instead:
    acpi_get_table()
    acpi_put_table()

    The deprecated APIs were invented as a replacement for acpi_get_table()
    during the early stage so that the early mapped pointer would not be stored
    in ACPICA core and thus the late stage acpi_get_table() would not return a
    wrong pointer. The mapping size is returned only because it is required by
    early_acpi_os_unmap_memory() to unmap the pointer during the early stage.

    But as the mapping size equals acpi_table_header.length
    (see acpi_tb_init_table_descriptor() and acpi_tb_validate_table()), when
    such a convenient result is returned, driver code will start to use it
    instead of accessing acpi_table_header to obtain the length.

    Thus this patch cleans up the drivers by replacing returned table size with
    acpi_table_header.length, and should be a no-op.

    Reported-by: Dan Williams
    Signed-off-by: Lv Zheng
    Signed-off-by: Rafael J. Wysocki

    Lv Zheng
     
  • Pull networking fixes and cleanups from David Miller:

    1) Use rb_entry() instead of hardcoded container_of(), from Geliang
    Tang.

    2) Use correct memory barriers in stmmac driver, from Pavel Machek.

    3) Fix assoc bind address handling in SCTP, from Xin Long.

    4) Make the length check for UFO handling consistent between
    __ip_append_data() and ip_finish_output(), from Zheng Li.

    5) HSI driver compatible strings were busted for hix5hd2, from Dongpo
    Li.

    6) Handle devm_ioremap() errors properly in cavium driver, from Arvind
    Yadav.
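
    For item 1, rb_entry() is a thin wrapper around container_of(), so the
    change is about readability rather than behaviour. The user-space model
    below uses hypothetical "_model" names and a toy node type, not the
    kernel headers.

```c
#include <stddef.h>

/* Toy stand-in for a tree node embedded in a larger structure. */
struct rb_node_model {
    struct rb_node_model *left, *right;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same operation, named for its use with tree nodes. */
#define rb_entry_model(ptr, type, member) \
    container_of_model(ptr, type, member)

struct flow {
    int id;
    struct rb_node_model node;
};

static struct flow *flow_of(struct rb_node_model *n)
{
    return rb_entry_model(n, struct flow, node);
}
```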

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
    RDS: use rb_entry()
    net_sched: sch_netem: use rb_entry()
    net_sched: sch_fq: use rb_entry()
    net/mlx5: use rb_entry()
    ethernet: sfc: Add Kconfig entry for vendor Solarflare
    sctp: not copying duplicate addrs to the assoc's bind address list
    sctp: reduce indent level in sctp_copy_local_addr_list
    ARM: dts: hix5hd2: don't change the existing compatible string
    net: hix5hd2_gmac: fix compatible strings name
    openvswitch: Add a missing break statement.
    net: netcp: ethss: fix 10gbe host port tx pri map configuration
    net: netcp: ethss: fix errors in ethtool ops
    fsl/fman: enable compilation on ARM64
    fsl/fman: A007273 only applies to PPC SoCs
    powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
    fsl/fman: fix 1G support for QSGMII interfaces
    dt: bindings: net: use boolean dt properties for eee broken modes
    net: phy: use boolean dt properties for eee broken modes
    net: phy: fix sign type error in genphy_config_eee_advert
    ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
    ...

    Linus Torvalds
     
  • Merge final set of updates from Andrew Morton:

    - a series to make IMA play better across kexec

    - a handful of random fixes

    * emailed patches from Andrew Morton:
    printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
    ratelimit: fix WARN_ON_RATELIMIT return value
    kcov: make kcov work properly with KASLR enabled
    arm64: setup: introduce kaslr_offset()
    mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
    ima: platform-independent hash value
    ima: define a canonical binary_runtime_measurements list format
    ima: support restoring multiple template formats
    ima: store the builtin/custom template definitions in a list
    ima: on soft reboot, save the measurement list
    powerpc: ima: send the kexec buffer to the next kernel
    ima: maintain memory size needed for serializing the measurement list
    ima: permit duplicate measurement list entries
    ima: on soft reboot, restore the measurement list
    powerpc: ima: get the kexec buffer passed by the previous kernel

    Linus Torvalds
     
  • Pull arch/microblaze updates from Michal Simek:

    - wire-up new syscalls

    - add new codes and fpga families

    - fix a return value

    * tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
    microblaze: Add new fpga families
    microblaze: Add missing release version code v9.6 and v10
    microblaze: Add missing syscalls
    microblaze: Fix return value from xilinx_timer_init

    Linus Torvalds
     
  • Pull Xtensa updates from Max Filippov:

    - enable HAVE_DMA_CONTIGUOUS, configure shared DMA pool reservation in
    kc705 DTS

    - update xtensa DMA-related Documentation/features entries

    - clean up arch/xtensa/kernel/setup.c: move S32C1I self-test out of it,
    remove unused declarations, fix screen_info definition

    * tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa:
    xtensa: update DMA-related Documentation/features entries
    xtensa: configure shared DMA pool reservation in kc705 DTS
    xtensa: enable HAVE_DMA_CONTIGUOUS
    xtensa: move S32C1I self-test to a separate file
    xtensa: fix screen_info, clean up unused declarations in setup.c

    Linus Torvalds
     
  • Restructure the timer interrupt function to better cope with missed timer irqs.
    Optimize the calculation of when the next interrupt should happen and skip irqs
    that would happen too soon after the irq function exits.

    The update_process_times() call is done anyway at every timer irq, so we can
    safely drop the prof_counter and prof_multiplier variables from the per_cpu
    structure.

    Signed-off-by: Helge Deller

    Helge Deller
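
The "next interrupt" calculation described in the entry above can be sketched roughly as follows. This is a minimal user-space illustration, not the parisc code: the helper name `next_tick`, its parameters, and the `min_delta` threshold are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the cycle count of the last programmed tick
 * and the current cycle count, return the cycle count at which the next
 * timer interrupt should fire.
 */
static uint64_t next_tick(uint64_t last_tick, uint64_t now,
                          uint64_t cycles_per_tick, uint64_t min_delta)
{
    assert(now >= last_tick && cycles_per_tick > 0);

    /* Whole ticks elapsed since last_tick; more than zero means irqs were missed. */
    uint64_t elapsed = (now - last_tick) / cycles_per_tick;
    uint64_t next = last_tick + (elapsed + 1) * cycles_per_tick;

    /* Skip a tick that would fire too soon after the handler returns. */
    if (next - now < min_delta)
        next += cycles_per_tick;

    return next;
}
```

Computing the number of elapsed ticks with one division, rather than looping tick by tick, keeps the handler cheap even when several interrupts were missed.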
     
  • The SoC hix5hd2 compatible string has the suffix "-gmac" and
    we should not change it.
    We should only add the generic compatible string "hisi-gmac-v1".

    Fixes: 0855950ba580 ("ARM: dts: hix5hd2: add gmac generic compatible and clock names")
    Signed-off-by: Dongpo Li
    Signed-off-by: David S. Miller

    Dongpo Li
     
  • The fsl/fman drivers will use of_platform_populate() on all
    supported platforms. Call of_platform_populate() to probe the
    FMan sub-nodes.

    Signed-off-by: Igal Liberman
    Signed-off-by: Madalin Bucur
    Acked-by: Scott Wood
    Signed-off-by: David S. Miller

    Madalin Bucur
     
  • Introduce kaslr_offset() similar to x86_64 to fix kcov.

    [ Updated by Will Deacon ]

    Link: http://lkml.kernel.org/r/1481417456-28826-2-git-send-email-alex.popov@linux.com
    Signed-off-by: Alexander Popov
    Cc: Catalin Marinas
    Cc: Ard Biesheuvel
    Cc: Mark Rutland
    Cc: Rob Herring
    Cc: Kefeng Wang
    Cc: AKASHI Takahiro
    Cc: Jon Masters
    Cc: David Daney
    Cc: Ganapatrao Kulkarni
    Cc: Dmitry Vyukov
    Cc: Nicolai Stange
    Cc: James Morse
    Cc: Andrey Ryabinin
    Cc: Andrey Konovalov
    Cc: Alexander Popov
    Cc: syzkaller
    Signed-off-by: Andrew Morton
    Signed-off-by: Will Deacon
    Signed-off-by: Linus Torvalds

    Alexander Popov
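
As a rough illustration of what a `kaslr_offset()` helper provides, the sketch below computes the difference between the address where an image actually runs and the address it was linked at, then uses it to map a runtime program counter back to a link-time address. The constant `IMAGE_LINK_BASE`, the variable `image_runtime_base`, and the helper `canonicalize_pc` are hypothetical names chosen for this example; the real arm64 definitions live in the kernel headers and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link-time base address of the image (illustrative value only). */
#define IMAGE_LINK_BASE 0xffff000008000000ULL

/* Hypothetical runtime base; a real kernel would set this during early boot. */
static uint64_t image_runtime_base = IMAGE_LINK_BASE;

/* Offset added by address randomization: runtime base minus link-time base. */
static inline uint64_t kaslr_offset(void)
{
    assert(image_runtime_base >= IMAGE_LINK_BASE);
    return image_runtime_base - IMAGE_LINK_BASE;
}

/* Map a runtime PC back to the address it would have without randomization. */
static inline uint64_t canonicalize_pc(uint64_t pc)
{
    return pc - kaslr_offset();
}
```

A coverage tool that reports program counters can subtract the offset in this way so that the reported addresses match the symbols of the unrandomized image.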
     
  • The IMA kexec buffer allows the currently running kernel to pass the
    measurement list via a kexec segment to the kernel that will be kexec'd.

    This is the architecture-specific part of setting up the IMA kexec
    buffer for the next kernel. It will be used in the next patch.

    Link: http://lkml.kernel.org/r/1480554346-29071-6-git-send-email-zohar@linux.vnet.ibm.com
    Signed-off-by: Thiago Jung Bauermann
    Signed-off-by: Mimi Zohar
    Acked-by: "Eric W. Biederman"
    Cc: Andreas Steffen
    Cc: Dmitry Kasatkin
    Cc: Josh Sklar
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Baoquan He
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Stewart Smith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thiago Jung Bauermann
     
  • Patch series "ima: carry the measurement list across kexec", v8.

    The TPM PCRs are only reset on a hard reboot. In order to validate a
    TPM's quote after a soft reboot (e.g. kexec -e), the IMA measurement
    list of the running kernel must be saved and then restored on the
    subsequent boot, possibly of a different architecture.

    The existing securityfs binary_runtime_measurements file conveniently
    provides a serialized format of the IMA measurement list. This patch
    set serializes the measurement list in this format and restores it.

    Up to now, the binary_runtime_measurements was defined as architecture
    native format, on the assumption that userspace could and would
    handle any architecture conversions. With the ability of carrying the
    measurement list across kexec, possibly from one architecture to a
    different one, the per boot architecture information is lost and with it
    the ability of recalculating the template digest hash. To resolve this
    problem, without breaking the existing ABI, this patch set introduces
    the boot command line option "ima_canonical_fmt", which is arbitrarily
    defined as little endian.

    The need for this boot command line option will be limited to the
    existing version 1 format of the binary_runtime_measurements.
    Subsequent formats will be defined as canonical format (e.g. TPM 2.0
    support for larger digests).

    A simplified method of Thiago Bauermann's "kexec buffer handover" patch
    series for carrying the IMA measurement list across kexec is included in
    this patch set. The simplified method requires all file measurements be
    taken prior to executing the kexec load, as subsequent measurements will
    not be carried across the kexec and restored.

    This patch (of 10):

    The IMA kexec buffer allows the currently running kernel to pass the
    measurement list via a kexec segment to the kernel that will be kexec'd.
    The second kernel can check whether the previous kernel sent the buffer
    and retrieve it.

    This is the architecture-specific part which enables IMA to receive the
    measurement list passed by the previous kernel. It will be used in the
    next patch.

    The change in machine_kexec_64.c is to factor out the logic of removing
    an FDT memory reservation so that it can be used by remove_ima_buffer.

    Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.com
    Signed-off-by: Thiago Jung Bauermann
    Signed-off-by: Mimi Zohar
    Acked-by: "Eric W. Biederman"
    Cc: Andreas Steffen
    Cc: Dmitry Kasatkin
    Cc: Josh Sklar
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Baoquan He
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Stewart Smith
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thiago Jung Bauermann
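
The canonical (little-endian) format mentioned in the series description above means that every multi-byte field is written in a fixed byte order regardless of the host CPU. The sketch below shows that idea for a single 32-bit field. The helpers `put_le32` and `get_le32` are hypothetical names for this example and are not taken from the IMA code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v into out[0..3] in little-endian order, independent of host endianness. */
static size_t put_le32(uint8_t *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
    return 4; /* number of bytes written */
}

/* Read a little-endian 32-bit value back, again independent of the host. */
static uint32_t get_le32(const uint8_t *in)
{
    assert(in != NULL);
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

Because both helpers work byte by byte with shifts, a list serialized on one architecture can be parsed on another without knowing which machine produced it.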
     

20 Dec, 2016

3 commits

  • __printf() attributes help detect issues in printf() format strings at
    compile time.

    Even though imr_selftest.c is only compiled with
    CONFIG_DEBUG_IMR_SELFTEST=y, GCC complains about a missing format
    attribute when compiling allmodconfig with -Wmissing-format-attribute.

    Silence this warning by adding the attribute.

    Signed-off-by: Nicolas Iooss
    Acked-by: Bryan O'Donoghue
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161219132144.4108-1-nicolas.iooss_linux@m4x.org
    Signed-off-by: Ingo Molnar

    Nicolas Iooss
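
For readers unfamiliar with the attribute, here is a small user-space sketch of a printf-style wrapper annotated with GCC's `format` attribute, which is what the kernel's `__printf()` macro expands to. The function `format_msg` is a hypothetical helper for this example, not the function touched in imr_selftest.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Argument 3 is the format string and the variadic arguments start at
 * position 4, so the compiler can check each call against its format.
 */
__attribute__((format(printf, 3, 4)))
static int format_msg(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);

    return n; /* characters that would be written, excluding the NUL */
}
```

With the attribute in place, a call such as `format_msg(buf, sizeof(buf), "%s", 42)` draws a format warning at compile time when format checking (for example `-Wall`) is enabled, instead of misbehaving at run time.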
     
  • The Intel Mid goes in and creates an I2C device for the
    MPU3050 if the input driver for MPU-3050 is activated.

    As of commit:

    3904b28efb2c ("iio: gyro: Add driver for the MPU-3050 gyroscope")

    .. there is a proper and fully featured IIO driver for this
    device, so deprecate the use of the incomplete input driver
    by augmenting the device population code to react to the
    presence of the IIO driver's Kconfig symbol instead.

    Signed-off-by: Linus Walleij
    Acked-by: Andy Shevchenko
    Cc: Dmitry Torokhov
    Cc: Jonathan Cameron
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1481722794-4348-1-git-send-email-linus.walleij@linaro.org
    Signed-off-by: Ingo Molnar

    Linus Walleij
     
  • We use sync_core() in the alternatives code to stop speculative
    execution of prefetched instructions because we are potentially changing
    them and don't want to execute stale bytes.

    What it does on most machines is call CPUID which is a serializing
    instruction. And that's expensive.

    However, the instruction cache is serialized when we're on the local CPU
    and are changing the data through the same virtual address. So then, we
    don't need the serializing CPUID but a simple control flow change. The
    latter is accomplished with a CALL/RET, which the noinline causes.

    Suggested-by: Linus Torvalds
    Signed-off-by: Borislav Petkov
    Reviewed-by: Andy Lutomirski
    Cc: Andrew Cooper
    Cc: Andy Lutomirski
    Cc: Brian Gerst
    Cc: Henrique de Moraes Holschuh
    Cc: Matthew Whitehead
    Cc: One Thousand Gnomes
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161203150258.vwr5zzco7ctgc4pe@pd.tnic
    Signed-off-by: Ingo Molnar

    Borislav Petkov
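
The role of `noinline` in the entry above can be illustrated with a small sketch. It only shows the shape of the pattern: the copy is kept in a function the compiler may not inline, so calling it involves a real CALL and RET. The helper `patch_bytes` is a hypothetical name, and this example writes to an ordinary data buffer, not to executable code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * noinline keeps this function out of line, so every use goes through a
 * CALL on entry and a RET on exit, i.e. a control flow change around the copy.
 */
__attribute__((noinline))
static void patch_bytes(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
}
```

Whether that control flow change is sufficient on a given CPU is the claim made by the commit message, not something this sketch demonstrates.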