10 Apr, 2016

1 commit

  • Pull power management and ACPI fixes from Rafael Wysocki:
    "Fixes for some issues discovered after recent changes and for some
    that have just been found lately regardless of those changes
    (intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
    support for some new CPU models (intel_idle, Intel RAPL driver,
    turbostat) and documentation updates (intel_pstate, PM core).

    Specifics:

    - intel_pstate fixes for two issues exposed by the recent switch over
    from using timers and for one issue introduced during the 4.4 cycle
    plus new comments describing data structures used by the driver
    (Rafael Wysocki, Srinivas Pandruvada).

    - intel_idle fixes related to CPU offline/online (Richard Cochran).

    - intel_idle support (new CPU IDs and state definitions mostly) for
    Skylake-X and Kabylake processors (Len Brown).

    - PCC mailbox driver fix for an out-of-bounds memory access that may
    cause the kernel to panic() (Shanker Donthineni).

    - New (missing) CPU ID for one apparently overlooked Haswell model in
    the Intel RAPL power capping driver (Srinivas Pandruvada).

    - Fix for the PM core's wakeup IRQs framework to make it work after
    wakeup settings reconfiguration from sysfs (Grygorii Strashko).

    - Runtime PM documentation update to make it describe what needs to
    be done during device removal more precisely (Krzysztof Kozlowski).

    - Stale comment removal cleanup in the cpufreq-dt driver (Viresh
    Kumar).

    - turbostat utility fixes and support for Broxton, Skylake-X and
    Kabylake processors (Len Brown)"

    * tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
    PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
    tools/power turbostat: work around RC6 counter wrap
    tools/power turbostat: initial KBL support
    tools/power turbostat: initial SKX support
    tools/power turbostat: decode BXT TSC frequency via CPUID
    tools/power turbostat: initial BXT support
    tools/power turbostat: print IRTL MSRs
    tools/power turbostat: SGX state should print only if --debug
    intel_idle: Add KBL support
    intel_idle: Add SKX support
    intel_idle: Clean up all registered devices on exit.
    intel_idle: Propagate hot plug errors.
    intel_idle: Don't overreact to a cpuidle registration failure.
    intel_idle: Setup the timer broadcast only on successful driver load.
    intel_idle: Avoid a double free of the per-CPU data.
    intel_idle: Fix dangling registration on error path.
    intel_idle: Fix deallocation order on the driver exit path.
    intel_idle: Remove redundant initialization calls.
    intel_idle: Fix a helper function's return value.
    intel_idle: remove useless return from void function.
    ...

    Linus Torvalds
     

09 Apr, 2016

1 commit

  • * pm-core:
    PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
    PM / runtime: Document steps for device removal

    * powercap:
    powercap: intel_rapl: Add missing Haswell model

    * pm-tools:
    tools/power turbostat: work around RC6 counter wrap
    tools/power turbostat: initial KBL support
    tools/power turbostat: initial SKX support
    tools/power turbostat: decode BXT TSC frequency via CPUID
    tools/power turbostat: initial BXT support
    tools/power turbostat: print IRTL MSRs
    tools/power turbostat: SGX state should print only if --debug

    Rafael J. Wysocki
     

08 Apr, 2016

1 commit

  • Some processors use the Interrupt Response Time Limit (IRTL) MSR value
    to describe the maximum IRQ response time latency for deep
    package C-states. (Others have the register but do not use it.)
    Let's print it out to give insight into the cases where it is used.

    IRTL began in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.
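
    As an illustration of what printing an IRTL value involves, here is a
    minimal decoding sketch. The bit layout (limit in bits 9:0, time unit in
    bits 12:10, valid flag in bit 15) and the unit table are assumptions
    taken from the Intel SDM description of these MSRs, not from this
    commit message; decode_irtl() is a hypothetical helper name.

```python
# Time-unit multipliers in nanoseconds, indexed by the 3-bit unit field.
# Assumption: encoding as documented for the package C-state IRTL MSRs.
IRTL_UNITS_NS = (1, 32, 1024, 32768, 1048576, 33554432, 0, 0)


def decode_irtl(msr_value):
    """Return (valid, limit_ns) for a raw IRTL MSR value."""
    valid = bool(msr_value & (1 << 15))                # bit 15: valid flag
    unit_ns = IRTL_UNITS_NS[(msr_value >> 10) & 0x7]   # bits 12:10: time unit
    limit = msr_value & 0x3FF                          # bits 9:0: time limit
    return valid, limit * unit_ns


# Hypothetical raw value: valid bit set, unit 2 (1024 ns), limit 0x4E.
valid, limit_ns = decode_irtl(0x884E)
print("valid" if valid else "NOT valid", limit_ns, "ns")
```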

    Signed-off-by: Len Brown
    Signed-off-by: Rafael J. Wysocki

    Len Brown
     

07 Apr, 2016

1 commit

  • Let's see if anybody even notices. I doubt anybody uses this, and it
    does expose addresses that should be randomized, so let's just remove
    the code. It's old and traditional, and it used to be cute, but we
    should have removed this long ago.

    If it turns out anybody notices and this breaks something, we'll have to
    revert this, and maybe we'll end up using other approaches instead
    (using %pK or similar). But removing unnecessary code is always the
    preferred option.

    Noted-by: Emrah Demir
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

06 Apr, 2016

1 commit

  • Pull KVM fixes from Paolo Bonzini:
    "Miscellaneous bugfixes.

    The ARM and s390 fixes are for new regressions from the merge window,
    others are usual stable material"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    compiler-gcc: disable -ftracer for __noclone functions
    kvm: x86: make lapic hrtimer pinned
    s390/mm/kvm: fix mis-merge in gmap handling
    kvm: set page dirty only if page has been writable
    KVM: x86: reduce default value of halt_poll_ns parameter
    KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
    KVM: x86: Inject pending interrupt even if pending nmi exist
    arm64: KVM: Register CPU notifiers when the kernel runs at HYP
    arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting

    Linus Torvalds
     

05 Apr, 2016

2 commits

  • When a vCPU runs on a nohz_full core, the hrtimer used by
    the lapic emulation code can be migrated to another core.
    When this happens, it's possible to observe millisecond
    latency when delivering timer IRQs to KVM guests.

    The huge latency is mainly due to the fact that
    apic_timer_fn() expects to run during a kvm exit. It
    sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
    entry. However, if the timer fires on a different core,
    we have to wait until the next kvm exit for the guest
    to see KVM_REQ_PENDING_TIMER set.

    This problem became visible after commit 9642d18ee. This
    commit changed the timer migration code to always attempt
    to migrate timers away from nohz_full cores. While it's
    debatable whether this is correct/desirable (I don't think
    it is), it's clear that the lapic emulation code requires
    the hrtimer to fire on the same core where it was started.
    This is achieved by making the hrtimer pinned.

    Lastly, note that KVM has code to migrate timers when a
    vCPU is scheduled to run on a different core. However, this
    forced migration may fail. When this happens, we can have
    the same problem. If we want 100% correctness, we'll have
    to modify apic_timer_fn() to cause a kvm exit when it runs
    on a different core than the vCPU. Not sure if this is
    possible.

    Here's a reproducer for the issue being fixed:

    1. Set all cores but core0 to be nohz_full cores
    2. Start a guest with a single vCPU
    3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()

    You'll see that apic_timer_fn() will run on core0 while
    kvm_inject_apic_timer_irqs() runs on a different core. If
    you get both on core0, try running a program that takes 100%
    of the CPU and pin it to core0 to force the vCPU out.

    Signed-off-by: Luiz Capitulino
    Signed-off-by: Paolo Bonzini

    Luiz Capitulino
     
  • Pull xen fixes from David Vrabel:
    "Regression and bug fixes for 4.6-rc2:

    - safely migrate event channels between CPUs
    - fix CPU hotplug
    - maintainer changes"

    * tag 'for-linus-4.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    MAINTAINERS: xen: Konrad to step down and Juergen to pick up
    xen/events: Mask a moving irq
    Xen on ARM and ARM64: update MAINTAINERS info
    xen/x86: Call cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()
    xen/apic: Provide Xen-specific version of cpu_present_to_apicid APIC op

    Linus Torvalds
     

03 Apr, 2016

2 commits

  • Pull perf fixes from Ingo Molnar:
    "Misc kernel side fixes:

    - fix event leak
    - fix AMD PMU driver bug
    - fix core event handling bug
    - fix build bug on certain randconfigs

    Plus misc tooling fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/amd/ibs: Fix pmu::stop() nesting
    perf/core: Don't leak event in the syscall error path
    perf/core: Fix time tracking bug with multiplexing
    perf jit: genelf makes assumptions about endian
    perf hists: Fix determination of a callchain node's childlessness
    perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples
    perf tools: Fix build break on powerpc
    perf/x86: Move events_sysfs_show() outside CPU_SUP_INTEL
    perf bench: Fix detached tarball building due to missing 'perf bench memcpy' headers
    perf tests: Fix tarpkg build test error output redirection

    Linus Torvalds
     
  • Pull x86 fixes from Thomas Gleixner:
    "This lot contains:

    - Some fixups for the fallout of the topology consolidation which
    unearthed AMD/Intel inconsistencies
    - Documentation for the x86 topology management
    - Support for AMD advanced power management bits
    - Two simple cleanups removing duplicated code"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/cpu: Add advanced power management bits
    x86/thread_info: Merge two !__ASSEMBLY__ sections
    x86/cpufreq: Remove duplicated TDP MSR macro definitions
    x86/Documentation: Start documenting x86 topology
    x86/cpu: Get rid of compute_unit_id
    perf/x86/amd: Cleanup Fam10h NB event constraints
    x86/topology: Fix AMD core count

    Linus Torvalds
     

02 Apr, 2016

4 commits

  • Pull power management and ACPI fix from Rafael J. Wysocki:
    "Just one fix for a nasty boot failure on some systems based on Intel
    Skylake that shipped with broken firmware where enabling
    hardware-coordinated P-states management (HWP) causes a faulty
    interrupt handler in SMM to be invoked and crash the system (Srinivas
    Pandruvada)"

    * tag 'pm+acpi-4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / processor: Request native thermal interrupt handling via _OSC

    Linus Torvalds
     
  • * acpi-processor:
    ACPI / processor: Request native thermal interrupt handling via _OSC

    Rafael J. Wysocki
     
  • The recently introduced batched invalidations mechanism uses its own
    mechanism for shootdown. However, it accounts incorrectly for
    interrupts (e.g., inc_irq_stat is called for local invalidations) and
    trace-points (e.g., TLB_REMOTE_SHOOTDOWN for local invalidations), and
    may break some platforms as it bypasses the invalidation mechanisms of
    Xen and SGI UV.

    This patch reuses the existing TLB flushing mechanisms instead. We use
    NULL as mm to indicate a global invalidation is required.

    Fixes: 72b252aed506b8 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
    Signed-off-by: Nadav Amit
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadav Amit
     
  • TLB_REMOTE_SEND_IPI was recently introduced, but it counts bytes instead
    of pages. In addition, it does not correctly report the case in which
    flush_tlb_page flushes a page. Fix it to be consistent with other TLB
    counters.
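
    The difference between counting bytes and counting pages can be
    sketched as follows. This is only an illustration: PAGE_SHIFT = 12
    (4 KiB pages) and the helper name pages_in_range() are assumptions,
    not code from the patch.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def pages_in_range(start, end):
    """Number of pages covered by the byte range [start, end)."""
    return (end - start) >> PAGE_SHIFT


start, end = 0x1000, 0x5000
bytes_flushed = end - start                  # what a byte-based counter reports
pages_flushed = pages_in_range(start, end)   # what a page-based counter reports
print(bytes_flushed, pages_flushed)
```

    A single-page flush would then be reported as 1, in line with the
    other page-based TLB counters.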

    Fixes: 5b74283ab251b9d ("x86, mm: trace when an IPI is about to be sent")
    Signed-off-by: Nadav Amit
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadav Amit
     

01 Apr, 2016

4 commits

  • In the absence of a shadow dirty mask, there is no need to mark a page
    dirty if it has never been writable. This is a tiny optimization, but
    good to have for people who care a lot about dirty page tracking.

    Signed-off-by: Yu Zhao
    Signed-off-by: Paolo Bonzini

    Yu Zhao
     
  • Windows lets applications choose the frequency of the timer tick,
    and in Windows 10 the maximum rate was changed from 1024 Hz to
    2048 Hz. Unfortunately, because of the way the Windows API
    works, most applications that need a higher rate than the default
    64 Hz will just do

    timeGetDevCaps(&tc, sizeof(tc));
    timeBeginPeriod(tc.wPeriodMin);

    and pick the maximum rate. This causes very high CPU usage when
    playing media or games on Windows 10, even if the guest does not
    actually use the CPU very much, because the frequent timer tick
    causes halt_poll_ns to kick in.

    There is no really good solution, especially because Microsoft
    could sooner or later bump the limit to 4096 Hz, but for now
    the best we can do is lower the upper limit for halt_poll_ns
    a bit. :-(
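
    The arithmetic behind this can be sketched as below. The tick rates
    come from the text above; the two halt_poll_ns values (500000 before,
    400000 after) are assumptions used for illustration, since the message
    does not state them, and the helper names are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def tick_period_ns(rate_hz):
    """Time between two guest timer ticks at the given rate."""
    return NSEC_PER_SEC // rate_hz


def poll_covers_tick(halt_poll_ns, rate_hz):
    """True if the polling window spans a whole tick period, meaning the
    host keeps polling instead of halting between consecutive ticks."""
    return tick_period_ns(rate_hz) <= halt_poll_ns


OLD_LIMIT_NS = 500_000  # assumed previous upper limit
NEW_LIMIT_NS = 400_000  # assumed lowered upper limit

for rate in (64, 1024, 2048, 4096):
    print(rate, tick_period_ns(rate),
          poll_covers_tick(OLD_LIMIT_NS, rate),
          poll_covers_tick(NEW_LIMIT_NS, rate))
```

    Under these assumed values, a 2048 Hz tick fits inside the old window
    but not the new one, while a 4096 Hz tick would fit inside both, which
    is why the change is described as a stopgap.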

    Reported-by: Jon Panozzo
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • If SynIC is disabled, there is nothing that userspace can do to
    handle these exits; on the other hand, userspace probably will
    not know about KVM_EXIT_HYPERV_HCALL and complain about it or
    even exit. Just prevent anything bad from happening by handling
    the hypercall in KVM and returning an "invalid hypercall" code.

    Fixes: 83326e43f27e9a8a501427a0060f8af519a39bb2
    Cc: Andrey Smetanin
    Reviewed-by: Roman Kagan
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • In the current implementation, non-maskable interrupts (NMIs) take
    priority over regular interrupts. If an NMI is pending but blocked
    according to nmi_allowed(), a pending interrupt is not injected and
    enable_irq_window() is not executed, even if interrupt injection is
    allowed.

    In old kernels (e.g. 2.6.32), schedule() is often called in NMI context.
    In this case, interrupts are needed to reach the iret that ends the
    NMI. The flag blocking new NMIs is not cleared until the guest
    executes the iret, and interrupts are blocked by the pending NMI. Due to
    this, the iret can't be executed in the guest, and the guest is starved
    until the block is cleared by some event (e.g. canceling injection).

    This patch injects pending interrupts, when allowed, even if NMI
    is blocked. In addition, if an interrupt is pending after executing
    inject_pending_event(), enable_irq_window() is executed regardless of
    the NMI pending counter.
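
    The change in priority handling can be modelled with a small decision
    function. This is a simplified sketch of the behaviour described
    above, not the KVM code itself; both function names and the boolean
    parameters are hypothetical.

```python
def pick_event_before(nmi_pending, nmi_allowed, irq_pending, irq_allowed):
    """Old behaviour: a pending NMI shadows interrupts even while blocked."""
    if nmi_pending:
        return "nmi" if nmi_allowed else None
    if irq_pending and irq_allowed:
        return "irq"
    return None


def pick_event_after(nmi_pending, nmi_allowed, irq_pending, irq_allowed):
    """New behaviour: fall through to interrupts when the NMI is blocked."""
    if nmi_pending and nmi_allowed:
        return "nmi"
    if irq_pending and irq_allowed:
        return "irq"
    return None


# Starvation case: NMI pending but blocked, interrupt pending and allowed.
case = dict(nmi_pending=True, nmi_allowed=False,
            irq_pending=True, irq_allowed=True)
print(pick_event_before(**case), pick_event_after(**case))
```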

    Cc: stable@vger.kernel.org
    Signed-off-by: Yuki Shibuya
    Suggested-by: Paolo Bonzini
    Signed-off-by: Paolo Bonzini

    Yuki Shibuya
     

31 Mar, 2016

1 commit

  • Patch 5a50f5291701 ("perf/x86/ibs: Fix race with IBS_STARTING state")
    closed a big hole while opening another, smaller hole.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: 5a50f5291701 ("perf/x86/ibs: Fix race with IBS_STARTING state")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

29 Mar, 2016

9 commits

  • This call has always been missing from xen_play_dead(), but until
    recently this was rather benign. With the new cpu hotplug framework
    (commit 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up")),
    however, this call is required; otherwise a hot-plugged CPU will not
    be properly brought up (cpuhp_online_idle() is never called).

    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Konrad Rzeszutek Wilk

    Boris Ostrovsky
     
  • Linux 4.6-rc1

    * tag 'v4.6-rc1': (12823 commits)
    Linux 4.6-rc1
    f2fs/crypto: fix xts_tweak initialization
    NTB: Remove _addr functions from ntb_hw_amd
    orangefs: fix orangefs_superblock locking
    orangefs: fix do_readv_writev() handling of error halfway through
    orangefs: have ->kill_sb() evict the VFS side of things first
    orangefs: sanitize ->llseek()
    orangefs-bufmap.h: trim unused junk
    orangefs: saner calling conventions for getting a slot
    orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
    orangefs: get rid of readdir_handle_s
    thp: fix typo in khugepaged_scan_pmd()
    MAINTAINERS: fill entries for KASAN
    mm/filemap: generic_file_read_iter(): check for zero reads unconditionally
    kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2
    mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
    arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
    mm, kasan: add GFP flags to KASAN API
    mm, kasan: SLAB support
    kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right()
    ...

    Konrad Rzeszutek Wilk
     
  • Bit 11 of CPUID 8000_0007 edx is processor feedback interface.
    Bit 12 of CPUID 8000_0007 edx is accumulated power.

    Print proper names in /proc/cpuinfo.
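
    Testing these two bits can be sketched as follows. The bit positions
    come from the text above; the labels are the descriptive names used
    there rather than the exact strings the kernel prints, and
    decode_power_flags() is a hypothetical helper.

```python
# Bit position in CPUID 8000_0007 EDX -> descriptive label.
POWER_FLAGS = {
    11: "processor feedback interface",
    12: "accumulated power",
}


def decode_power_flags(edx):
    """Return the labels of the power management bits set in edx."""
    return [name for bit, name in sorted(POWER_FLAGS.items())
            if edx & (1 << bit)]


print(decode_power_flags(0x1800))  # hypothetical EDX value with both bits set
```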

    Reported-and-tested-by: Borislav Petkov
    Signed-off-by: Huang Rui
    Cc: Tony Li
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Peter Zijlstra
    Cc: "Rafael J. Wysocki"
    Cc: Andy Lutomirski
    Cc: Fengguang Wu
    Cc: Sherry Hurwitz
    Cc: Borislav Petkov
    Cc: "Len Brown"
    Link: http://lkml.kernel.org/r/1458871720-3209-1-git-send-email-ray.huang@amd.com
    Signed-off-by: Thomas Gleixner

    Huang Rui
     
  • We have

    #ifndef __ASSEMBLY__
    ...
    #endif

    #ifndef __ASSEMBLY__
    ...
    #endif

    Merge the two.

    No functionality change.

    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1459189217-25532-1-git-send-email-bp@alien8.de
    Signed-off-by: Thomas Gleixner

    Borislav Petkov
     
  • The list of CPU model specific registers contains two copies of the TDP
    registers; remove the copy that is out of numerical order in the
    list.

    Fixes: 6a35fc2d6c22 ("cpufreq: intel_pstate: get P1 from TAR when available")
    Signed-off-by: Vladimir Zapolskiy
    Cc: Len Brown
    Cc: "Rafael J. Wysocki"
    Cc: Kristen Carlson Accardi
    Cc: Srinivas Pandruvada
    Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.com
    Signed-off-by: Thomas Gleixner

    Vladimir Zapolskiy
     
  • It is cpu_core_id anyway.

    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/1458917557-8757-3-git-send-email-bp@alien8.de
    Signed-off-by: Thomas Gleixner

    Borislav Petkov
     
  • Avoid allocating the AMD NB event constraints data structure when not
    needed. This gets rid of x86_max_cores usage and avoids allocating
    this on AMD Core Perfctr supporting hardware (which has separate MSRs
    for NB events).

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Borislav Petkov
    Cc: aherrmann@suse.com
    Cc: Rui Huang
    Cc: Borislav Petkov
    Cc: jencce.kernel@gmail.com
    Link: http://lkml.kernel.org/r/20160320124629.GY6375@twins.programming.kicks-ass.net
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • It turns out AMD gets x86_max_cores wrong when there are compute
    units.

    The issue is that Linux assumes:

    nr_logical_cpus = nr_cores * nr_siblings

    AMD, however, reports each compute unit (CU) as 2 cores and then sets
    num_smp_siblings to 2 as well.
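
    A worked example of the miscount, using a hypothetical part with 4
    compute units of 2 threads each (8 logical CPUs in total). The numbers
    and the helper name are illustrative assumptions only.

```python
def logical_cpus(nr_cores, nr_siblings):
    """The relation Linux assumes: nr_logical_cpus = nr_cores * nr_siblings."""
    return nr_cores * nr_siblings


compute_units = 4      # hypothetical part
threads_per_cu = 2
actual = compute_units * threads_per_cu

# Each CU reported as 2 cores AND 2 siblings: the threads are counted twice.
miscounted = logical_cpus(nr_cores=compute_units * 2, nr_siblings=2)

# Treating each CU as one core with 2 siblings satisfies the relation.
consistent = logical_cpus(nr_cores=compute_units, nr_siblings=2)

print(actual, miscounted, consistent)
```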

    Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
    properly, according to the new nomenclature.

    Fixes: 1f12e32f4cd5 ("x86/topology: Create logical package id")
    Reported-by: Xiong Zhou
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Borislav Petkov
    Cc: Andreas Herrmann
    Cc: Andy Lutomirski
    Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
     
  • Update the definition of memcpy_from_pmem() to return 0 or a negative
    error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().

    Cc: Borislav Petkov
    Cc: Tony Luck
    Cc: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Acked-by: Ingo Molnar
    Reviewed-by: Ross Zwisler
    Signed-off-by: Dan Williams

    Dan Williams
     

26 Mar, 2016

3 commits

  • There are several reports of freezes when the Intel P-states driver
    enables the HWP (Hardware P-States) feature on Skylake-based systems.
    The root cause is identified as the HWP interrupts causing BIOS code
    to freeze.

    HWP interrupts use the thermal LVT which can be handled by Linux
    natively, but on the affected Skylake-based systems SMM will respond
    to it by default. This is a problem for several reasons:
    - On the affected systems the SMM thermal LVT handler is broken (it
    will crash when invoked) and a BIOS update is necessary to fix it.
    - With thermal interrupt handled in SMM we lose all of the reporting
    features of the arch/x86/kernel/cpu/mcheck/therm_throt driver.
    - Some thermal drivers like x86-package-temp depend on the thermal
    threshold interrupts signaled via the thermal LVT.
    - The HWP interrupts are useful for debugging and tuning
    performance (if the kernel can handle them).
    The native handling of thermal interrupts needs to be enabled
    because of that.

    This requires some way to tell SMM that the OS can handle thermal
    interrupts. That can be done by using _OSC/_PDC in processor
    scope very early during ACPI initialization.

    The meaning of _OSC/_PDC bit 12 in processor scope is whether or
    not the OS supports native handling of interrupts for Collaborative
    Processor Performance Control (CPPC) notifications. Since on
    HWP-capable systems CPPC is a firmware interface to HWP, setting
    this bit effectively tells the firmware that the OS will handle
    thermal interrupts natively going forward.

    For details on _OSC/_PDC refer to:
    http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html

    To implement the _OSC/_PDC handshake as described, introduce a new
    function, acpi_early_processor_osc(), that walks the ACPI
    namespace looking for ACPI processor objects and invokes _OSC for
    them with bit 12 in the capabilities buffer set and terminates the
    namespace walk on the first success.
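
    The shape of that walk can be sketched as below. This models only the
    control flow described above (set bit 12, try each processor object,
    stop at the first success); evaluate_osc is a hypothetical callback
    standing in for the real _OSC evaluation, and the names are not the
    kernel's.

```python
NATIVE_INTERRUPT_BIT = 12  # capability bit named in the text above


def early_processor_osc(processor_objects, evaluate_osc):
    """Invoke _OSC with bit 12 set on each processor object in turn and
    stop at the first one that succeeds. Returns that object or None."""
    capabilities = 1 << NATIVE_INTERRUPT_BIT
    for obj in processor_objects:
        if evaluate_osc(obj, capabilities):
            return obj  # terminate the walk on the first success
    return None


# Toy usage: pretend only the second object accepts the request.
calls = []


def fake_osc(obj, caps):
    calls.append((obj, hex(caps)))
    return obj == "CPU1"


print(early_processor_osc(["CPU0", "CPU1", "CPU2"], fake_osc), calls)
```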

    Also modify intel_thermal_interrupt() to clear HWP status bits in
    the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents
    them from firing continuously).

    Signed-off-by: Srinivas Pandruvada
    [ rjw: Subject & changelog, function rename ]
    Signed-off-by: Rafael J. Wysocki

    Srinivas Pandruvada
     
  • Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
    will allow KASAN to store allocation/deallocation stack traces for memory
    chunks. The stack traces are stored in a hash table and referenced by
    handles which reside in the kasan_alloc_meta and kasan_free_meta
    structures in the allocated memory chunks.

    IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
    duplication.
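
    The idea can be sketched with a toy depot. This is a conceptual model
    under stated assumptions, not the kernel implementation: frames are
    plain strings listed innermost first, a dict stands in for the hash
    table, and the class and method names are hypothetical.

```python
class StackDepot:
    """Toy model: deduplicate stack traces and hand out small handles."""

    def __init__(self):
        self._handles = {}  # trace tuple -> handle
        self._traces = []   # handle -> trace tuple

    def save(self, frames, irq_entries=frozenset()):
        trace = tuple(frames)
        # Cut everything below the IRQ entry point: the interrupted
        # context is unrelated and would only create near-duplicates.
        for i, frame in enumerate(trace):
            if frame in irq_entries:
                trace = trace[:i + 1]
                break
        handle = self._handles.get(trace)
        if handle is None:
            handle = len(self._traces)
            self._traces.append(trace)
            self._handles[trace] = handle
        return handle

    def fetch(self, handle):
        return self._traces[handle]


depot = StackDepot()
h1 = depot.save(["kmalloc", "irq_handler", "do_IRQ", "task_a"], {"do_IRQ"})
h2 = depot.save(["kmalloc", "irq_handler", "do_IRQ", "task_b"], {"do_IRQ"})
print(h1 == h2, depot.fetch(h1))
```

    Storing only a handle per memory chunk keeps the per-object metadata
    small, since identical traces are stored once.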

    Right now stackdepot support is only enabled in the SLAB allocator. Once
    KASAN features in SLAB are on par with those in SLUB we can switch SLUB
    to stackdepot as well, thus removing the dependency on SLUB stack
    bookkeeping, which wastes a lot of memory.

    This patch is based on the "mm: kasan: stack depots" patch originally
    prepared by Dmitry Chernenkov.

    Joonsoo has said that he plans to reuse the stackdepot code for the
    mm/page_owner.c debugging facility.

    [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
    [aryabinin@virtuozzo.com: comment style fixes]
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • KASAN needs to know whether the allocation happens in an IRQ handler.
    This lets us strip everything below the IRQ entry point to reduce the
    number of unique stack traces needed to be stored.

    Move the definition of __irq_entry to a different header so that its
    users don't need to pull in the header that previously defined it. Also
    introduce the __softirq_entry macro, which is similar to __irq_entry but
    puts the corresponding functions into the .softirqentry.text section.

    Signed-off-by: Alexander Potapenko
    Acked-by: Steven Rostedt
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

25 Mar, 2016

5 commits

  • Currently Xen uses default_cpu_present_to_apicid() which will always
    report BAD_APICID for PV guests since x86_bios_cpu_apic_id is initialised
    to that value and is never updated.

    With commit 1f12e32f4cd5 ("x86/topology: Create logical package id"), this
    op is now called by smp_init_package_map() when deciding whether to call
    topology_update_package_map() which sets cpu_data(cpu).logical_proc_id.
    The latter (as topology_logical_package_id(cpu)) may be used, for example,
    by cpu_to_rapl_pmu() as an array index. Since uninitialized
    logical_package_id is set to -1, the index will become 64K which is
    obviously problematic.
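
    The "64K" figure follows from how -1 reads back as an unsigned value.
    A minimal sketch, assuming the id is stored in a 16-bit field (the
    message implies this but does not say so); as_u16() is a hypothetical
    helper.

```python
def as_u16(value):
    """Interpret an integer as an unsigned 16-bit quantity."""
    return value & 0xFFFF


uninitialized_id = -1
index = as_u16(uninitialized_id)
print(index)  # far beyond the bounds of any per-package array
```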

    While RAPL code (and any other users of logical_package_id) should be
    careful in their assumptions about the id's validity, Xen's
    cpu_present_to_apicid op should still provide a value consistent with
    its own xen_apic_read(APIC_ID).

    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Konrad Rzeszutek Wilk

    Boris Ostrovsky
     
  • randconfig builds can sometimes disable CONFIG_CPU_SUP_INTEL while
    enabling the AMD power reporting PMU driver, resulting in this
    build failure:

    arch/x86/kernel/cpu/perf_event.h:663:31: error: 'events_sysfs_show' undeclared here (not in a function)

    To fix it, move events_sysfs_show() outside of #ifdef CONFIG_CPU_SUP_INTEL.

    Reported-by: Randy Dunlap
    Reported-by: build test robot
    Signed-off-by: Huang Rui
    Cc: Borislav Petkov
    Cc: Fengguang Wu
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sherry Hurwitz
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Cc: acme@kernel.org
    Cc: kbuild-all@01.org
    Cc: linux-next@vger.kernel.org
    Cc: spg_linux_kernel@amd.com
    Link: http://lkml.kernel.org/r/1458875905-4278-1-git-send-email-ray.huang@amd.com
    Signed-off-by: Ingo Molnar

    Huang Rui
     
  • Pull tracing updates from Steven Rostedt:
    "Nothing major this round. Mostly small clean ups and fixes.

    Some visible changes:

    - A new flag was added to distinguish traces done in NMI context.

    - Preempt tracer now shows functions where preemption is disabled but
    interrupts are still enabled.

    Other notes:

    - Updates were done to function tracing to allow better performance
    with perf.

    - Infrastructure code has been added to allow for a new histogram
    feature for recording live trace event histograms that can be
    configured by simple user commands. The feature itself was just
    finished, but needs a round in linux-next before being pulled.

    This only includes some infrastructure changes that will be needed"

    * tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
    tracing: Record and show NMI state
    tracing: Fix trace_printk() to print when not using bprintk()
    tracing: Remove redundant reset per-CPU buff in irqsoff tracer
    x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
    tracing: Fix crash from reading trace_pipe with sendfile
    tracing: Have preempt(irqs)off trace preempt disabled functions
    tracing: Fix return while holding a lock in register_tracer()
    ftrace: Use kasprintf() in ftrace_profile_tracefs()
    ftrace: Update dynamic ftrace calls only if necessary
    ftrace: Make ftrace_hash_rec_enable return update bool
    tracing: Fix typoes in code comment and printk in trace_nop.c
    tracing, writeback: Replace cgroup path to cgroup ino
    tracing: Use flags instead of bool in trigger structure
    tracing: Add an unreg_all() callback to trigger commands
    tracing: Add needs_rec flag to event triggers
    tracing: Add a per-event-trigger 'paused' field
    tracing: Add get_syscall_name()
    tracing: Add event record param to trigger_ops.func()
    tracing: Make event trigger functions available
    tracing: Make ftrace_event_field checking functions available
    ...

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "This tree contains various perf fixes on the kernel side, plus three
    hw/event-enablement late additions:

    - Intel Memory Bandwidth Monitoring events and handling
    - the AMD Accumulated Power Mechanism reporting facility
    - more IOMMU events

    ... and a final round of perf tooling updates/fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
    perf llvm: Use strerror_r instead of the thread unsafe strerror one
    perf llvm: Use realpath to canonicalize paths
    perf tools: Unexport some methods unused outside strbuf.c
    perf probe: No need to use formatting strbuf method
    perf help: Use asprintf instead of adhoc equivalents
    perf tools: Remove unused perf_pathdup, xstrdup functions
    perf tools: Do not include stringify.h from the kernel sources
    tools include: Copy linux/stringify.h from the kernel
    tools lib traceevent: Remove redundant CPU output
    perf tools: Remove needless 'extern' from function prototypes
    perf tools: Simplify die() mechanism
    perf tools: Remove unused DIE_IF macro
    perf script: Remove lots of unused arguments
    perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
    perf machine: Rename perf_event__preprocess_sample to machine__resolve
    perf tools: Add cpumode to struct perf_sample
    perf tests: Forward the perf_sample in the dwarf unwind test
    perf tools: Remove misplaced __maybe_unused
    perf list: Fix documentation of :ppp
    perf bench numa: Fix assertion for nodes bitfield
    ...

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes:

    - fix hotplug bugs
    - fix irq live lock
    - fix various topology handling bugs
    - fix APIC ACK ordering
    - fix PV iopl handling
    - fix speling
    - fix/tweak memcpy_mcsafe() return value
    - fix fbcon bug
    - remove stray prototypes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/msr: Remove unused native_read_tscp()
    x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck
    x86/oprofile/nmi: Add missing hotplug FROZEN handling
    x86/hpet: Use proper mask to modify hotplug action
    x86/apic/uv: Fix the hotplug notifier
    x86/apb/timer: Use proper mask to modify hotplug action
    x86/topology: Use total_cpus not nr_cpu_ids for logical packages
    x86/topology: Fix Intel HT disable
    x86/topology: Fix logical package mapping
    x86/irq: Cure live lock in fixup_irqs()
    x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
    x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
    x86/iopl: Fix iopl capability check on Xen PV
    x86/iopl/64: Properly context-switch IOPL on Xen PV
    selftests/x86: Add an iopl test
    x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
    x86/video: Don't assume all FB devices are PCI devices
    arch/x86/irq: Purge useless handler declarations from hw_irq.h
    x86: Fix misspellings in comments

    Linus Torvalds
     

23 Mar, 2016

5 commits

  • After e76b027 ("x86,vdso: Use LSL unconditionally for vgetcpu")
    native_read_tscp() is unused in the kernel. The function can be removed like
    native_read_tsc() was.

    Signed-off-by: Prarit Bhargava
    Acked-by: Andy Lutomirski
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/1458687968-9106-1-git-send-email-prarit@redhat.com
    Signed-off-by: Thomas Gleixner

    Prarit Bhargava
     
  • Merge third patch-bomb from Andrew Morton:

    - more ocfs2 changes

    - a few hotfixes

    - Andy's compat cleanups

    - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
    panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

    - many rapidio updates: fixes, new drivers.

    - kcov: kernel code coverage feature. Like gcov, but not
    "prohibitively expensive".

    - extable code consolidation for various archs

    * emailed patches from Andrew Morton: (81 commits)
    ia64/extable: use generic search and sort routines
    x86/extable: use generic search and sort routines
    s390/extable: use generic search and sort routines
    alpha/extable: use generic search and sort routines
    kernel/...: convert pr_warning to pr_warn
    drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
    drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
    memremap: add MEMREMAP_WC flag
    memremap: don't modify flags
    kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
    mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
    ipc/sem: make semctl setting sempid consistent
    ubsan: fix tree-wide -Wmaybe-uninitialized false positives
    kfifo: fix sparse complaints
    scripts/gdb: account for changes in module data structure
    scripts/gdb: add cmdline reader command
    scripts/gdb: add version command
    kernel: add kcov code coverage
    profile: hide unused functions when !CONFIG_PROC_FS
    hpwdt: use nmi_panic() when kernel panics in NMI handler
    ...

    Linus Torvalds
     
  • Pull more KVM updates from Paolo Bonzini:
    "Second round of KVM changes for 4.6:

    - build fixes for PPC KVM
    - miscellaneous bugfixes for ARM KVM
    - cleanup of memory barrier and removal of redundant barriers
    - x86 fixes: page tracking oops, support for old buggy KVM nested on 4.5
    - support for protection keys in guests
    - lockdep fix
    - another conversion to simple wait queues and raw spinlocks,
    backported from PREEMPT_RT"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
    KVM: page_track: fix access to NULL slot
    KVM: PPC: do not compile in vfio.o unconditionally
    kvm, rt: change async pagefault code locking for PREEMPT_RT
    KVM/PPC: update the comment of memory barrier in the kvmppc_prepare_to_enter()
    KVM/x86: update the comment of memory barrier in the vcpu_enter_guest()
    KVM: Replace smp_mb() with smp_load_acquire() in the kvm_flush_remote_tlbs()
    KVM/x86: Call smp_wmb() before increasing tlbs_dirty
    KVM: Replace smp_mb() with smp_mb_after_atomic() in the kvm_make_all_cpus_request()
    KVM/x86: Replace smp_mb() with smp_store_mb/release() in the walk_shadow_page_lockless_begin/end()
    KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
    KVM, pkeys: expose CPUID/CR4 to guest
    KVM, pkeys: add pkeys support for permission_fault
    KVM, pkeys: introduce pkru_mask to cache conditions
    KVM, pkeys: save/restore PKRU when guest/host switches
    x86: pkey: introduce write_pkru() for KVM
    KVM, pkeys: add pkeys support for xsave state
    KVM, pkeys: disable pkeys for guests in non-paging mode
    KVM: x86: remove magic number with enum cpuid_leafs
    KVM: MMU: return page fault error code from permission_fault
    KVM: fix spin_lock_init order on x86
    ...

    Linus Torvalds
     
  • Replace the arch specific versions of search_extable() and
    sort_extable() with calls to the generic ones, which now support
    relative exception tables as well.
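
    As a rough illustration of what "relative" means here, the sketch below
    models a table whose fields hold offsets from their own address rather
    than absolute addresses, and looks an address up with a binary search.
    It is a toy model, not the kernel code: the 8-byte entry layout, the
    simulated base address and the helper names (build_table, ex_to_insn,
    search_relative_table) are assumptions made for this example.

```python
# Toy model of a relative exception table. Assumed layout: each entry is two
# 32-bit fields, 'insn' then 'fixup', and each field stores
# (target address - address of the field itself).
ENTRY_SIZE = 8


def build_table(base, pairs):
    """Encode absolute (insn, fixup) address pairs, sorted by insn."""
    table = []
    for i, (insn, fixup) in enumerate(sorted(pairs)):
        field = base + i * ENTRY_SIZE
        table.append((insn - field, fixup - (field + 4)))
    return table


def ex_to_insn(base, table, i):
    """Recover the absolute instruction address stored in entry i."""
    return base + i * ENTRY_SIZE + table[i][0]


def search_relative_table(base, table, addr):
    """Binary search by instruction address; return the fixup or None."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        insn = ex_to_insn(base, table, mid)
        if insn < addr:
            lo = mid + 1
        elif insn > addr:
            hi = mid - 1
        else:
            # Decode the fixup field relative to its own address.
            return base + mid * ENTRY_SIZE + 4 + table[mid][1]
    return None
```

    Because every comparison first decodes the offset back into an absolute
    address, one sort and one search routine can serve tables of either
    kind, which is presumably what lets the per-arch copies go away.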

    Signed-off-by: Ard Biesheuvel
    Acked-by: H. Peter Anvin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel
     
  • kcov provides code coverage collection for coverage-guided fuzzing
    (randomized testing). Coverage-guided fuzzing is a testing technique
    that uses coverage feedback to determine new interesting inputs to a
    system. A notable user-space example is AFL
    (http://lcamtuf.coredump.cx/afl/). However, this technique is not
    widely used for kernel testing due to missing compiler and kernel
    support.

    kcov does not aim to collect as much coverage as possible. It aims to
    collect more or less stable coverage that is a function of syscall inputs.
    To achieve this goal it does not collect coverage in soft/hard
    interrupts, and instrumentation of some inherently non-deterministic or
    non-interesting parts of the kernel is disabled (e.g. scheduler, locking).

    Currently there is a single coverage collection mode (tracing), but the
    API anticipates additional collection modes. Initially I also
    implemented a second mode which exposes coverage in a fixed-size hash
    table of counters (what Quentin used in his original patch). I've
    dropped the second mode for simplicity.

    This patch adds the necessary support on the kernel side. The complementary
    compiler support was added in gcc revision 231296.

    We've used this support to build the syzkaller system call fuzzer, which has
    found 90 kernel bugs in just 2 months:

    https://github.com/google/syzkaller/wiki/Found-Bugs

    We've also found 30+ bugs in our internal systems with syzkaller.
    Another (yet unexplored) direction where kcov coverage would greatly
    help is more traditional "blob mutation". For example, mounting a
    random blob as a filesystem, or receiving a random blob over wire.

    Why not gcov? A typical fuzzing loop looks as follows: (1) reset
    coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
    typical run can cover just a dozen basic blocks (e.g. for an invalid
    input). In such a context gcov becomes prohibitively expensive, as the
    reset/collect steps depend on the total number of basic blocks/edges in
    the program (about 2M in the case of the kernel). The cost of kcov depends
    only on the number of executed basic blocks/edges. On top of that, the
    kernel requires per-thread coverage because there are always background
    threads and unrelated processes that also produce coverage. With inlined
    gcov instrumentation, per-thread coverage is not possible.
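
    The cost argument above can be sketched with a small user-space
    simulation. This is a toy model under stated assumptions, not the kcov
    interface: TraceBuffer, CounterArray, target and fuzz are hypothetical
    names, and the "blocks" are plain integers. The trace buffer keeps its
    entry count in word 0, so resetting it is a single store, while the
    counter array has to be walked in full for every reset and collect.

```python
class TraceBuffer:
    """Trace-style coverage: word 0 holds the number of recorded entries."""

    def __init__(self, capacity):
        self.words = [0] * (capacity + 1)

    def reset(self):
        self.words[0] = 0                # O(1): only the count is cleared

    def record(self, block_id):
        n = self.words[0]
        if n + 1 < len(self.words):      # drop entries once the buffer is full
            self.words[n + 1] = block_id
            self.words[0] = n + 1

    def collect(self):
        n = self.words[0]
        return self.words[1:n + 1]       # O(number of executed blocks)


class CounterArray:
    """Counter-style coverage: one counter per block in the whole program."""

    def __init__(self, total_blocks):
        self.counters = [0] * total_blocks

    def reset(self):
        for i in range(len(self.counters)):   # O(total number of blocks)
            self.counters[i] = 0

    def collect(self):
        return [i for i, c in enumerate(self.counters) if c]  # also O(total)


def target(data, cover):
    """Stand-in for a syscall: records which blocks the input reaches."""
    cover.record(1)
    if data.startswith(b"k"):
        cover.record(2)
        if data.startswith(b"kc"):
            cover.record(3)


def fuzz(inputs, cover):
    """Keep only the inputs that reach blocks not seen before."""
    seen, corpus = set(), []
    for data in inputs:
        cover.reset()                    # (1) reset coverage
        target(data, cover)              # (2) execute a bit of code
        blocks = set(cover.collect())    # (3) collect coverage
        if not blocks <= seen:
            corpus.append(data)
            seen |= blocks
    return corpus, seen
```

    For example, fuzz([b"x", b"y", b"k", b"kc", b"kd"], TraceBuffer(16))
    keeps b"x", b"k" and b"kc", since only those reach a block that earlier
    inputs did not.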

    kcov exposes kernel PCs and control flow to user space, which is
    insecure. But debugfs should not be mapped as user accessible.

    Based on a patch by Quentin Casasnovas.

    [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
    [akpm@linux-foundation.org: unbreak allmodconfig]
    [akpm@linux-foundation.org: follow x86 Makefile layout standards]
    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Kees Cook
    Cc: syzkaller
    Cc: Vegard Nossum
    Cc: Catalin Marinas
    Cc: Tavis Ormandy
    Cc: Will Deacon
    Cc: Quentin Casasnovas
    Cc: Kostya Serebryany
    Cc: Eric Dumazet
    Cc: Alexander Potapenko
    Cc: Kees Cook
    Cc: Bjorn Helgaas
    Cc: Sasha Levin
    Cc: David Drysdale
    Cc: Ard Biesheuvel
    Cc: Andrey Ryabinin
    Cc: Kirill A. Shutemov
    Cc: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov