26 Dec, 2016

2 commits

  • Pull timer type cleanups from Thomas Gleixner:
    "This series does a tree wide cleanup of types related to
    timers/timekeeping.

    - Get rid of cycles_t and use a plain u64. The type is not really
    helpful and caused more confusion than clarity

    - Get rid of the ktime union. The union has become useless as we use
    the scalar nanoseconds storage unconditionally now. The 32bit
    timespec alike storage got removed due to the Y2038 limitations
    some time ago.

    That leaves the odd union access around for no reason. Clean it up.

    Both changes have been done with coccinelle and a small amount of
    manual mopping up"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    ktime: Get rid of ktime_equal()
    ktime: Cleanup ktime_set() usage
    ktime: Get rid of the union
    clocksource: Use a plain u64 instead of cycle_t

    Linus Torvalds
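    With the union gone, a ktime_t is simply a signed 64-bit nanosecond
    count, so plain arithmetic and == take over from union member access
    and ktime_equal(). A minimal userspace sketch, assuming a typedef
    modelled on the kernel's; ktime_add_ns_sketch() is a hypothetical
    helper, and the overflow clamp of the real ktime_set() is omitted:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model: ktime_t is a plain signed 64-bit nanosecond count. */
typedef int64_t ktime_t;

#define NSEC_PER_SEC 1000000000LL

/*
 * Sketch of ktime_set(): fold seconds and nanoseconds into one scalar.
 * The clamp to a maximum value done by the kernel helper is left out.
 */
static inline ktime_t ktime_set(int64_t secs, unsigned long nsecs)
{
	return secs * NSEC_PER_SEC + (int64_t)nsecs;
}

/* Hypothetical helper: no union member access, just arithmetic. */
static inline ktime_t ktime_add_ns_sketch(ktime_t kt, uint64_t nsec)
{
	return kt + (int64_t)nsec;
}
```

    Comparing two values becomes a plain `a == b`, which is why a
    dedicated equality helper is no longer needed.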
     
  • Pull SMP hotplug notifier removal from Thomas Gleixner:
    "This is the final cleanup of the hotplug notifier infrastructure. The
    series has been reintegrated in the last two days because a new driver
    using the old infrastructure came in via the SCSI tree.

    Summary:

    - convert the last leftover drivers utilizing notifiers

    - fixup for a completely broken hotplug user

    - prevent setup of already used states

    - removal of the notifiers

    - treewide cleanup of hotplug state names

    - consolidation of state space

    There is sphinx based documentation pending, but that needs review
    from the documentation folks"

    * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/armada-xp: Consolidate hotplug state space
    irqchip/gic: Consolidate hotplug state space
    coresight/etm3/4x: Consolidate hotplug state space
    cpu/hotplug: Cleanup state names
    cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
    staging/lustre/libcfs: Convert to hotplug state machine
    scsi/bnx2i: Convert to hotplug state machine
    scsi/bnx2fc: Convert to hotplug state machine
    cpu/hotplug: Prevent overwriting of callbacks
    x86/msr: Remove bogus cleanup from the error path
    bus: arm-ccn: Prevent hotplug callback leak
    perf/x86/intel/cstate: Prevent hotplug callback leak
    ARM/imx/mmcd: Fix broken cpu hotplug handling
    scsi: qedi: Convert to hotplug state machine

    Linus Torvalds
     

25 Dec, 2016

4 commits

  • There is no point in having an extra type for extra confusion. u64 is
    unambiguous.

    Conversion was done with the following coccinelle script:

    @rem@
    @@
    -typedef u64 cycle_t;

    @fix@
    typedef cycle_t;
    @@
    -cycle_t
    +u64

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: John Stultz

    Thomas Gleixner
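    With a plain u64 the usual clocksource arithmetic reads naturally. A
    minimal sketch of the masked-delta pattern for a counter narrower than
    64 bits, the same masking idea as the kernel's clocksource_delta()
    helper; the function name and the 32-bit mask are assumptions for
    illustration:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Mask for a hypothetical 32-bit wide hardware counter. */
#define COUNTER_MASK ((u64)0xffffffffULL)

/*
 * Cycles elapsed between two reads of a counter that wraps at
 * mask + 1. Unsigned subtraction followed by masking yields the right
 * answer even when 'now' has wrapped past 'last'.
 */
static inline u64 cycles_delta(u64 now, u64 last, u64 mask)
{
	return (now - last) & mask;
}
```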
     
  • When the state names got added a script was used to add the extra argument
    to the calls. The script basically converted the state constant to a
    string, but the cleanup to convert these strings into meaningful ones did
    not happen.

    Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
    are used in all the other places already.

    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Sebastian Siewior
    Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • The error cleanup which is invoked when the hotplug state setup failed
    tries to remove the failed state, which is broken.

    Fixes: 8fba38c937cd ("x86/msr: Convert to hotplug state machine")
    Reported-by: kernel test robot
    Signed-off-by: Thomas Gleixner
    Cc: Sebastian Siewior

    Thomas Gleixner
     
  • This was entirely automated, using the script by Al:

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
    sed -i -e "s!$PATT!#include !" \
    $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

    to do the replacement at the end of the merge window.

    Requested-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 Dec, 2016

2 commits

  • Pull x86 fixes from Ingo Molnar:
    "There's a number of fixes:

    - a round of fixes for CPUID-less legacy CPUs
    - a number of microcode loader fixes
    - i8042 detection robustization fixes
    - stack dump/unwinder fixes
    - x86 SoC platform driver fixes
    - a GCC 7 warning fix
    - virtualization related fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    Revert "x86/unwind: Detect bad stack return address"
    x86/paravirt: Mark unused patch_default label
    x86/microcode/AMD: Reload proper initrd start address
    x86/platform/intel/quark: Add printf attribute to imr_self_test_result()
    x86/platform/intel-mid: Switch MPU3050 driver to IIO
    x86/alternatives: Do not use sync_core() to serialize I$
    x86/topology: Document cpu_llc_id
    x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    x86/asm: Rewrite sync_core() to use IRET-to-self
    x86/microcode/intel: Replace sync_core() with native_cpuid()
    Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
    x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
    x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
    x86/tools: Fix gcc-7 warning in relocs.c
    x86/unwind: Dump stack data on warnings
    x86/unwind: Adjust last frame check for aligned function stacks
    x86/init: Fix a couple of comment typos
    x86/init: Remove i8042_detect() from platform ops
    Input: i8042 - Trust firmware a bit more when probing on X86
    x86/init: Add i8042 state to the platform data
    ...

    Linus Torvalds
     
  • Revert the following commit:

    b6959a362177 ("x86/unwind: Detect bad stack return address")

    ... because Andrey Konovalov reported an unwinder warning:

    WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467

    The unwind was initiated from an interrupt which occurred while running in the
    generated code for a kprobe. The unwinder printed the warning because it
    expected regs->ip to point to a valid text address, but instead it pointed to
    the generated code.

    Eventually we may want to come up with a way to identify generated kprobe
    code so the unwinder can know that it's a valid return address. Until
    then, just remove the warning.

    Reported-by: Andrey Konovalov
    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

23 Dec, 2016

2 commits

  • Pull x86 cache allocation interface from Thomas Gleixner:
    "This provides support for Intel's Cache Allocation Technology, a cache
    partitioning mechanism.

    The interface is odd, but the hardware interface of that CAT stuff is
    odd as well.

    We tried hard to come up with an abstraction, but that only allows
    rather simple partitioning, but no way of sharing and dealing with the
    per package nature of this mechanism.

    In the end we decided to expose the allocation bitmaps directly so all
    combinations of the hardware can be utilized.

    There are two ways of associating a cache partition:

    - Task

    A task can be added to a resource group. It uses the cache
    partition associated with the group.

    - CPU

    All tasks which are not members of a resource group use the group to
    which the CPU they are running on is associated with.

    That allows for simple CPU based partitioning schemes.

    The main expected users are:

    - Virtualization, so a VM can only trash the associated part of
    the cache w/o disturbing others

    - Real-Time systems to separate RT and general workloads.

    - Latency sensitive enterprise workloads

    - In theory this also can be used to protect against cache side
    channel attacks"

    [ Intel RDT is "Resource Director Technology". The interface really is
    rather odd and very specific, which delayed this pull request while I
    was thinking about it. The pull request itself came in early during
    the merge window, I just delayed it until things had calmed down and I
    had more time.

    But people tell me they'll use this, and the good news is that it is
    _so_ specific that it's rather independent of anything else, and no
    user is going to depend on the interface since it's pretty rare. So if
    push comes to shove, we can just remove the interface and nothing will
    break ]

    * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
    x86/intel_rdt: Implement show_options() for resctrlfs
    x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled
    x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount
    x86/intel_rdt: Fix setting of closid when adding CPUs to a group
    x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee
    x86/intel_rdt: Reset per cpu closids on unmount
    x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A
    x86/intel_rdt: Prevent deadlock against hotplug lock
    x86/intel_rdt: Protect info directory from removal
    x86/intel_rdt: Add info files to Documentation
    x86/intel_rdt: Export the minimum number of set mask bits in sysfs
    x86/intel_rdt: Propagate error in rdt_mount() properly
    x86/intel_rdt: Add a missing #include
    MAINTAINERS: Add maintainer for Intel RDT resource allocation
    x86/intel_rdt: Add scheduler hook
    x86/intel_rdt: Add schemata file
    x86/intel_rdt: Add tasks files
    x86/intel_rdt: Add cpus file
    x86/intel_rdt: Add mkdir to resctrl file system
    x86/intel_rdt: Add "info" files to resctrl file system
    ...

    Linus Torvalds
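    A minimal userspace sketch of the two association rules and of the
    checks a capacity bitmask has to pass (non-zero, within the mask
    length, one contiguous run of set bits, at least a minimum number of
    bits). The struct, the convention that closid 0 means "not in a
    resource group", and the helper names are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task model: closid 0 means "not in a resource group". */
struct task_sketch {
	unsigned int closid;
};

/*
 * A task in a resource group uses that group's partition; every other
 * task uses the group the current CPU is associated with.
 */
static unsigned int effective_closid(const struct task_sketch *t,
				     unsigned int cpu_closid)
{
	return t->closid ? t->closid : cpu_closid;
}

/*
 * Validate a capacity bitmask: non-zero, no bits at or above cbm_len,
 * a single contiguous run of ones, and at least min_bits of them.
 */
static bool cbm_valid(uint32_t cbm, unsigned int cbm_len, unsigned int min_bits)
{
	unsigned int first = 0, run = 0;

	if (cbm == 0 || (cbm_len < 32 && (cbm >> cbm_len) != 0))
		return false;

	while (!(cbm & (1u << first)))		/* lowest set bit */
		first++;
	while (first + run < 32 && (cbm & (1u << (first + run))))
		run++;				/* length of that run */

	/* Any set bit above the run means the mask is not contiguous. */
	if (first + run < 32 && (cbm >> (first + run)) != 0)
		return false;

	return run >= min_bits;
}
```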
     
  • A bugfix commit:

    45dbea5f55c0 ("x86/paravirt: Fix native_patch()")

    ... introduced a harmless warning:

    arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch':
    arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label]

    Fix it by annotating the label as __maybe_unused.

    Reported-by: Arnd Bergmann
    Reported-by: Piotr Gregor
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 45dbea5f55c0 ("x86/paravirt: Fix native_patch()")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
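    A minimal sketch of the pattern, assuming GCC or Clang: a label that is
    only jumped to under some configurations, annotated so the unused-label
    warning stays quiet otherwise. The function and the PATCH_FAST_PATH
    switch are hypothetical; the sketch puts the attribute on an empty
    statement, the form GCC documents for label attributes:

```c
#include <assert.h>

/* Same expansion as the kernel's __maybe_unused. */
#define __maybe_unused __attribute__((unused))

/* Hypothetical switch standing in for a configuration option. */
#define PATCH_FAST_PATH 0

static int native_patch_sketch(int type)
{
	int ret;

	switch (type) {
	case 1:
		ret = 10;
		break;
#if PATCH_FAST_PATH
	case 2:
		goto patch_default;	/* the only jump to the label */
#endif
	default:
patch_default: __maybe_unused;	/* no warning when nothing jumps here */
		ret = -1;
		break;
	}
	return ret;
}
```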
     

21 Dec, 2016

1 commit

  • Once we have switched to virtual addresses, and especially after
    reserve_initrd()->relocate_initrd() has run, initrd_start holds the
    updated initrd address. Use initrd_start then instead of the address
    which has been passed to us through boot params. (That one still gets
    used when we're running the very early routines on the BSP.)

    Reported-and-tested-by: Boris Ostrovsky
    Signed-off-by: Borislav Petkov
    Link: http://lkml.kernel.org/r/20161220144012.lc4cwrg6dphqbyqu@pd.tnic
    Signed-off-by: Thomas Gleixner

    Borislav Petkov
     

20 Dec, 2016

2 commits

  • We use sync_core() in the alternatives code to stop speculative
    execution of prefetched instructions because we are potentially changing
    them and don't want to execute stale bytes.

    What it does on most machines is call CPUID which is a serializing
    instruction. And that's expensive.

    However, the instruction cache is serialized when we're on the local CPU
    and are changing the data through the same virtual address. So then, we
    don't need the serializing CPUID but a simple control flow change, the
    latter being accomplished with a CALL/RET which the noinline causes.

    Suggested-by: Linus Torvalds
    Signed-off-by: Borislav Petkov
    Reviewed-by: Andy Lutomirski
    Cc: Andrew Cooper
    Cc: Andy Lutomirski
    Cc: Brian Gerst
    Cc: Henrique de Moraes Holschuh
    Cc: Matthew Whitehead
    Cc: One Thousand Gnomes
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20161203150258.vwr5zzco7ctgc4pe@pd.tnic
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
    which injects an NMI into the guest. We may want to crash the guest and do kdump
    on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
    allow the kdump kernel to re-establish VMBus connection so it will see
    VMBus devices (storage, network,..).

    To properly unload VMBus making it possible to start over during kdump we
    need to do the following:

    - Send an 'unload' message to the hypervisor. This can be done on any CPU
    so we do this on the crashing CPU.

    - Receive the 'unload finished' reply message. WS2012R2 delivers this
    message to the CPU which was used to establish VMBus connection during
    module load and this CPU may differ from the CPU sending 'unload'.

    Receiving a VMBus message means the following:

    - There is a per-CPU slot in memory for one message. This slot can in
    theory be accessed by any CPU.

    - We get an interrupt on the CPU when a message was placed into the slot.

    - When we read the message we need to clear the slot and signal the fact
    to the hypervisor. In case there are more messages to this CPU pending
    the hypervisor will deliver the next message. The signaling is done by
    writing to an MSR so this can only be done on the appropriate CPU.

    To avoid doing cross-CPU work on crash we have the vmbus_wait_for_unload()
    function which checks message slots for all CPUs in a loop waiting for the
    'unload finished' messages. However, there is an issue which arises when
    these conditions are met:

    - We're crashing on a CPU which is different from the one which was used
    to initially contact the hypervisor.

    - The CPU which was used for the initial contact is blocked with interrupts
    disabled and there is a message pending in the message slot.

    In this case we won't be able to read the 'unload finished' message on the
    crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
    simultaneously: the first CPU entering panic() will proceed to crash and
    all other CPUs will stop themselves with interrupts disabled.

    The suggested solution is to handle unknown NMIs for Hyper-V guests only
    on the first CPU which gets them. This will allow us to rely on the VMBus
    interrupt handler being able to receive the 'unload finished' message in
    case it is delivered to a different CPU.

    The issue is not reproducible on WS2016, as Debug-VM delivers the NMI to
    the boot CPU only; WS2012R2 and earlier Hyper-V versions are affected.

    Signed-off-by: Vitaly Kuznetsov
    Acked-by: K. Y. Srinivasan
    Cc: devel@linuxdriverproject.org
    Cc: Haiyang Zhang
    Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Vitaly Kuznetsov
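    The "first CPU only" rule maps naturally onto a single compare-and-swap.
    A minimal userspace sketch with C11 atomics; the function name, passing
    the CPU number in explicitly, and the NMI_DONE/NMI_HANDLED codes
    (modelled on an NMI handler's return values) are assumptions for
    illustration, and the static flag stands in for the unknown_nmi_panic
    setting:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Return codes modelled on an NMI handler's contract. */
enum { NMI_DONE = 0, NMI_HANDLED = 1 };

/* Stand-in for the unknown_nmi_panic setting. */
static bool unknown_nmi_panic = true;

/* -1 means no CPU has claimed an unknown NMI yet. */
static atomic_int nmi_cpu = -1;

/*
 * Only the first CPU to see an unknown NMI returns NMI_DONE and goes on
 * to the generic unknown-NMI handling (and panic); every other CPU
 * reports the NMI as handled instead of entering panic() itself.
 */
static int hv_nmi_unknown_sketch(int this_cpu)
{
	int expected = -1;

	if (!unknown_nmi_panic)
		return NMI_DONE;

	if (!atomic_compare_exchange_strong(&nmi_cpu, &expected, this_cpu))
		return NMI_HANDLED;

	return NMI_DONE;
}
```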
     

19 Dec, 2016

12 commits

  • The Intel microcode driver is using sync_core() to mean "do CPUID
    with EAX=1". I want to rework sync_core(), but first the Intel
    microcode driver needs to stop depending on its current behavior.

    Reported-by: Henrique de Moraes Holschuh
    Signed-off-by: Andy Lutomirski
    Acked-by: Borislav Petkov
    Cc: Juergen Gross
    Cc: One Thousand Gnomes
    Cc: Peter Zijlstra
    Cc: Brian Gerst
    Cc: Matthew Whitehead
    Cc: Andrew Cooper
    Cc: Boris Ostrovsky
    Cc: xen-devel
    Link: http://lkml.kernel.org/r/535a025bb91fed1a019c5412b036337ad239e5bb.1481307769.git.luto@kernel.org
    Signed-off-by: Thomas Gleixner

    Andy Lutomirski
     
  • A typo (or mis-merge?) resulted in leaf 6 only being probed if
    cpuid_level >= 7.

    Fixes: 2ccd71f1b278 ("x86/cpufeature: Move some of the scattered feature bits to x86_capability")
    Signed-off-by: Andy Lutomirski
    Acked-by: Borislav Petkov
    Cc: Brian Gerst
    Link: http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.luto@kernel.org
    Signed-off-by: Thomas Gleixner

    Andy Lutomirski
     
  • The unwinder warnings are good at finding unexpected unwinder issues,
    but they often don't give enough data to be able to fully diagnose them.
    Print a one-time stack dump when a warning is detected.

    Signed-off-by: Josh Poimboeuf
    Cc: Borislav Petkov
    Cc: Andy Lutomirski
    Link: http://lkml.kernel.org/r/15607370e3ddb1732b6a73d5c65937864df16ac8.1481904011.git.jpoimboe@redhat.com
    Signed-off-by: Thomas Gleixner

    Josh Poimboeuf
     
  • Somehow, CONFIG_PARAVIRT=n convinces gcc to change the
    x86_64_start_kernel() prologue from:

    0000000000000129 :
    129: 55 push %rbp
    12a: 48 89 e5 mov %rsp,%rbp

    to:

    0000000000000124 :
    124: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
    129: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
    12d: 41 ff 72 f8 pushq -0x8(%r10)
    131: 55 push %rbp
    132: 48 89 e5 mov %rsp,%rbp

    This is an unusual pattern which aligns rsp (though in this case it's
    already aligned) and saves the start_cpu() return address again on the
    stack before storing the frame pointer.

    The unwinder assumes the last stack frame header is at a certain offset,
    but the above code breaks that assumption, resulting in the following
    warning:

    WARNING: kernel stack frame pointer at ffffffff82e03f40 in swapper:0 has bad value (null)

    Fix it by checking for the last task stack frame at the aligned offset
    in addition to the normal unaligned offset.

    Fixes: acb4608ad186 ("x86/unwind: Create stack frames for saved syscall registers")
    Reported-by: Borislav Petkov
    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Link: http://lkml.kernel.org/r/9d7b4eb8cf55a7d6002cb738f25c23e7429c99a0.1481904011.git.jpoimboe@redhat.com
    Signed-off-by: Thomas Gleixner

    Josh Poimboeuf
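    The two-location check can be modelled with an array standing in for
    the stack, where a frame header is a saved frame pointer followed by a
    return address. A minimal userspace sketch; the slot-index
    representation and the function name are assumptions for illustration.
    It accepts the slot one word below the normal one only if that header
    carries the duplicated return address the realigning prologue pushes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * stack[i] models one machine word; a frame header at index bp holds the
 * saved frame pointer in stack[bp] and the return address in stack[bp + 1].
 * last_bp is where the final frame header normally sits.
 */
static bool is_last_task_frame_sketch(const uintptr_t *stack, size_t bp,
				      size_t last_bp)
{
	/* Normal case: the header sits at the expected offset. */
	if (bp == last_bp)
		return true;

	/*
	 * Realigned case: the prologue pushed a copy of the return address
	 * and then the frame pointer, so the header sits one word lower and
	 * its return-address slot duplicates the original one.
	 */
	return last_bp > 0 && bp == last_bp - 1 &&
	       stack[bp + 1] == stack[last_bp + 1];
}
```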
     
  • Now that i8042 uses a flag in the legacy platform data, i8042_detect() is
    no longer used and can be removed.

    Signed-off-by: Dmitry Torokhov
    Tested-by: Takashi Iwai
    Acked-by: Marcos Paulo de Souza
    Cc: linux-input@vger.kernel.org
    Link: http://lkml.kernel.org/r/1481317061-31486-4-git-send-email-dmitry.torokhov@gmail.com
    Signed-off-by: Thomas Gleixner

    Dmitry Torokhov
     
  • Add i8042 state to the platform data to help the i8042 driver decide
    whether to probe for i8042 or not. We recognize 3 states: the
    platform/subarch cannot possibly have an i8042 (as is the case with the
    Intel MID platform), the firmware (such as ACPI) reports that i8042 is
    absent from the device, or i8042 may be present and the driver should
    probe for it.

    The intent is to allow the i8042 driver to abort initialization on x86 if PNP data
    (absence of both keyboard and mouse PNP devices) agrees with firmware data.

    It will also allow us to remove i8042_detect later.

    Signed-off-by: Dmitry Torokhov
    Tested-by: Takashi Iwai
    Acked-by: Marcos Paulo de Souza
    Cc: linux-input@vger.kernel.org
    Link: http://lkml.kernel.org/r/1481317061-31486-2-git-send-email-dmitry.torokhov@gmail.com
    Signed-off-by: Thomas Gleixner

    Dmitry Torokhov
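    The three-state decision can be sketched as a small pure function. The
    enum constants mirror the three states described above, but the
    identifiers, the function and its inputs are hypothetical; the real
    driver consults the platform data and the PNP enumeration directly:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three platform states described above. */
enum i8042_platform_state {
	I8042_PLATFORM_ABSENT,	/* platform/subarch cannot have an i8042 */
	I8042_FIRMWARE_ABSENT,	/* firmware (e.g. ACPI) says it is absent */
	I8042_EXPECTED_PRESENT,	/* may be present: probe for it */
};

/*
 * Return true if the driver should go on to probe the controller.
 * pnp_found is true if PNP lists a keyboard or mouse device.
 */
static bool i8042_should_probe(enum i8042_platform_state state, bool pnp_found)
{
	if (state == I8042_PLATFORM_ABSENT)
		return false;
	if (pnp_found)
		return true;
	/* No PNP devices: abort only if the firmware data agrees. */
	return state == I8042_EXPECTED_PRESENT;
}
```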
     
  • When CONFIG_PARAVIRT is selected, cpuid() becomes a call. Since
    for 32-bit kernels load_ucode_amd_bsp() is executed before paging
    is enabled, the call cannot be completed (as kernel virtual addresses
    are not reachable yet).

    Use native_cpuid() instead, which is an asm wrapper for the CPUID
    instruction.

    Signed-off-by: Boris Ostrovsky
    Signed-off-by: Borislav Petkov
    Cc: Jürgen Gross
    Link: http://lkml.kernel.org/r/1481906392-3847-1-git-send-email-boris.ostrovsky@oracle.com
    Link: http://lkml.kernel.org/r/20161218164414.9649-5-bp@alien8.de
    Signed-off-by: Thomas Gleixner

    Boris Ostrovsky
     
  • Doing so is completely void of sense for multiple reasons, so prevent
    it. Set dis_ucode_ldr to true and thus disable the microcode loader by
    default to address xen pv guests which execute the AP path but not the
    BSP path.

    By having it turned off by default, the APs won't run into the loader
    either.

    Also, check CPUID(1).ECX[31], which hypervisors set. Well, almost: not the
    xen pv one. That one gets the aforementioned "fix".

    Also, improve the detection method by caching the final decision whether
    to continue loading in dis_ucode_ldr and do it once on the BSP. The APs
    then simply test that value.

    Signed-off-by: Borislav Petkov
    Tested-by: Juergen Gross
    Tested-by: Boris Ostrovsky
    Acked-by: Juergen Gross
    Link: http://lkml.kernel.org/r/20161218164414.9649-4-bp@alien8.de
    Signed-off-by: Thomas Gleixner

    Borislav Petkov
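    The "decide once on the BSP, only test on the APs" scheme can be
    sketched as follows. Bit 31 of CPUID(1).ECX is the hypervisor-present
    bit; the _sketch function names, and passing the ECX value and the
    command-line setting in as parameters, are assumptions for
    illustration rather than the loader's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Disabled by default, so a CPU that never runs the BSP path stays off. */
static bool dis_ucode_ldr = true;

#define CPUID1_ECX_HYPERVISOR (1u << 31)

/* BSP: make the decision once and cache it in dis_ucode_ldr. */
static bool check_loader_disabled_bsp_sketch(uint32_t cpuid1_ecx,
					     bool disabled_on_cmdline)
{
	if (!(cpuid1_ecx & CPUID1_ECX_HYPERVISOR) && !disabled_on_cmdline)
		dis_ucode_ldr = false;
	return dis_ucode_ldr;
}

/* APs: only test the cached value. */
static bool check_loader_disabled_ap_sketch(void)
{
	return dis_ucode_ldr;
}
```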
     
  • Make it simply return bool to denote whether it found a container or not
    and return the pointer to the container and its size in the handed-in
    container pointer instead, as returning a struct was just silly.

    Signed-off-by: Borislav Petkov
    Cc: Jürgen Gross
    Cc: Boris Ostrovsky
    Link: http://lkml.kernel.org/r/20161218164414.9649-3-bp@alien8.de
    Signed-off-by: Thomas Gleixner

    Borislav Petkov
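    The calling convention described above (a bool result, with the
    container's location and size handed back through a caller-supplied
    pointer) looks roughly like this. The descriptor struct, the magic
    bytes and the function name are hypothetical stand-ins, not the AMD
    loader's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor filled in on success. */
struct cont_desc_sketch {
	const uint8_t *data;
	size_t size;
};

/* Hypothetical 4-byte magic marking the start of a container. */
static const uint8_t CONT_MAGIC[4] = { 'C', 'O', 'N', 'T' };

/*
 * Return true if a container is found; on success *ret points at the
 * container and holds the number of bytes from there to the end of the
 * buffer. *ret is left untouched on failure.
 */
static bool find_container_sketch(const uint8_t *buf, size_t len,
				  struct cont_desc_sketch *ret)
{
	for (size_t off = 0; off + sizeof(CONT_MAGIC) <= len; off++) {
		if (memcmp(buf + off, CONT_MAGIC, sizeof(CONT_MAGIC)) == 0) {
			ret->data = buf + off;
			ret->size = len - off;
			return true;
		}
	}
	return false;
}
```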
     
  • Fixup signature and retvals, return the container struct through the
    passed in pointer, not as a function return value.

    Signed-off-by: Borislav Petkov
    Cc: Jürgen Gross
    Cc: Boris Ostrovsky
    Link: http://lkml.kernel.org/r/20161218164414.9649-2-bp@alien8.de
    Signed-off-by: Thomas Gleixner

    Borislav Petkov
     
  • Pull timer updates from Thomas Gleixner:
    "This is the last functional update from the tip tree for 4.10. It got
    delayed due to a newly reported and analyzed variant of a BIOS bug and
    the resulting wreckage:

    - Separation of TSC being marked reliable and the fact that the
    platform provides the TSC frequency via CPUID/MSRs, and making use
    of it for GOLDMONT.

    - TSC adjust MSR validation and sanitizing:

    The TSC adjust MSR contains the offset to the hardware counter. The
    sum of the adjust MSR and the counter is the TSC value which is
    read via RDTSC.

    On at least two machines from different vendors the BIOS sets the
    TSC adjust MSR to negative values. This happens on cold and warm
    boot. While on cold boot the offset is a few milliseconds, on warm
    boot it basically compensates the power on time of the system. The
    BIOSes are not even using the adjust MSR to set all CPUs in the
    package to the same offset. The offsets are different which renders
    the TSC unusable.

    What's worse is that the TSC deadline timer has a HW feature^Wbug.
    It malfunctions when the TSC adjust value is negative or greater than
    or equal to 0x80000000, resulting in silent boot failures, hard lockups or
    non firing timers. This looks like some hardware internal 32/64bit
    issue with a sign extension problem. Intel has been silent so far
    on the issue.

    The update contains sanity checks and keeps the adjust register
    within working limits and in sync on the package.

    As it looks like this disease is spreading via BIOS crapware, we
    need to address this urgently as the boot failures are hard to
    debug for users"

    * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tsc: Limit the adjust value further
    x86/tsc: Annotate printouts as firmware bug
    x86/tsc: Force TSC_ADJUST register to value >= zero
    x86/tsc: Validate TSC_ADJUST after resume
    x86/tsc: Validate cpumask pointer before accessing it
    x86/tsc: Fix broken CONFIG_X86_TSC=n build
    x86/tsc: Try to adjust TSC if sync test fails
    x86/tsc: Prepare warp test for TSC adjustment
    x86/tsc: Move sync cleanup to a safe place
    x86/tsc: Sync test only for the first cpu in a package
    x86/tsc: Verify TSC_ADJUST from idle
    x86/tsc: Store and check TSC ADJUST MSR
    x86/tsc: Detect random warps
    x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
    x86/tsc: Finalize the split of the TSC_RELIABLE flag
    x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs
    x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
    x86/tsc: Mark TSC frequency determined by CPUID as known
    x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag

    Linus Torvalds
     
  • Pull x86 fixes and cleanups from Thomas Gleixner:
    "This set of updates contains:

    - Robustification for the logical package management. Cures the AMD
    and virtualization issues.

    - Put the correct start_cpu() return address on the stack of the idle
    task.

    - Fixups for the fallout of the nodeid cpuid persistent mapping
    modifications

    - Move the x86/MPX specific mm_struct member to the arch specific
    mm_context where it belongs

    - Cleanups for C89 struct initializers and useless function
    arguments"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/floppy: Use designated initializers
    x86/mpx: Move bd_addr to mm_context_t
    x86/mm: Drop unused argument 'removed' from sync_global_pgds()
    ACPI/NUMA: Do not map pxm to node when NUMA is turned off
    x86/acpi: Use proper macro for invalid node
    x86/smpboot: Prevent false positive out of bounds cpumask access warning
    x86/boot/64: Push correct start_cpu() return address
    x86/boot/64: Use 'push' instead of 'call' in start_cpu()
    x86/smpboot: Make logical package management more robust

    Linus Torvalds
     

18 Dec, 2016

2 commits

  • Adjust value 0x80000000 and other values larger than that render the TSC
    deadline timer dysfunctional.

    We do not yet have any information about this from Intel, but experimentation
    clearly proves that this is a 32/64 bit and sign extension issue.

    If adjust values larger than that are actually required, which might be the
    case for physical CPU hotplug, then we need to disable the deadline timer
    on the affected package/CPUs and use the local APIC timer instead.

    That requires some surgery in the APIC setup code, so we just limit the
    ADJUST register value to the known-to-work range for now and revisit this
    when Intel comes forth with proper information.

    Signed-off-by: Thomas Gleixner
    Cc: Roland Scheidegger
    Cc: Bruce Schlobohm
    Cc: Kevin Stanton
    Cc: Peter Zijlstra
    Cc: Borislav Petkov

    Thomas Gleixner
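    The relationship between the hardware counter, the adjust MSR and
    RDTSC, and the range limit above, can be modelled in plain C. This is a
    sketch only: the function names and TSC_ADJUST_MAX are hypothetical,
    forcing out-of-range values to zero is a simplified policy chosen for
    the sketch (the real code may treat the upper bound differently), and
    low32_sign_extended() merely illustrates the kind of 32/64-bit sign
    extension problem suspected above, not documented hardware behavior:

```c
#include <assert.h>
#include <stdint.h>

/* Upper end of the range described above as known to work. */
#define TSC_ADJUST_MAX 0x7fffffffLL

/* RDTSC returns the hardware counter plus the TSC_ADJUST offset. */
static uint64_t tsc_readout(uint64_t hw_counter, int64_t adjust)
{
	return hw_counter + (uint64_t)adjust;
}

/* Keep the adjust value inside [0, TSC_ADJUST_MAX]; simplified policy. */
static int64_t tsc_adjust_sanitize(int64_t adjust)
{
	if (adjust < 0 || adjust > TSC_ADJUST_MAX)
		return 0;
	return adjust;
}

/*
 * Illustration only: treating just the low 32 bits of the adjust value
 * as a signed quantity turns 0x80000000 and above negative.
 */
static int64_t low32_sign_extended(int64_t adjust)
{
	int64_t lo = adjust & 0xffffffffLL;

	return lo >= 0x80000000LL ? lo - 0x100000000LL : lo;
}
```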
     
  • Make it more obvious that the BIOS is screwed up.

    Signed-off-by: Thomas Gleixner
    Cc: Roland Scheidegger
    Cc: Bruce Schlobohm
    Cc: Kevin Stanton
    Cc: Peter Zijlstra
    Cc: Borislav Petkov

    Thomas Gleixner
     

17 Dec, 2016

1 commit

  • Pull powerpc updates from Michael Ellerman:
    "Highlights include:

    - Support for the kexec_file_load() syscall, which is a prereq for
    secure and trusted boot.

    - Prevent kernel execution of userspace on P9 Radix (similar to
    SMEP/PXN).

    - Sort the exception tables at build time, to save time at boot, and
    store them as relative offsets to save space in the kernel image &
    memory.

    - Allow building the kernel with thin archives, which should allow us
    to build an allyesconfig once some other fixes land.

    - Build fixes to allow us to correctly rebuild when changing the
    kernel endian from big to little or vice versa.

    - Plumbing so that we can avoid doing a full mm TLB flush on P9
    Radix.

    - Initial stack protector support (-fstack-protector).

    - Support for dumping the radix (aka. Linux) and hash page tables via
    debugfs.

    - Fix an oops in cxl coredump generation when cxl_get_fd() is used.

    - Freescale updates from Scott: "Highlights include 8xx hugepage
    support, qbman fixes/cleanup, device tree updates, and some misc
    cleanup."

    - Many and varied fixes and minor enhancements as always.

    Thanks to:
    Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman
    Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz,
    Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar
    Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff
    Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin,
    Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N.
    Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica
    Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj
    Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain"

    [ And thanks to Michael, who took time off from a new baby to get this
    pull request done. - Linus ]

    * tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits)
    powerpc/fsl/dts: add FMan node for t1042d4rdb
    powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb
    powerpc/fsl/dts: add QMan and BMan nodes on t1024
    powerpc/fsl/dts: add QMan and BMan nodes on t1023
    soc/fsl/qman: test: use DEFINE_SPINLOCK()
    powerpc/fsl-lbc: use DEFINE_SPINLOCK()
    powerpc/8xx: Implement support of hugepages
    powerpc: get hugetlbpage handling more generic
    powerpc: port 64 bits pgtable_cache to 32 bits
    powerpc/boot: Request no dynamic linker for boot wrapper
    soc/fsl/bman: Use resource_size instead of computation
    soc/fsl/qe: use builtin_platform_driver
    powerpc/fsl_pmc: use builtin_platform_driver
    powerpc/83xx/suspend: use builtin_platform_driver
    powerpc/ftrace: Fix the comments for ftrace_modify_code
    powerpc/perf: macros for power9 format encoding
    powerpc/perf: power9 raw event format encoding
    powerpc/perf: update attribute_group data structure
    powerpc/perf: factor out the event format field
    powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown
    ...

    Linus Torvalds
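    Storing each address as a 32-bit offset relative to the field that
    holds it halves the size of an entry made of two 64-bit addresses, and
    a table sorted at build time can be searched with a plain binary search
    when a fault is taken. A minimal userspace sketch under those
    assumptions; the struct and helper names are hypothetical, and the test
    fills entries in ascending order by hand to stand in for the
    build-time sort:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Each field stores (target address - address of the field itself). */
struct extable_entry_sketch {
	int32_t insn;
	int32_t fixup;
};

/* Turn a self-relative field back into an absolute address. */
static uintptr_t rel_to_abs(const int32_t *field)
{
	return (uintptr_t)field + (uintptr_t)(intptr_t)*field;
}

/* Hypothetical build-time step: record both addresses as offsets. */
static void extable_set_sketch(struct extable_entry_sketch *e,
			       uintptr_t insn, uintptr_t fixup)
{
	e->insn = (int32_t)((intptr_t)insn - (intptr_t)(uintptr_t)&e->insn);
	e->fixup = (int32_t)((intptr_t)fixup - (intptr_t)(uintptr_t)&e->fixup);
}

/* bsearch() comparator: the key is the faulting instruction address. */
static int cmp_insn(const void *key, const void *elem)
{
	const struct extable_entry_sketch *e = elem;
	uintptr_t addr = *(const uintptr_t *)key;
	uintptr_t insn = rel_to_abs(&e->insn);

	return addr < insn ? -1 : addr > insn ? 1 : 0;
}

/* Fault-time lookup; requires the table to be sorted by insn address. */
static const struct extable_entry_sketch *
search_extable_sketch(const struct extable_entry_sketch *tbl, size_t n,
		      uintptr_t addr)
{
	return bsearch(&addr, tbl, n, sizeof(*tbl), cmp_insn);
}
```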
     

16 Dec, 2016

1 commit

  • Pull tracing updates from Steven Rostedt:
    "This release has a few updates:

    - STM can hook into the function tracer
    - Function filtering now supports more advanced glob matching
    - Ftrace selftests updates and added tests
    - Softirq tag in traces now shows only softirqs
    - ARM nop added to non traced locations at compile time
    - New trace_marker_raw file that allows for binary input
    - Optimizations to the ring buffer
    - Removal of kmap in trace_marker
    - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
    - Other various fixes and clean ups"

    * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
    selftests: ftrace: Shift down default message verbosity
    kprobes/trace: Fix kprobe selftest for newer gcc
    tracing/kprobes: Add a helper method to return number of probe hits
    tracing/rb: Init the CPU mask on allocation
    tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
    tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
    fgraph: Handle a case where a tracer ignores set_graph_notrace
    tracing: Replace kmap with copy_from_user() in trace_marker writing
    ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
    tracing: Allow benchmark to be enabled at early_initcall()
    tracing: Have system enable return error if one of the events fail
    tracing: Do not start benchmark on boot up
    tracing: Have the reg function allow to fail
    ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
    ring-buffer: Froce rb_update_write_stamp() to be inlined
    ring-buffer: Force inline of hotpath helper functions
    tracing: Make __buffer_unlock_commit() always_inline
    tracing: Make tracepoint_printk a static_key
    ring-buffer: Always inline rb_event_data()
    ring-buffer: Make rb_reserve_next_event() always inlined
    ...

    Linus Torvalds
     

15 Dec, 2016

6 commits

  • Roland reported that his DELL T5810 sports a value-add BIOS which
    completely wrecks the TSC. The squirmware [(TM) Ingo Molnar] boots with
    random negative TSC_ADJUST values, different on all CPUs. That renders the
    TSC useless because the synchronization check fails.

    Roland tested the new TSC_ADJUST mechanism. While it manages to readjust
    the TSCs, he needs to disable the TSC deadline timer; otherwise the
    machine just stops booting.

    Deeper investigation unearthed that the TSC deadline timer is sensitive to
    the TSC_ADJUST value. Writing a negative value to TSC_ADJUST results in an
    interrupt storm caused by the TSC deadline timer.

    This does not make any sense and it's hard to imagine what kind of hardware
    wreckage is behind that misfeature, but it's reliably reproducible on other
    systems which have TSC_ADJUST and TSC deadline timer.

    While it would be understandable that a big enough negative value which
    moves the resulting TSC readout into the negative space could have the
    described effect, this happens even with an adjust value of -1, which keeps
    the TSC readout definitely in the positive space. The compare register for
    the TSC deadline timer is set to a positive value larger than the TSC, but
    although the deadline has not been reached, the interrupt is raised
    immediately. If this happens on the boot CPU, then the machine dies
    silently because this setup happens before the NMI watchdog is armed.

    Further experiments showed that any other adjustment of TSC_ADJUST works as
    expected as long as it stays in the positive range. The direction of the
    adjustment has no influence either. See the lkml link for further analysis.

    Yet another proof of the theory that timers are designed by janitors and
    the underlying (obviously undocumented) mechanisms which allow BIOSes to
    wreck them are considered a feature. Well done Intel - NOT!

    To address this wreckage add the following sanity measures:

    - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0

    - If the TSC_ADJUST value on any cpu is negative, set it to 0

    - Prevent the cross package synchronization mechanism from setting negative
    TSC_ADJUST values.
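    A minimal user-space sketch of the three sanity rules above, in C. It is
    not the kernel's tsc_sync.c code: each CPU's TSC_ADJUST MSR is modelled as
    a plain int64_t, and sanitize_tsc_adjust() and sync_tsc_adjust() are
    hypothetical names. Rule 3 is shown as clamping a negative result to 0,
    which is one plausible way to keep the sync mechanism from producing
    negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: on real hardware the value lives in the
 * IA32_TSC_ADJUST MSR of each CPU; here it is just an int64_t.
 */

/* Rules 1 and 2: return the value a CPU should keep after boot. */
static int64_t sanitize_tsc_adjust(int64_t bootval, bool is_boot_cpu)
{
	if (is_boot_cpu && bootval != 0)
		return 0;	/* boot CPU: any non-zero value is reset */
	if (bootval < 0)
		return 0;	/* any CPU: negative values are reset */
	return bootval;		/* non-negative values on other CPUs are kept */
}

/*
 * Rule 3: the cross-package sync wants to add 'warp' to the current
 * adjustment. Never let the result go negative (sketch: clamp to 0).
 */
static int64_t sync_tsc_adjust(int64_t cur, int64_t warp)
{
	int64_t next = cur + warp;

	assert(cur >= 0);	/* rules 1 and 2 have already been applied */
	return next < 0 ? 0 : next;
}
```

    With these rules a BIOS-provided -1 on any CPU ends up as 0, and a sync
    step that would move a CPU below 0 stops at 0 instead.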

    Reported-and-tested-by: Roland Scheidegger
    Signed-off-by: Thomas Gleixner
    Cc: Bruce Schlobohm
    Cc: Kevin Stanton
    Cc: Peter Zijlstra
    Cc: Allen Hung
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Some 'feature' BIOSes fiddle with the TSC_ADJUST register during
    suspend/resume which renders the TSC unusable.

    Add sanity checks into the resume path and restore the
    original value if it was adjusted.
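    The resume check can be sketched the same way. This is a hypothetical
    user-space model, not the kernel code: the per-CPU bookkeeping is reduced
    to one remembered value in struct tsc_adjust_state, and fake_msr stands in
    for the CPU's TSC_ADJUST register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's TSC_ADJUST register. */
static int64_t fake_msr;

/* Hypothetical per-CPU bookkeeping: the value the kernel last accepted. */
struct tsc_adjust_state {
	int64_t adjusted;
};

/*
 * Resume-path check: if firmware changed the register across
 * suspend/resume, write the remembered value back.
 * Returns true when a restore was necessary.
 */
static bool tsc_adjust_check_on_resume(const struct tsc_adjust_state *st)
{
	int64_t cur;

	assert(st != NULL);
	cur = fake_msr;			/* stands in for an MSR read */
	if (cur == st->adjusted)
		return false;		/* untouched: nothing to do */
	fake_msr = st->adjusted;	/* stands in for an MSR write */
	return true;
}
```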

    Reported-and-tested-by: Roland Scheidegger
    Signed-off-by: Thomas Gleixner
    Cc: Bruce Schlobohm
    Cc: Kevin Stanton
    Cc: Peter Zijlstra
    Cc: Allen Hung
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/20161213131211.317654500@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Use NUMA_NO_NODE instead of -1.

    Signed-off-by: Boris Ostrovsky
    Cc: len.brown@intel.com
    Cc: rjw@rjwysocki.net
    Cc: linux-acpi@vger.kernel.org
    Cc: pavel@ucw.cz
    Link: http://lkml.kernel.org/r/1481570993-13941-1-git-send-email-boris.ostrovsky@oracle.com
    Signed-off-by: Thomas Gleixner

    Boris Ostrovsky
     
  • prefill_possible_map() reinitializes the cpu_possible_map by setting the
    possible cpu bits and clearing all other bits up to NR_CPUS.

    This is technically always correct because cpu_possible_map is statically
    allocated and sized NR_CPUS. With CPUMASK_OFFSTACK and DEBUG_PER_CPU_MAPS
    enabled, the bounds check of cpu masks happens on nr_cpu_ids. nr_cpu_ids is
    initialized to NR_CPUS and only limited after the set/clear bit loops have
    been executed.

    But if the system was booted with "nr_cpus=N" on the command line, where N
    is < NR_CPUS, then nr_cpu_ids is limited in the parameter parsing function
    before prefill_possible_map() is invoked. As a consequence, the cpumask
    bounds check triggers when clearing the bits past nr_cpu_ids.

    Add a helper that resets cpu_possible_map without the bounds check, and
    then set only the possible bits, which are well inside the bounds.
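    The failure mode can be reproduced in a small user-space model, in C.
    Everything in it is hypothetical: a 64-bit integer stands in for
    cpu_possible_map with NR_CPUS set to 64, the checked accessors count
    violations instead of emitting the kernel's debug warning, and
    reset_possible_mask() plays the role of the new unchecked reset helper.

```c
#include <assert.h>

#define NR_CPUS 64	/* hypothetical build-time maximum; fits one 64-bit word */

static unsigned long long possible_mask;	/* bit n set => CPU n possible */
static unsigned int nr_cpu_ids = NR_CPUS;	/* may be lowered by "nr_cpus=N" */
static unsigned int bounds_warnings;		/* counts checked-access violations */

/* Checked accessors modelling CPUMASK_OFFSTACK + DEBUG_PER_CPU_MAPS. */
static void checked_set_cpu(unsigned int cpu)
{
	if (cpu >= nr_cpu_ids) {
		bounds_warnings++;
		return;
	}
	possible_mask |= 1ULL << cpu;
}

static void checked_clear_cpu(unsigned int cpu)
{
	if (cpu >= nr_cpu_ids) {
		bounds_warnings++;
		return;
	}
	possible_mask &= ~(1ULL << cpu);
}

/* Old pattern: walks every bit up to NR_CPUS through the checked API. */
static void prefill_old(unsigned int possible)
{
	for (unsigned int i = 0; i < possible; i++)
		checked_set_cpu(i);
	for (unsigned int i = possible; i < NR_CPUS; i++)
		checked_clear_cpu(i);
}

/* Unchecked reset: the storage is NR_CPUS wide, so this is always safe. */
static void reset_possible_mask(void)
{
	possible_mask = 0;
}

/* New pattern: reset everything, then set only in-bounds bits. */
static void prefill_new(unsigned int possible)
{
	assert(possible <= nr_cpu_ids);	/* caller already clamped to nr_cpu_ids */
	reset_possible_mask();
	for (unsigned int i = 0; i < possible; i++)
		checked_set_cpu(i);
}
```

    With nr_cpu_ids lowered to 4, prefill_old(4) trips the check once for each
    of the 60 bits from 4 to 63, while prefill_new(4) never trips it and still
    leaves exactly CPUs 0-3 marked possible.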

    Reported-by: Dmitry Safonov
    Cc: Rusty Russell
    Cc: 0x7f454c46@gmail.com
    Cc: Jan Beulich
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612131836050.3415@nanos
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • Currently on x86_64, the symbol address of phys_base is exported to
    vmcoreinfo. Dave Anderson complained that this is useless for his Crash
    implementation: the user-space utilities Crash and Makedumpfile, which
    are the main consumers of the exported vmcore information, need the
    value of phys_base to convert the virtual address of an exported kernel
    symbol to a physical address. In particular, to walk the page tables
    starting at init_level4_pgt and look up the PA corresponding to a VA, we
    first need to calculate

    page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;

    Now in Crash and Makedumpfile, we have to analyze the vmcore ELF program
    header to get the value of phys_base. As Dave said, it would be preferable
    if it were readily available in vmcoreinfo rather than depending upon the
    PT_LOAD semantics.

    Hence, this patch exports the value of phys_base instead of its virtual
    address.

    People also pointed out that the KERNEL_IMAGE_SIZE export is x86_64-only
    and should therefore be moved into the arch-dependent function
    arch_crash_save_vmcoreinfo(). This patch does that move as well.
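    The page_dir formula above is plain arithmetic once phys_base is available
    as a value. A hedged user-space sketch in C of what a vmcoreinfo consumer
    might do: look up the two entries and apply the formula. The
    "SYMBOL(name)=hex" / "NUMBER(name)=decimal" line layout is an assumption
    about the vmcoreinfo text, vmcoreinfo_lookup() and compute_page_dir() are
    hypothetical helpers, and this is not code from Crash or Makedumpfile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Start of the x86_64 kernel text mapping (__START_KERNEL_map). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Find "key=value" at the start of a line in a vmcoreinfo-style text
 * blob. Assumes SYMBOL() values are hex and NUMBER() values decimal.
 * Returns 0 and stores the value on success, -1 otherwise.
 */
static int vmcoreinfo_lookup(const char *info, const char *key, int is_hex,
			     unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = info;

	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			const char *val = line + klen + 1;
			long long dec;

			if (is_hex)
				return sscanf(val, "%llx", out) == 1 ? 0 : -1;
			if (sscanf(val, "%lld", &dec) != 1)
				return -1;
			*out = (unsigned long long)dec;
			return 0;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line != NULL)
			line++;
	}
	return -1;
}

/* page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base */
static int compute_page_dir(const char *info, unsigned long long *page_dir)
{
	unsigned long long sym, phys_base;

	if (vmcoreinfo_lookup(info, "SYMBOL(init_level4_pgt)", 1, &sym) != 0 ||
	    vmcoreinfo_lookup(info, "NUMBER(phys_base)", 0, &phys_base) != 0)
		return -1;
	assert(sym >= START_KERNEL_MAP);	/* symbol must be in the kernel map */
	*page_dir = sym - START_KERNEL_MAP + phys_base;
	return 0;
}
```

    Exporting phys_base as a number is what makes the second lookup possible
    at all; with only its symbol address, the consumer would still have to
    recover the value from the PT_LOAD headers.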

    Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com
    Signed-off-by: Baoquan He
    Cc: Thomas Garnier
    Cc: Baoquan He
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Eric Biederman
    Cc: Xunlei Pang
    Cc: HATAYAMA Daisuke
    Cc: Kees Cook
    Cc: Eugene Surovegin
    Cc: Dave Young
    Cc: AKASHI Takahiro
    Cc: Atsushi Kumagai
    Cc: Dave Anderson
    Cc: Pratyush Anand
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • This reverts commit 0549a3c02efb ("kdump, vmcoreinfo: report memory
    sections virtual addresses").

    Commit 0549a3c02efb tells the userspace utility makedumpfile the
    randomized base addresses of these memory sections when mm KASLR is
    enabled. However, the following patch "kexec: export the value of
    phys_base instead of symbol address" means makedumpfile no longer needs
    these addresses.

    Besides, we should use VMCOREINFO_NUMBER to export the value of the
    variable so that the existing number_table mechanism of Makedumpfile
    can fetch it. So revert it now. If needed, it can be added back
    later.

    http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
    Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.com
    Signed-off-by: Baoquan He
    Cc: Thomas Garnier
    Cc: Baoquan He
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Eric Biederman
    Cc: Xunlei Pang
    Cc: HATAYAMA Daisuke
    Cc: Kees Cook
    Cc: Eugene Surovegin
    Cc: Dave Young
    Cc: AKASHI Takahiro
    Cc: Atsushi Kumagai
    Cc: Dave Anderson
    Cc: Pratyush Anand
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     

14 Dec, 2016

5 commits

  • start_cpu() pushes a text address on the stack so that stack traces from
    idle tasks will show start_cpu() at the end. But it currently shows the
    wrong function offset. It's more correct to show the address
    immediately after the 'lretq' instruction.

    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/2cadd9f16c77da7ee7957bfc5e1c67928c23ca48.1481685203.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • start_cpu() pushes a text address on the stack so that stack traces from
    idle tasks will show start_cpu() at the end. But it uses a call
    instruction to do that, which is rather obtuse. Use a straightforward
    push instead.

    Suggested-by: Borislav Petkov
    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/4d8a1952759721d42d1e62ba9e4a7e3ac5df8574.1481685203.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     
  • Pull workqueue updates from Tejun Heo:
    "Mostly patches to initialize workqueue subsystem earlier and get rid
    of keventd_up().

    The patches were headed for the last merge cycle but got delayed due
    to a bug found at the last minute, which is fixed now.

    Also, to help debugging, destroy_workqueue() is more chatty now on a
    sanity check failure."

    * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: move wq_numa_init() to workqueue_init()
    workqueue: remove keventd_up()
    debugobj, workqueue: remove keventd_up() usage
    slab, workqueue: remove keventd_up() usage
    power, workqueue: remove keventd_up() usage
    tty, workqueue: remove keventd_up() usage
    mce, workqueue: remove keventd_up() usage
    workqueue: make workqueue available early during boot
    workqueue: dump workqueue state on sanity check failures in destroy_workqueue()

    Linus Torvalds
     
  • Pull driver core updates from Greg KH:
    "Here's the new driver core patches for 4.10-rc1.

    The big thing here is the nice addition of "functional dependencies" to
    the driver core. The idea has been talked about for a very long time;
    great job to Rafael for stepping up and implementing it. It's been
    tested since before the 4.9-rc1 date; we held off on merging it
    earlier in order to feel more comfortable about it.

    Other than that, it's just a handful of other small patches, some good
    cleanups to the mess that is the firmware class code, and we have a
    test driver for the deferred probe logic.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits)
    firmware: Correct handling of fw_state_wait() return value
    driver core: Silence device links sphinx warning
    firmware: remove warning at documentation generation time
    drivers: base: dma-mapping: Fix typo in dmam_alloc_non_coherent comments
    driver core: test_async: fix up typo found by 0-day
    firmware: move fw_state_is_done() into UHM section
    firmware: do not use fw_lock for fw_state protection
    firmware: drop bit ops in favor of simple state machine
    firmware: refactor loading status
    firmware: fix usermode helper fallback loading
    driver core: firmware_class: convert to use class_groups
    driver core: devcoredump: convert to use class_groups
    driver core: class: add class_groups support
    kernfs: Declare two local data structures static
    driver-core: fix platform_no_drv_owner.cocci warnings
    drivers/base/memory.c: Remove unused 'first_page' variable
    driver core: add CLASS_ATTR_WO()
    drivers: base: cacheinfo: support DT overrides for cache properties
    drivers: base: cacheinfo: add pr_fmt logging
    drivers: base: cacheinfo: fix boot error message when acpi is enabled
    ...

    Linus Torvalds
     
  • Pull ACPI updates from Rafael Wysocki:
    "The ACPICA code in the kernel gets updated as usual (included is
    upstream revision 20160930 and a few commits from the next one, with
    the rest waiting for an issue discovered in linux-next to be
    addressed), which brings in a couple of fixes and cleanups.

    On top of that, initial support for APEI on ARM64 is added, two new
    pieces of documentation are introduced, the properties-parsing code is
    updated to follow changes in the (external) documentation it is based
    on, and there are a few updates of SoC drivers, some new blacklist
    entries, plus some assorted fixes and cleanups.

    Specifics:

    - ACPICA update including upstream revision 20160930 and several
    commits beyond it (Bob Moore, Lv Zheng)

    - Initial support for ACPI APEI on ARM64 (Tomasz Nowicki)

    - New document describing the handling of _OSI and _REV in Linux (Len
    Brown)

    - New document describing the usage rules for _DSD properties (Rafael
    Wysocki)

    - Update of the ACPI properties-parsing code to reflect recent
    changes in the (external) documentation it is based on (Rafael
    Wysocki)

    - Updates of the ACPI LPSS and ACPI APD SoC drivers for additional
    hardware support (Andy Shevchenko, Nehal Shah)

    - New blacklist entries for _REV and video handling (Alex Hung, Hans
    de Goede, Michael Pobega)

    - ACPI battery driver fix to fall back to _BIF if _BIX fails (Dave
    Lambley)

    - NMI notifications handling fix for APEI (Prarit Bhargava)

    - Error code path fix for the ACPI CPPC library (Dan Carpenter)

    - Assorted cleanups (Andy Shevchenko, Longpeng Mike)"

    * tag 'acpi-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (31 commits)
    ACPICA: Utilities: Add new decode function for parser values
    ACPI / osl: Refactor acpi_os_get_root_pointer() to drop 'else':s
    ACPI / osl: Propagate actual error code for kstrtoul()
    ACPI / property: Document usage rules for _DSD properties
    ACPI: Document _OSI and _REV for Linux BIOS writers
    ACPI / APEI / ARM64: APEI initial support for ARM64
    ACPI / APEI: Fix NMI notification handling
    ACPICA: Tables: Add an error message complaining driver bugs
    ACPICA: Tables: Add acpi_tb_unload_table()
    ACPICA: Tables: Cleanup acpi_tb_install_and_load_table()
    ACPICA: Events: Fix acpi_ev_initialize_region() return value
    ACPICA: Back port of "ACPICA: Dispatcher: Tune interpreter lock around AcpiEvInitializeRegion()"
    ACPICA: Namespace: Add acpi_ns_handle_to_name()
    ACPI / CPPC: set an error code on probe error path
    ACPI / video: Add force_native quirk for HP Pavilion dv6
    ACPI / video: Add force_native quirk for Dell XPS 17 L702X
    ACPI / property: Hierarchical properties support update
    ACPI / LPSS: enable hard LLP for DMA
    ACPI / battery: If _BIX fails, retry with _BIF
    ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h
    ...

    Linus Torvalds