15 Jan, 2016

1 commit


10 Dec, 2015

1 commit

  • commit 54a20552e1eae07aa240fa370a0293e006b5faed upstream.

    It was found that a guest can DoS a host by triggering an infinite
    stream of "alignment check" (#AC) exceptions. This causes the
    microcode to enter an infinite loop where the core never receives
    another interrupt. The host kernel panics pretty quickly due to the
    effects (CVE-2015-5307).

    Signed-off-by: Eric Northup
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Eric Northup
     

27 Oct, 2015

1 commit

  • commit fe32d3cd5e8eb0f82e459763374aa80797023403 upstream.

    These functions check should_resched() before unlocking spinlock/bh-enable:
    preempt_count always non-zero => should_resched() always returns false.
    cond_resched_lock() worked iff spin_needbreak is set.

    This patch adds argument "preempt_offset" to should_resched().

    preempt_count offset constants for that:

    PREEMPT_DISABLE_OFFSET - offset after preempt_disable()
    PREEMPT_LOCK_OFFSET - offset after spin_lock()
    SOFTIRQ_DISABLE_OFFSET - offset after local_bh_disable()
    SOFTIRQ_LOCK_OFFSET - offset after spin_lock_bh()

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Graf
    Cc: Boris Ostrovsky
    Cc: David Vrabel
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers")
    Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz
    Signed-off-by: Ingo Molnar
    Signed-off-by: Mike Galbraith
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
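
The should_resched() change described in the commit above can be modeled in user space. This is a minimal sketch, not the kernel implementation: `model_preempt_count`, `model_need_resched`, the `MODEL_*` constants and both function names are hypothetical stand-ins, and the real offset values depend on kernel configuration. It only illustrates why a check that expects a zero count can never fire while a lock is held, and how passing the expected offset addresses that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-CPU kernel state; not real kernel symbols. */
static int model_preempt_count;
static bool model_need_resched;

/* Illustrative values only; the real offsets depend on kernel configuration. */
enum {
    MODEL_PREEMPT_DISABLE_OFFSET = 1,  /* count after preempt_disable() */
    MODEL_PREEMPT_LOCK_OFFSET = 1      /* count after spin_lock() */
};

/* Old-style check: can only succeed when the count is exactly zero. */
static bool should_resched_old(void)
{
    return model_preempt_count == 0 && model_need_resched;
}

/* New-style check: the caller states which count it expects to hold. */
static bool should_resched_new(int preempt_offset)
{
    return model_preempt_count == preempt_offset && model_need_resched;
}
```

With the count at `MODEL_PREEMPT_LOCK_OFFSET` and a reschedule pending, `should_resched_old()` returns false while `should_resched_new(MODEL_PREEMPT_LOCK_OFFSET)` returns true, which mirrors the situation the commit describes for cond_resched_lock().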
     

22 Sep, 2015

1 commit

  • commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream.

    modify_ldt() has questionable locking and does not synchronize
    threads. Improve it: redesign the locking and synchronize all
    threads' LDTs using an IPI on all modifications.

    This will dramatically slow down modify_ldt in multithreaded
    programs, but there shouldn't be any multithreaded programs that
    care about modify_ldt's performance in the first place.

    This fixes some fallout from the CVE-2015-5157 fixes.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Borislav Petkov
    Cc: Andrew Cooper
    Cc: Andy Lutomirski
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jan Beulich
    Cc: Konrad Rzeszutek Wilk
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: security@kernel.org
    Cc: xen-devel
    Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
     

14 Sep, 2015

1 commit

  • commit ed596cde9425509ec6ce88e19f03e9b13b6f518b upstream.

    This reverts commits 9a036b93a344 ("x86/signal/64: Remove 'fs' and 'gs'
    from sigcontext") and c6f2062935c8 ("x86/signal/64: Fix SS handling for
    signals delivered to 64-bit programs").

    They were cleanups, but they break dosemu by changing the signal return
    behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
    not actually changing any behavior - causes build problems).

    Reported-and-tested-by: Stas Sergeev
    Acked-by: Andy Lutomirski
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

11 Aug, 2015

2 commits

  • commit a833581e372a4adae2319d8dc379493edbc444e9 upstream.

    Mikulas reported his K6-3 not booting. This is because the
    static_key API confusion struck and bit Andy, this wants to be
    static_key_false().

    Reported-by: Mikulas Patocka
    Tested-by: Mikulas Patocka
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Valdis Kletnieks
    Cc: Vince Weaver
    Cc: hillf.zj
    Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks")
    Link: http://lkml.kernel.org/r/20150709172338.GC19282@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • commit 5d5aa3cfca5cf74cd928daf3674642e6004328d1 upstream.

    Currently the KASAN shadow region page tables are created without
    regard to the physical offset (phys_base). This causes the kernel
    to halt when phys_base is not zero.

    So let's initialize KASAN shadow region page tables in
    kasan_early_init() using __pa_nodebug() which considers
    phys_base.

    This patch also separates x86_64_start_kernel() from KASAN low
    level details by moving kasan_map_early_shadow(init_level4_pgt)
    into kasan_early_init().

    Remove the comment before clear_bss(), which no longer added much
    to code readability. Describing all the new ordering dependencies
    there would be too verbose.

    Signed-off-by: Alexander Popov
    Signed-off-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Andrey Konovalov
    Cc: Borislav Petkov
    Cc: Dmitry Vyukov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1435828178-10975-3-git-send-email-a.ryabinin@samsung.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Alexander Popov
     

11 Jul, 2015

1 commit

  • commit 42720138b06301cc8a7ee8a495a6d021c4b6a9bc upstream.

    Writes were a bit racy, but hard to turn into a bug at the same time.
    (Particularly because modern Linux doesn't use this feature anymore.)

    Signed-off-by: Radim Krčmář
    [Actually the next patch makes it much, much easier to trigger the race
    so I'm including this one for stable@ as well. - Paolo]
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Radim Krčmář
     

06 Jun, 2015

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes:

    - early_idt_handlers[] fix that fixes the build with bleeding edge
    tooling

    - build warning fix on GCC 5.1

    - vm86 fix plus self-test to make it harder to break it again"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
    x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode
    x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h
    x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode

    Linus Torvalds
     

02 Jun, 2015

1 commit

  • The early_idt_handlers asm code generates an array of entry
    points spaced nine bytes apart. It's not really clear from that
    code or from the places that reference it what's going on, and
    the code only works in the first place because GAS never
    generates two-byte JMP instructions when jumping to global
    labels.

    Clean up the code to generate the correct array stride (member size)
    explicitly. This should be considerably more robust against
    screw-ups, as GAS will warn if a .fill directive has a negative
    count. Using '. =' to advance would have been even more robust
    (it would generate an actual error if it tried to move
    backwards), but it would pad with nulls, confusing anyone who
    tries to disassemble the code. The new scheme should be much
    clearer to future readers.

    While we're at it, improve the comments and rename the array and
    common code.

    Binutils may start relaxing jumps to non-weak labels. If so,
    this change will fix our build, and we may need to backport this
    change.

    Before, on x86_64:

    0000000000000000 :
    0: 6a 00 pushq $0x0
    2: 6a 00 pushq $0x0
    4: e9 00 00 00 00 jmpq 9
    5: R_X86_64_PC32 early_idt_handler-0x4
    ...
    48: 66 90 xchg %ax,%ax
    4a: 6a 08 pushq $0x8
    4c: e9 00 00 00 00 jmpq 51
    4d: R_X86_64_PC32 early_idt_handler-0x4
    ...
    117: 6a 00 pushq $0x0
    119: 6a 1f pushq $0x1f
    11b: e9 00 00 00 00 jmpq 120
    11c: R_X86_64_PC32 early_idt_handler-0x4

    After:

    0000000000000000 :
    0: 6a 00 pushq $0x0
    2: 6a 00 pushq $0x0
    4: e9 14 01 00 00 jmpq 11d
    ...
    48: 6a 08 pushq $0x8
    4a: e9 d1 00 00 00 jmpq 120
    4f: cc int3
    50: cc int3
    ...
    117: 6a 00 pushq $0x0
    119: 6a 1f pushq $0x1f
    11b: eb 03 jmp 120
    11d: cc int3
    11e: cc int3
    11f: cc int3

    Signed-off-by: Andy Lutomirski
    Acked-by: H. Peter Anvin
    Cc: Binutils
    Cc: Borislav Petkov
    Cc: H.J. Lu
    Cc: Jan Beulich
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
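
The fixed stride described in the commit above can be checked with a little arithmetic. This is a rough sketch under stated assumptions: `MODEL_ENTRY_SIZE` is the nine-byte spacing named in the message, `MODEL_NUM_VECTORS` is assumed to be 32 because the listing ends with `pushq $0x1f`, and `early_entry_offset()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NUM_VECTORS 32  /* assumed: the listing covers vectors 0x00..0x1f */
#define MODEL_ENTRY_SIZE   9  /* stride stated in the commit message, in bytes */

/* Byte offset of the entry stub for a given vector within the array. */
static size_t early_entry_offset(unsigned int vector)
{
    return (size_t)vector * MODEL_ENTRY_SIZE;
}
```

Under these assumptions vector 8 lands at offset 0x48 and vector 31 at 0x117, matching the addresses in the "After" listing, and the array ends at 0x120, which is the target of the final `jmp`.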
     

29 May, 2015

1 commit

  • While commit efa7045103 ("x86/asm/entry: Make user_mode() work
    correctly if regs came from VM86 mode") claims that "user_mode()
    is now identical to user_mode_vm()", this wasn't actually the
    case - no prior commit made it so.

    Signed-off-by: Jan Beulich
    Acked-by: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/5566EB0D020000780007E655@mail.emea.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

28 May, 2015

1 commit


22 May, 2015

1 commit

  • Pull KVM fixes from Paolo Bonzini:
    "This includes a fix for two oopses, one on PPC and one on x86.

    The rest is fixes for bugs with newer Intel processors"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm/fpu: Enable eager restore kvm FPU for MPX
    Revert "KVM: x86: drop fpu_activate hook"
    kvm: fix crash in kvm_vcpu_reload_apic_access_page
    KVM: MMU: fix SMAP virtualization
    KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages
    KVM: MMU: fix smap permission check
    KVM: PPC: Book3S HV: Fix list traversal in error case

    Linus Torvalds
     

20 May, 2015

2 commits

  • The MPX feature requires eager KVM FPU restore support. We have verified
    that MPX cannot work correctly with the current lazy KVM FPU restore
    mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
    exposed to VM.

    Signed-off-by: Yang Zhang
    Signed-off-by: Liang Li
    [Also activate the FPU on AMD processors. - Paolo]
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Liang Li
     
  • This reverts commit 4473b570a7ebb502f63f292ccfba7df622e5fdd3. We'll
    use the hook again.

    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     

11 May, 2015

1 commit

    KVM may turn a user page into a kernel page when the kernel writes a
    read-only user page if CR0.WP = 1. This shadow page entry will be reused
    after SMAP is enabled, so the kernel is allowed to access this user page.

    Fix it by setting SMAP && !CR0.WP into the shadow page's role and
    resetting the MMU once CR4.SMAP is updated.

    Signed-off-by: Xiao Guangrong
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini

    Xiao Guangrong
     

07 May, 2015

2 commits

  • Pull xen bug fixes from David Vrabel:

    - fix blkback regression if using persistent grants

    - fix various event channel related suspend/resume bugs

    - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS

    - SWIOTLB on ARM now uses frames evtchn before binding the channel to CPU in __startup_pirq()
    xen/console: Update console event channel on resume
    xen/xenbus: Update xenbus event channel on resume
    xen/events: Clear cpu_evtchn_mask before resuming
    xen-pciback: Add name prefix to global 'permissive' variable
    xen: Suspend ticks on all CPUs during suspend
    xen/grant: introduce func gnttab_unmap_refs_sync()
    xen/blkback: safely unmap purge persistent grants

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
    two build fixes"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/fpu: Always restore_xinit_state() when use_eager_cpu()
    x86: Make cpu_tss available to external modules
    efi: Fix error handling in add_sysfs_runtime_map_entry()
    x86/spinlocks: Fix regression in spinlock contention detection
    x86/mm: Clean up types in xlate_dev_mem_ptr()
    x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
    efivarfs: Ensure VariableName is NUL-terminated

    Linus Torvalds
     

06 May, 2015

2 commits

  • Make sure that xen_swiotlb_init allocates buffers that are DMA capable
    when at least one memblock is available below 4G. Otherwise we assume
    that all devices on the SoC can cope with >4G addresses. We do this on
    ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case.

    No functional changes on x86.

    From: Chen Baozi

    Signed-off-by: Chen Baozi
    Signed-off-by: Stefano Stabellini
    Tested-by: Chen Baozi
    Acked-by: Konrad Rzeszutek Wilk
    Signed-off-by: David Vrabel

    Stefano Stabellini
     
  • Commit 61f01dd941ba ("x86_64, asm: Work around AMD SYSRET SS descriptor
    attribute issue") makes AMD processors set SS to __KERNEL_DS in
    __switch_to() to deal with cases when SS is NULL.

    This breaks Xen PV guests who do not want to load SS with __KERNEL_DS.

    Since the problem that the commit is trying to address would have to be
    fixed in the hypervisor (if it in fact exists under Xen) there is no
    reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VCPUs here.

    This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features
    op which will clear this flag. (And since this structure is no longer
    HVM-specific we should do some renaming).

    Signed-off-by: Boris Ostrovsky
    Reported-by: Sander Eikelenboom
    Signed-off-by: David Vrabel

    Boris Ostrovsky
     

05 May, 2015

1 commit

  • A spinlock is regarded as contended when there is at least one waiter.
    Currently, the code that checks whether there are any waiters relies on
    tail value being greater than head. However, this is not true if tail
    reaches the max value and wraps back to zero, so arch_spin_is_contended()
    incorrectly returns 0 (not contended) when tail is smaller than head.

    The original code (before regression) handled this case by casting the
    (tail - head) to an unsigned value. This change simply restores that
    behavior.

    Fixes: d6abfdb20223 ("x86/spinlocks/paravirt: Fix memory corruption on unlock")
    Signed-off-by: Tahsin Erdogan
    Cc: peterz@infradead.org
    Cc: Waiman.Long@hp.com
    Cc: borntraeger@de.ibm.com
    Cc: oleg@redhat.com
    Cc: raghavendra.kt@linux.vnet.ibm.com
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1430799331-20445-1-git-send-email-tahsin@google.com
    Signed-off-by: Thomas Gleixner

    Tahsin Erdogan
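
The wraparound problem described in the commit above comes from C integer promotion. The following is a minimal sketch, assuming 8-bit tickets and an increment of 1; `ticket_t`, `TICKET_INC`, `struct tickets` and both function names are hypothetical and are not the kernel's definitions. A lock held with no waiters has `tail - head == TICKET_INC` in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t ticket_t;  /* assumed 8-bit tickets for this sketch */
#define TICKET_INC 1       /* assumed increment per lock acquisition */

struct tickets {
    ticket_t head;  /* ticket currently being served */
    ticket_t tail;  /* next ticket to hand out */
};

/* Buggy form: both operands promote to int, so a wrapped tail yields
 * a negative difference and the lock looks uncontended. */
static bool is_contended_signed(struct tickets t)
{
    return (t.tail - t.head) > TICKET_INC;
}

/* Fixed form: casting back to the ticket type makes the subtraction
 * modular, so the distance stays correct across the wrap. */
static bool is_contended_unsigned(struct tickets t)
{
    return (ticket_t)(t.tail - t.head) > TICKET_INC;
}
```

For `head = 254, tail = 1` (a holder plus two waiters, with tail wrapped past 255), the signed form computes -253 and reports no contention, while the cast form computes 3 and reports contention.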
     

27 Apr, 2015

2 commits

  • This reverts commits 0a4e6be9ca17c54817cf814b4b5aa60478c6df27
    and 80f7fdb1c7f0f9266421f823964fd1962681f6ce.

    The task migration notifier was originally introduced in order to support
    the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
    with synchronized TSC. Hence, on KVM the race condition is only needed
    due to a bad implementation on the host side, and even then it's so rare
    that it's mostly theoretical.

    As far as KVM is concerned it's possible to fix the host, avoiding the
    additional complexity in the vDSO and the (re)introduction of the task
    migration notifier.

    Xen, on the other hand, hasn't yet implemented vsyscall support at
    all, so we do not care about its plans for non-synchronized TSC.

    Reported-by: Peter Zijlstra
    Suggested-by: Marcelo Tosatti
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
    SS == 0 results in an invalid usermode state in which SS is apparently
    equal to __USER_DS but causes #SS if used.

    Work around the issue by setting SS to __KERNEL_DS in __switch_to, thus
    ensuring that SYSRET never happens with SS set to NULL.

    This was exposed by a recent vDSO cleanup.

    Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
    Signed-off-by: Andy Lutomirski
    Cc: Peter Anvin
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: Brian Gerst
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

23 Apr, 2015

1 commit

  • Pull virtio updates from Rusty Russell:
    "Some virtio internal cleanups, a new virtio device "virtio input", and
    a change to allow the legacy virtio balloon.

    Most excitingly, some lguest work! No seriously, I got some cleanup
    patches"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio: drop virtio_device_is_legacy_only
    virtio_pci: support non-legacy balloon devices
    virtio_mmio: support non-legacy balloon devices
    virtio_ccw: support non-legacy balloon devices
    virtio: balloon might not be a legacy device
    virtio_balloon: transitional interface
    virtio_ring: Update weak barriers to use dma_wmb/rmb
    virtio_pci_modern: switch to type-safe io accessors
    virtio_pci_modern: type-safe io accessors
    lguest: handle traps on the "interrupt suppressed" iret instruction.
    virtio: drop a useless config read
    virtio_config: reorder functions
    Add virtio-input driver.
    lguest: suppress interrupts for single insn, not range.
    lguest: simplify lguest_iret
    lguest: rename i386_head.S in the comments
    lguest: explicitly set miscdevice's private_data NULL
    lguest: fix pending interrupt test.

    Linus Torvalds
     

22 Apr, 2015

2 commits

  • Pull char/misc driver updates from Greg KH:
    "Here's the big char/misc driver patchset for 4.1-rc1.

    Lots of different driver subsystem updates here, nothing major, full
    details are in the shortlog.

    All of this has been in linux-next for a while"

    * tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits)
    mei: trace: remove unused TRACE_SYSTEM_STRING
    DTS: ARM: OMAP3-N900: Add lis3lv02d support
    Documentation: DT: lis302: update wakeup binding
    lis3lv02d: DT: add wakeup unit 2 and wakeup threshold
    lis3lv02d: DT: use s32 to support negative values
    Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case
    Drivers: hv: hv_balloon: correctly handle val.freeram directory
    coresight-tmc: Adding a status interface to sysfs
    coresight: remove the unnecessary configuration coresight-default-sink
    ...

    Linus Torvalds
     
  • Pull tty/serial updates from Greg KH:
    "Here's the big tty/serial driver update for 4.1-rc1.

    It was delayed for a bit due to some questions surrounding some of the
    console command line parsing changes that are in here. There's still
    one tiny regression for people who were previously putting multiple
    console command lines and expecting them all to be ignored for some
    odd reason, but Peter is working on fixing that. If not, I'll send a
    revert for the offending patch, but I have faith that Peter can
    address it.

    Other than the console work here, there's the usual serial driver
    updates and changes, and a bunch of 8250 reworks to try to make that
    driver easier to maintain over time, and have it support more devices
    in the future.

    All of these have been in linux-next for a while"

    * tag 'tty-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (119 commits)
    n_gsm: Drop unneeded cast on netdev_priv
    sc16is7xx: expose RTS inversion in RS-485 mode
    serial: 8250_pci: port failed after wakeup from S3
    earlycon: 8250: Document kernel command line options
    earlycon: 8250: Fix command line regression
    earlycon: Fix __earlycon_table stride
    tty: clean up the tty time logic a bit
    serial: 8250_dw: only get the clock rate in one place
    serial: 8250_dw: remove useless ACPI ID check
    dmaengine: hsu: move memory allocation to GFP_NOWAIT
    dmaengine: hsu: remove redundant pieces of code
    serial: 8250_pci: add Intel Tangier support
    dmaengine: hsu: add Intel Tangier PCI ID
    serial: 8250_pci: replace switch-case by formula for Intel MID
    serial: 8250_pci: replace switch-case by formula
    tty: cpm_uart: replace CONFIG_8xx by CONFIG_CPM1
    serial: jsm: some off by one bugs
    serial: xuartps: Fix check in console_setup().
    serial: xuartps: Get rid of register access macros.
    serial: xuartps: Fix iobase use.
    ...

    Linus Torvalds
     

20 Apr, 2015

1 commit

  • Pull turbostat update from Len Brown:
    "Updates to the turbostat utility.

    Just one kernel dependency in this batch -- added a #define to
    msr-index.h"

    * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    tools/power turbostat: correct dumped pkg-cstate-limit value
    tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL
    tools/power turbostat: correct DRAM RAPL units on recent Xeon processors
    tools/power turbostat: Initial Skylake support
    tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile
    tools/power turbostat: modprobe msr, if needed
    tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2
    tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names
    x86 msr-index: define MSR_TURBO_RATIO_LIMIT,1,2
    tools/power turbostat: label base frequency
    tools/power turbostat: update PERF_LIMIT_REASONS decoding
    tools/power turbostat: simplify default output

    Linus Torvalds
     

19 Apr, 2015

1 commit


18 Apr, 2015

1 commit

  • Pull PMEM driver from Ingo Molnar:
    "This is the initial support for the pmem block device driver:
    persistent non-volatile memory space mapped into the system's physical
    memory space as large physical memory regions.

    The driver is based on Intel code, written by Ross Zwisler, with fixes
    by Boaz Harrosh, integrated with x86 e820 memory resource management
    and tidied up by Christoph Hellwig.

    Note that there were two other separate pmem driver submissions to
    lkml: but apparently all parties (Ross Zwisler, Boaz Harrosh) are
    reasonably happy with this initial version.

    This version enables minimal support that enables persistent memory
    devices out in the wild to work as block devices, identified through a
    magic (non-standard) e820 flag and auto-discovered if
    CONFIG_X86_PMEM_LEGACY=y, or added explicitly through manipulating the
    memory maps via the "memmap=..." boot option with the new, special '!'
    modifier character.

    Limitations: this is a regular block device, and since the pmem areas
    are not struct page backed, they are invisible to the rest of the
    system (other than the block IO device), so direct IO to/from pmem
    areas, direct mmap() or XIP is not possible yet. The page cache will
    also shadow and double buffer pmem contents, etc.

    Initial support is for x86"

    * 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    drivers/block/pmem: Fix 32-bit build warning in pmem_alloc()
    drivers/block/pmem: Add a driver for persistent memory
    x86/mm: Add support for the non-standard protected e820 type

    Linus Torvalds
     

17 Apr, 2015

1 commit

  • Switch to using the newly created asm-generic/seccomp.h for the seccomp
    strict mode syscall definitions. The obsolete sigreturn syscall override
    is retained in 32-bit mode, and the ia32 syscall overrides are used in
    the compat case. Remaining definitions were identical.

    Signed-off-by: Kees Cook
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

16 Apr, 2015

1 commit

  • Pull exec domain removal from Richard Weinberger:
    "This series removes execution domain support from Linux.

    The idea behind exec domains was to support different ABIs. The
    feature was never complete nor stable. Let's rip it out and make the
    kernel signal handling code less complicated"

    * 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits)
    arm64: Removed unused variable
    sparc: Fix execution domain removal
    Remove rest of exec domains.
    arch: Remove exec_domain from remaining archs
    arc: Remove signal translation and exec_domain
    xtensa: Remove signal translation and exec_domain
    xtensa: Autogenerate offsets in struct thread_info
    x86: Remove signal translation and exec_domain
    unicore32: Remove signal translation and exec_domain
    um: Remove signal translation and exec_domain
    tile: Remove signal translation and exec_domain
    sparc: Remove signal translation and exec_domain
    sh: Remove signal translation and exec_domain
    s390: Remove signal translation and exec_domain
    mn10300: Remove signal translation and exec_domain
    microblaze: Remove signal translation and exec_domain
    m68k: Remove signal translation and exec_domain
    m32r: Remove signal translation and exec_domain
    m32r: Autogenerate offsets in struct thread_info
    frv: Remove signal translation and exec_domain
    ...

    Linus Torvalds
     

15 Apr, 2015

9 commits

  • Pull power management and ACPI updates from Rafael Wysocki:
    "These are mostly fixes and cleanups all over, although there are a few
    items that sort of fall into the new feature category.

    First off, we have new callbacks for PM domains that should help us to
    handle some issues related to device initialization in a better way.

    There also is some consolidation in the unified device properties API
    area allowing us to use that interface for accessing data coming from
    platform initialization code in addition to firmware-provided data.

    We have some new device/CPU IDs in a few drivers, support for new
    chips and a new cpufreq driver too.

    Specifics:

    - Generic PM domains support update including new PM domain callbacks
    to handle device initialization better (Russell King, Rafael J
    Wysocki, Kevin Hilman)

    - Unified device properties API update including a new mechanism for
    accessing data provided by platform initialization code (Rafael J
    Wysocki, Adrian Hunter)

    - ARM cpuidle update including ARM32/ARM64 handling consolidation
    (Daniel Lezcano)

    - intel_idle update including support for the Silvermont Core in the
    Baytrail SOC and for the Airmont Core in the Cherrytrail and
    Braswell SOCs (Len Brown, Mathias Krause)

    - New cpufreq driver for Hisilicon ACPU (Leo Yan)

    - intel_pstate update including support for the Knights Landing chip
    (Dasaratharaman Chandramouli, Kristen Carlson Accardi)

    - QorIQ cpufreq driver update (Tang Yuantian, Arnd Bergmann)

    - powernv cpufreq driver update (Shilpasri G Bhat)

    - devfreq update including Tegra support changes (Tomeu Vizoso,
    MyungJoo Ham, Chanwoo Choi)

    - powercap RAPL (Running-Average Power Limit) driver update including
    support for Intel Broadwell server chips (Jacob Pan, Mathias Krause)

    - ACPI device enumeration update related to the handling of the
    special PRP0001 device ID allowing DT-style 'compatible' property
    to be used for ACPI device identification (Rafael J Wysocki)

    - ACPI EC driver update including limited _DEP support (Lan Tianyu,
    Lv Zheng)

    - ACPI backlight driver update including a new mechanism to allow
    native backlight handling to be forced on non-Windows 8 systems and
    a new quirk for Lenovo Ideapad Z570 (Aaron Lu, Hans de Goede)

    - New Windows Vista compatibility quirk for Sony VGN-SR19XN (Chen Yu)

    - Assorted ACPI fixes and cleanups (Aaron Lu, Martin Kepplinger,
    Masanari Iida, Mika Westerberg, Nan Li, Rafael J Wysocki)

    - Fixes related to suspend-to-idle for the iTCO watchdog driver and
    the ACPI core system suspend/resume code (Rafael J Wysocki, Chen Yu)

    - PM tracing support for the suspend phase of system suspend/resume
    transitions (Zhonghui Fu)

    - Configurable delay for the system suspend/resume testing facility
    (Brian Norris)

    - PNP subsystem cleanups (Peter Huewe, Rafael J Wysocki)"

    * tag 'pm+acpi-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
    ACPI / scan: Fix NULL pointer dereference in acpi_companion_match()
    ACPI / scan: Rework modalias creation when "compatible" is present
    intel_idle: mark cpu id array as __initconst
    powercap / RAPL: mark rapl_ids array as __initconst
    powercap / RAPL: add ID for Broadwell server
    intel_pstate: Knights Landing support
    intel_pstate: remove MSR test
    cpufreq: fix qoriq uniprocessor build
    ACPI / scan: Take the PRP0001 position in the list of IDs into account
    ACPI / scan: Simplify acpi_match_device()
    ACPI / scan: Generalize of_compatible matching
    device property: Introduce firmware node type for platform data
    device property: Make it possible to use secondary firmware nodes
    PM / watchdog: iTCO: stop watchdog during system suspend
    cpufreq: hisilicon: add acpu driver
    ACPI / EC: Call acpi_walk_dep_device_list() after installing EC opregion handler
    cpufreq: powernv: Report cpu frequency throttling
    intel_idle: Add support for the Airmont Core in the Cherrytrail and Braswell SOCs
    intel_idle: Update support for Silvermont Core in Baytrail SOC
    PM / devfreq: tegra: Register governor on module init
    ...

    Linus Torvalds
     
  • Merge first patchbomb from Andrew Morton:

    - arch/sh updates

    - ocfs2 updates

    - kernel/watchdog feature

    - about half of mm/

    * emailed patches from Andrew Morton: (122 commits)
    Documentation: update arch list in the 'memtest' entry
    Kconfig: memtest: update number of test patterns up to 17
    arm: add support for memtest
    arm64: add support for memtest
    memtest: use phys_addr_t for physical addresses
    mm: move memtest under mm
    mm, hugetlb: abort __get_user_pages if current has been oom killed
    mm, mempool: do not allow atomic resizing
    memcg: print cgroup information when system panics due to panic_on_oom
    mm: numa: remove migrate_ratelimited
    mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    mm: split ET_DYN ASLR from mmap ASLR
    s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    mm: expose arch_mmap_rnd when available
    s390: standardize mmap_rnd() usage
    powerpc: standardize mmap_rnd() usage
    mips: extract logic for mmap_rnd()
    arm64: standardize mmap_rnd() usage
    x86: standardize mmap_rnd() usage
    arm: factor out mmap ASLR into mmap_rnd
    ...

    Linus Torvalds
     
    Memtest is a simple feature which fills the memory with a given set of
    patterns and validates memory contents; if bad memory regions are detected,
    it reserves them via the memblock API. Since the memblock API is widely used
    by other architectures, this feature can be enabled outside of the x86 world.

    This patch set promotes memtest to live under generic mm umbrella and
    enables memtest feature for arm/arm64.

    It was reported that this patch set was useful for tracking down an issue
    with some errant DMA on an arm64 platform.

    This patch (of 6):

    There is nothing platform dependent in the core memtest code, so other
    platforms might benefit from this feature too.
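
    The fill-and-verify idea described above can be sketched in user space. This
    is only a toy model under stated assumptions: the FlakyMemory class, the
    pattern list and find_bad_ranges() are hypothetical names invented for the
    illustration, and the returned ranges merely stand in for what the kernel
    would reserve via memblock.

```python
# Toy model of the memtest idea: fill a region with a pattern, read it back,
# and collect the ranges that fail. Everything here is illustrative; the kernel
# operates on physical memory and reserves bad ranges through memblock.

PATTERNS = [0x00, 0xFF, 0x55, 0xAA]  # hypothetical byte patterns, not the kernel's table


class FlakyMemory:
    """Byte-addressable memory in which some cells are stuck at a fixed value."""

    def __init__(self, size, stuck=None):
        self.cells = bytearray(size)
        self.stuck = dict(stuck or {})  # address -> value the cell always returns

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])


def find_bad_ranges(mem, size, patterns=PATTERNS):
    """Return merged half-open (start, end) ranges that failed any pattern."""
    bad = set()
    for pattern in patterns:
        # Fill the whole region first, then verify it in a second pass.
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            if mem.read(addr) != pattern:
                bad.add(addr)

    # Coalesce adjacent bad addresses into ranges.
    ranges = []
    for addr in sorted(bad):
        if ranges and ranges[-1][1] == addr:
            ranges[-1] = (ranges[-1][0], addr + 1)
        else:
            ranges.append((addr, addr + 1))
    return ranges


mem = FlakyMemory(64, stuck={10: 0x00, 11: 0x00, 40: 0xFF})
reserved = find_bad_ranges(mem, 64)
print(reserved)
```

    Using several patterns matters in this model: a cell stuck at 0x00 passes the
    all-zero pattern and is only caught by a later one.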

    [linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
    Signed-off-by: Vladimir Murzin
    Acked-by: Will Deacon
    Tested-by: Mark Rutland
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Catalin Marinas
    Cc: Russell King
    Cc: Paul Bolle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Murzin
     
  • The arch_randomize_brk() function is used on several architectures,
    even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
    tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
    the architectures that support it, while still handling CONFIG_COMPAT_BRK.

    Signed-off-by: Kees Cook
    Cc: Hector Marco-Gisbert
    Cc: Russell King
    Reviewed-by: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Alexander Viro
    Cc: Oleg Nesterov
    Cc: Andy Lutomirski
    Cc: "David A. Long"
    Cc: Andrey Ryabinin
    Cc: Arun Chandran
    Cc: Yann Droneaud
    Cc: Min-Hua Chen
    Cc: Paul Burton
    Cc: Alex Smith
    Cc: Markos Chandras
    Cc: Vineeth Vijayan
    Cc: Jeff Bailey
    Cc: Michael Holzheu
    Cc: Ben Hutchings
    Cc: Behan Webster
    Cc: Ismael Ripoll
    Cc: Jan-Simon Möller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Implement huge I/O mapping capability interfaces for ioremap() on x86.

    IOREMAP_MAX_ORDER is defined to PUD_SHIFT on x86/64 and PMD_SHIFT on
    x86/32, which overrides the default value defined in .
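
    As a rough worked example of what these orders mean in bytes, the sketch
    below converts shift values to mapping sizes. The numeric shifts are the
    usual x86 values and are assumptions of this sketch, not something stated in
    the commit; the helper names are hypothetical, and the eligibility check is
    a simplification of what a real huge-mapping decision involves.

```python
# Mapping sizes implied by page-table shift values.
# The shifts below are assumed typical x86 values, not taken from the commit.
PUD_SHIFT_X86_64 = 30      # assumed: 4-level paging on x86-64
PMD_SHIFT_X86_32_PAE = 21  # assumed: 32-bit with PAE
PMD_SHIFT_X86_32 = 22      # assumed: 32-bit without PAE


def mapping_size(shift):
    """Size in bytes of one mapping at the given shift."""
    return 1 << shift


def human(nbytes):
    """Format a power-of-two byte count as KiB/MiB/GiB."""
    for unit, width in (("GiB", 30), ("MiB", 20), ("KiB", 10)):
        if nbytes >= 1 << width:
            return f"{nbytes >> width} {unit}"
    return f"{nbytes} B"


def can_use_huge_mapping(phys_addr, length, shift):
    """Simplified check: range starts aligned and covers at least one mapping."""
    size = mapping_size(shift)
    return phys_addr % size == 0 and length >= size


for name, shift in (
    ("x86/64 PUD", PUD_SHIFT_X86_64),
    ("x86/32 PAE PMD", PMD_SHIFT_X86_32_PAE),
    ("x86/32 PMD", PMD_SHIFT_X86_32),
):
    print(name, human(mapping_size(shift)))
```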

    Signed-off-by: Toshi Kani
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Dave Hansen
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshi Kani
     
  • We want to use the number of page table levels to define mm_struct.
    Let's expose it as CONFIG_PGTABLE_LEVELS.

    Signed-off-by: Kirill A. Shutemov
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Tested-by: Guenter Roeck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Pull perf changes from Ingo Molnar:
    "Core kernel changes:

    - One of the more interesting features in this cycle is the ability
    to attach eBPF programs (user-defined, sandboxed bytecode executed
    by the kernel) to kprobes.

    This allows user-defined instrumentation on a live kernel image
    that can never crash, hang or interfere with the kernel negatively.
    (Right now it's limited to root-only, but in the future we might
    allow unprivileged use as well.)

    (Alexei Starovoitov)

    - Another non-trivial feature is per event clockid support: this
    allows, amongst other things, the selection of different clock
    sources for event timestamps traced via perf.

    This feature is sought by people who'd like to merge perf generated
    events with external events that were measured with different
    clocks:

    - cluster wide profiling

    - for system wide tracing with user-space events,

    - JIT profiling events

    etc. Matching perf tooling support is added as well, available via
    the -k, --clockid parameter to perf record et al.

    (Peter Zijlstra)
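
    To illustrate why a shared clock matters for the merging use case above,
    here is a minimal sketch that interleaves two event streams by timestamp.
    It assumes both streams were already stamped against the same clock, which
    is what selecting a clockid makes possible; the event tuples, names and the
    merge_by_timestamp() helper are hypothetical and not part of perf.

```python
import heapq


def merge_by_timestamp(*streams):
    """Merge already-sorted (timestamp_ns, source, name) streams into one timeline.

    The result is only meaningful if every stream uses the same clock source.
    """
    return list(heapq.merge(*streams, key=lambda ev: ev[0]))


# Hypothetical sample data: perf-style events and externally logged JIT events.
perf_events = [(1000, "perf", "sched_switch"), (3000, "perf", "page_fault")]
jit_events = [(500, "jit", "compile_start"), (2000, "jit", "compile_end")]

timeline = merge_by_timestamp(perf_events, jit_events)
for ts, source, name in timeline:
    print(ts, source, name)
```

    If the two streams used different clocks, the same merge would still run but
    the resulting order would be arbitrary, which is the problem the per-event
    clockid selection is meant to avoid.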

    Hardware enablement kernel changes:

    - x86 Intel Processor Trace (PT) support: which is a hardware tracer
    on steroids, available on Broadwell CPUs.

    The hardware trace stream is directly output into the user-space
    ring-buffer, using the 'AUX' data format extension that was added
    to the perf core to support hardware constraints such as the
    necessity to have the tracing buffer physically contiguous.

    This patch-set was developed for two years and this is the result.
    A simple way to make use of this is to use BTS tracing, the PT
    driver emulates BTS output - available via the 'intel_bts' PMU.
    More explicit PT specific tooling support is in the works as well -
    will probably be ready by 4.2.

    (Alexander Shishkin, Peter Zijlstra)

    - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
    feature of Intel Xeon CPUs that allows the measurement and
    allocation/partitioning of caches to individual workloads.

    These kernel changes expose the measurement side as a new PMU
    driver, which exposes various QoS related PMU events. (The
    partitioning change is work in progress and is planned to be merged
    as a cgroup extension.)

    (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
    Waskiewicz Jr)

    - x86 Intel Haswell LBR call stack support: this is a new Haswell
    feature that allows the hardware recording of call chains, plus
    tooling support. To activate this feature you have to enable it
    via the new 'lbr' call-graph recording option:

    perf record --call-graph lbr
    perf report

    or:

    perf top --call-graph lbr

    This hardware feature is a lot faster than stack walk or dwarf
    based unwinding, but has some limitations:

    - It reuses the current LBR facility, so LBR call stack and
    branch record can not be enabled at the same time.

    - It is only available for user-space callchains.

    (Yan, Zheng)

    - x86 Intel Broadwell CPU support and various event constraints and
    event table fixes for earlier models.

    (Andi Kleen)

    - x86 Intel HT CPUs event scheduling workarounds. This is a complex
    CPU bug affecting the SNB, IVB, HSW families that results in counter
    value corruption. The mitigation code is automatically enabled and
    is transparent.

    (Maria Dimakopoulou, Stephane Eranian)

    The perf tooling side had a ton of changes in this cycle as well, so
    I'm only able to list the user visible changes here, in addition to
    the tooling changes outlined above:

    User visible changes affecting all tools:

    - Improve support of compressed kernel modules (Jiri Olsa)
    - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
    - Bash completion for subcommands (Yunlong Song)
    - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
    - Support missing -f to override perf.data file ownership. (Yunlong Song)
    - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

    User visible changes in individual tools:

    'perf data':

    New tool for converting perf.data to other formats, initially
    for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
    Sebastian Siewior)

    'perf diff':

    Add --kallsyms option (David Ahern)

    'perf list':

    Allow listing events with 'tracepoint' prefix (Yunlong Song)

    Sort the output of the command (Yunlong Song)

    'perf kmem':

    Respect -i option (Jiri Olsa)

    Print big numbers using thousands' group (Namhyung Kim)

    Allow -v option (Namhyung Kim)

    Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

    Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

    Support unnamed union/structure members data collection. (Masami Hiramatsu)

    Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

    Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

    Support recording running/enabled time (Andi Kleen)

    'perf sched':

    Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

    Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

    Indicate which callchain entries are annotated in the
    TUI hists browser (Arnaldo Carvalho de Melo)

    Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

    Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
    cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
    events (Arnaldo Carvalho de Melo)

    'perf stat':

    Report unsupported events properly (Suzuki K. Poulose)

    Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

    Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

    Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

    Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

    Dump stack on segfaults (Arnaldo Carvalho de Melo)

    No need to explicitly enable evsels for a workload started from perf; let it
    be enabled via perf_event_attr.enable_on_exec, removing some events that take
    place in 'perf trace' before a workload is really started by it.
    (Arnaldo Carvalho de Melo)

    Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

    There's also been a ton of infrastructure work done, such as the
    split-out of perf's build system into tools/build/ and other changes -
    see the shortlog and changelog for details"
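
    The running/enabled times mentioned for 'perf record' and 'perf stat' above
    are typically used to estimate counts for events that were not scheduled on
    the hardware the whole time. A minimal sketch of that arithmetic follows;
    the function names are hypothetical, and the linear extrapolation is the
    conventional estimate rather than anything specific to these patches.

```python
def run_ratio(time_enabled, time_running):
    """Fraction of the enabled time during which the event was actually counting."""
    if time_enabled == 0:
        return 0.0
    return time_running / time_enabled


def scale_count(raw_count, time_enabled, time_running):
    """Extrapolate a raw count to the full enabled window.

    Returns None when the event never ran, since no estimate is possible.
    """
    if time_running == 0:
        return None
    return round(raw_count * time_enabled / time_running)


# An event counted 1000 occurrences while running for half of its enabled time.
print(run_ratio(200, 100))
print(scale_count(1000, 200, 100))
```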

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
    perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
    perf evlist: Fix type for references to data_head/tail
    perf probe: Check the orphaned -x option
    perf probe: Support multiple probes on different binaries
    perf buildid-list: Fix segfault when show DSOs with hits
    perf tools: Fix cross-endian analysis
    perf tools: Fix error path to do closedir() when synthesizing threads
    perf tools: Fix synthesizing fork_event.ppid for non-main thread
    perf tools: Add 'I' event modifier for exclude_idle bit
    perf report: Don't call map__kmap if map is NULL.
    perf tests: Fix attr tests
    perf probe: Fix ARM 32 building error
    perf tools: Merge all perf_event_attr print functions
    perf record: Add clockid parameter
    perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
    perf sched replay: Support using -f to override perf.data file ownership
    perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
    perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
    perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
    perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
    ...

    Linus Torvalds
     
  • Pull RCU changes from Ingo Molnar:
    "The main changes in this cycle were:

    - changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

    - add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

    - improve RCU's handling of (hotplug-) outgoing CPUs.

    - NO_HZ_FULL_SYSIDLE fixes.

    - tiny-RCU updates to make it more tiny.

    - documentation updates.

    - miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
    cpu: Defer smpboot kthread unparking until CPU known to scheduler
    rcu: Associate quiescent-state reports with grace period
    rcu: Yet another fix for preemption and CPU hotplug
    rcu: Add diagnostics to grace-period cleanup
    rcutorture: Default to grace-period-initialization delays
    rcu: Handle outgoing CPUs on exit from idle loop
    cpu: Make CPU-offline idle-loop transition point more precise
    rcu: Eliminate ->onoff_mutex from rcu_node structure
    rcu: Process offlining and onlining only at grace-period start
    rcu: Move rcu_report_unblock_qs_rnp() to common code
    rcu: Rework preemptible expedited bitmask handling
    rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
    rcutorture: Enable slow grace-period initializations
    rcu: Provide diagnostic option to slow down grace-period initialization
    rcu: Detect stalls caused by failure to propagate up rcu_node tree
    rcu: Eliminate empty HOTPLUG_CPU ifdef
    rcu: Simplify sync_rcu_preempt_exp_init()
    rcu: Put all orphan-callback-related code under same comment
    rcu: Consolidate offline-CPU callback initialization
    ...

    Linus Torvalds
     
  • Pull livepatching updates from Jiri Kosina:
    "These are mostly smaller things that got accumulated during the
    development cycle. The unified solution is still being worked on and
    is not mature enough for 4.1 yet.

    - s390 livepatching support, from Jiri Slaby (has Ack from s390
    maintainers)

    - error handling simplification, from Josh Poimboeuf

    - two minor code cleanups from Josh Poimboeuf and Miroslav Benes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: add support on s390
    livepatch: remove unnecessary call to klp_find_object_module()
    livepatch: simplify disable error path
    livepatch: remove extern specifier from header files

    Linus Torvalds