13 Apr, 2016

1 commit


01 Apr, 2016

1 commit

  • In a798f091113e ("x86/entry/32: Change INT80 to be an interrupt gate")
    Andy broke lguest. This is because lguest had special code to allow
    the 0x80 trap gate to go straight into the guest itself; interrupt gates
    (without more work, as mentioned in the file's comments) bounce via
    the hypervisor.

    His change made them go via the hypervisor, but as it's in the range of
    normal hardware interrupts, they were not directed through to the guest
    at all. Turns out the guest userspace isn't very effective if syscalls
    are all noops.

    I haven't ripped out all the now-useless trap-direct-to-guest-kernel
    code yet, since it will still be needed if someone decides to update
    this optimization.

    Signed-off-by: Rusty Russell
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Weisbecker
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/87fuv685kl.fsf@rustcorp.com.au
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

31 Mar, 2016

1 commit

  • Use static_cpu_has() in __flush_tlb_all() due to the time-sensitivity of
    this one.

    Signed-off-by: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1459266123-21878-10-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

12 Jan, 2016

1 commit

  • Pavel noted that lguest maps the switcher code executable and
    read-write. This is a bad idea for any kernel text, but
    particularly for text mapped at a fixed address.

    Create two vmas, one for the text (PAGE_KERNEL_RX) and another
    for the stacks (PAGE_KERNEL). Use VM_NO_GUARD to map them
    adjacent (as expected by the rest of the code).

    Reported-by: Pavel Machek
    Tested-by: Pavel Machek
    Signed-off-by: Rusty Russell
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

23 Jun, 2015

2 commits

  • Pull x86 core updates from Ingo Molnar:
    "There were so many changes in the x86/asm, x86/apic and x86/mm topics
    in this cycle that the topical separation of -tip broke down somewhat -
    so the result is a more traditional architecture pull request,
    collected into the 'x86/core' topic.

    The topics were still maintained separately as far as possible, so
    bisectability and conceptual separation should still be pretty good -
    but there were a handful of merge points to avoid excessive
    dependencies (and conflicts) that would have been poorly tested in the
    end.

    The next cycle will hopefully be much more quiet (or at least will
    have fewer dependencies).

    The main changes in this cycle were:

    * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
    Gleixner)

    - This is the second and most intrusive part of changes to the x86
    interrupt handling - full conversion to hierarchical interrupt
    domains:

    [IOAPIC domain]   ---
                         |
    [MSI domain]      ---[Remapping domain] ----- [ Vector domain ]
                         |   (optional)             |
    [HPET MSI domain] ---                           |
                                                    |
    [DMAR domain]     ------------------------------
                                                    |
    [Legacy domain]   ------------------------------

    This now reflects the actual hardware and allowed us to disentangle
    the domain specific code from the underlying parent domain, which
    can be optional in the case of interrupt remapping. It's a clear
    separation of functionality and removes quite some duct tape
    constructs which plugged the remap code between ioapic/msi/hpet
    and the vector management.

    - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
    injection into guests (Feng Wu)

    * x86/asm changes:

    - Tons of cleanups and small speedups, micro-optimizations. This
    is in preparation to move a good chunk of the low level entry
    code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
    Brian Gerst)

    - Moved all system entry related code to a new home under
    arch/x86/entry/ (Ingo Molnar)

    - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
    Conversion to C will reintroduce many of them - but meanwhile
    they are only getting in the way, and the upstream kernel does
    not rely on them (Ingo Molnar)

    - NOP handling refinements. (Borislav Petkov)

    * x86/mm changes:

    - Big PAT and MTRR rework: making the code more robust and
    preparing to phase out exposing direct MTRR interfaces to drivers -
    in favor of using PAT driven interfaces (Toshi Kani, Luis R
    Rodriguez, Borislav Petkov)

    - New ioremap_wt()/set_memory_wt() interfaces to support
    Write-Through cached memory mappings. This is especially
    important for good performance on NVDIMM hardware (Toshi Kani)

    * x86/ras changes:

    - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

    This is an important RAS feature which adds hardware support for
    poisoned data. That means roughly that the hardware marks data
    which it has detected as corrupted but wasn't able to correct, as
    poisoned data and raises an APIC interrupt to signal that in the
    form of a deferred error. It is the OS's responsibility then to
    take proper recovery action and thus prolong system lifetime as
    far as possible.

    - Add support for Intel "Local MCE"s: upcoming CPUs will support
    CPU-local MCE interrupts, as opposed to the traditional system-
    wide broadcasted MCE interrupts (Ashok Raj)

    - Misc cleanups (Borislav Petkov)

    * x86/platform changes:

    - Intel Atom SoC updates

    ... and lots of other cleanups, fixlets and other changes - see the
    shortlog and the Git log for details"

    * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
    x86/hpet: Use proper hpet device number for MSI allocation
    x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
    x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
    x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
    x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
    genirq: Prevent crash in irq_move_irq()
    genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
    iommu, x86: Properly handle posted interrupts for IOMMU hotplug
    iommu, x86: Provide irq_remapping_cap() interface
    iommu, x86: Setup Posted-Interrupts capability for Intel iommu
    iommu, x86: Add cap_pi_support() to detect VT-d PI capability
    iommu, x86: Avoid migrating VT-d posted interrupts
    iommu, x86: Save the mode (posted or remapped) of an IRTE
    iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
    iommu: dmar: Provide helper to copy shared irte fields
    iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
    iommu: Add new member capability to struct irq_remap_ops
    x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
    x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
    x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
    ...

    Linus Torvalds
     
  • Pull x86 FPU updates from Ingo Molnar:
    "This tree contains two main changes:

    - The big FPU code rewrite: wide reaching cleanups and reorganization
    that pulls all the FPU code together into a clean base in
    arch/x86/fpu/.

    The resulting code is leaner and faster, and much easier to
    understand. This enables future work to further simplify the FPU
    code (such as removing lazy FPU restores).

    By its nature these changes have a substantial regression risk: FPU
    code related bugs are long lived, because races are often subtle
    and bugs mask as user-space failures that are difficult to track
    back to kernel side bugs. I'm aware of no unfixed (or even
    suspected) FPU related regression so far.

    - MPX support rework/fixes. As this is still not a released CPU
    feature, there were some buglets in the code - should be much more
    robust now (Dave Hansen)"

    * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (250 commits)
    x86/fpu: Fix double-increment in setup_xstate_features()
    x86/mpx: Allow 32-bit binaries on 64-bit kernels again
    x86/mpx: Do not count MPX VMAs as neighbors when unmapping
    x86/mpx: Rewrite the unmap code
    x86/mpx: Support 32-bit binaries on 64-bit kernels
    x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps
    x86/mpx: Introduce new 'directory entry' to 'addr' helper function
    x86/mpx: Add temporary variable to reduce masking
    x86: Make is_64bit_mm() widely available
    x86/mpx: Trace allocation of new bounds tables
    x86/mpx: Trace the attempts to find bounds tables
    x86/mpx: Trace entry to bounds exception paths
    x86/mpx: Trace #BR exceptions
    x86/mpx: Introduce a boot-time disable flag
    x86/mpx: Restrict the mmap() size check to bounds tables
    x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK
    x86/mpx: Clean up the code by not passing a task pointer around when unnecessary
    x86/mpx: Use the new get_xsave_field_ptr()API
    x86/fpu/xstate: Wrap get_xsave_addr() to make it safer
    x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions
    ...

    Linus Torvalds
     

03 Jun, 2015

1 commit


28 May, 2015

1 commit

  • This bug has been there since day 1; addresses in the top guest physical
    page weren't considered valid. You could map that page (the check in
    check_gpte() is correct), but if a guest tried to put a pagetable there
    we'd check that address manually when walking it, and kill the guest.

    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Rusty Russell
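
    The kind of boundary error described above can be illustrated with a
    small standalone sketch. This is not the lguest code: struct guest,
    pfn_limit, PAGE_SIZE and both function names are assumptions made for
    illustration, and the "buggy" variant shows only one way such an
    out-by-one check can arise.

```c
#include <stdbool.h>

#define PAGE_SIZE 4096UL /* assumed page size for this sketch */

/* Hypothetical guest descriptor: pfn_limit is one past the last valid frame. */
struct guest {
    unsigned long pfn_limit;
};

/*
 * Out-by-one variant: a range that ends exactly at the top of guest
 * memory divides down to pfn_limit and is rejected by the strict '<'.
 */
static bool address_ok_buggy(const struct guest *g,
                             unsigned long addr, unsigned long len)
{
    return (addr + len) / PAGE_SIZE < g->pfn_limit && addr + len >= addr;
}

/*
 * Corrected variant: compare the end of the range against the byte
 * limit, so a range may end exactly at the top. The second test
 * rejects ranges whose end wraps around.
 */
static bool address_ok_fixed(const struct guest *g,
                             unsigned long addr, unsigned long len)
{
    return addr + len <= g->pfn_limit * PAGE_SIZE && addr + len >= addr;
}
```

    With pfn_limit set to 4, a one-page access starting at 3 * PAGE_SIZE is
    rejected by the first function and accepted by the second, while an
    access starting at 4 * PAGE_SIZE is rejected by both.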
     

19 May, 2015

5 commits

  • This cleans up the call sites and the function a bit,
    and also makes it more symmetric with the other high
    level FPU state handling functions.

    It's still only valid for the current task, as we copy
    to the FPU registers of the current CPU.

    No change in functionality.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Rename this function in line with the new FPU nomenclature.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • There are a number of FPU internal function prototypes and an inline function
    in fpu/api.h, mostly placed so historically as the code grew over the years.

    Move them over into fpu/internal.h where they belong. (Add sched.h include
    to stackprotector.h which incorrectly relied on getting it from fpu/api.h.)

    fpu/api.h is now a pure file that only contains FPU APIs intended for driver
    use.

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We already have fpu/types.h, move i387.h to fpu/api.h.

    The file name has become a misnomer anyway: it offers generic FPU APIs,
    but is not limited to i387 functionality.

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Move to the new fpu__*() namespace.

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

11 May, 2015

1 commit

  • Commit:

    51bb92843edc ("x86/asm/entry: Remove SYSCALL_VECTOR")

    Converted most uses of SYSCALL_VECTOR to IA32_SYSCALL_VECTOR, but
    forgot about lguest.

    Cc: Brian Gerst
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1431185813-15413-4-git-send-email-brgerst@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

23 Apr, 2015

1 commit

  • Pull virtio updates from Rusty Russell:
    "Some virtio internal cleanups, a new virtio device "virtio input", and
    a change to allow the legacy virtio balloon.

    Most excitingly, some lguest work! No seriously, I got some cleanup
    patches"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio: drop virtio_device_is_legacy_only
    virtio_pci: support non-legacy balloon devices
    virtio_mmio: support non-legacy balloon devices
    virtio_ccw: support non-legacy balloon devices
    virtio: balloon might not be a legacy device
    virtio_balloon: transitional interface
    virtio_ring: Update weak barriers to use dma_wmb/rmb
    virtio_pci_modern: switch to type-safe io accessors
    virtio_pci_modern: type-safe io accessors
    lguest: handle traps on the "interrupt suppressed" iret instruction.
    virtio: drop a useless config read
    virtio_config: reorder functions
    Add virtio-input driver.
    lguest: suppress interrupts for single insn, not range.
    lguest: simplify lguest_iret
    lguest: rename i386_head.S in the comments
    lguest: explicitly set miscdevice's private_data NULL
    lguest: fix pending interrupt test.

    Linus Torvalds
     

02 Apr, 2015

1 commit

  • Since commit 8e7094694396 ("lguest: add a dummy PCI host bridge.")
    lguest uses PCI, but it needs you to frob the ports directly.

    Signed-off-by: Rusty Russell
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

01 Apr, 2015

1 commit

  • Lguest's "iret" is non-atomic, as it needs to restore the interrupt
    state before the real iret (the guest can't actually suppress
    interrupts). For this reason, the host discards an interrupt if it
    occurs in this (1-instruction) window.

    We can do better, by emulating the iret execution, then immediately
    setting up the interrupt handler. In fact, we don't need to do much,
    as emulating the iret and setting up the stack for the interrupt handler
    basically cancel each other out.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

24 Mar, 2015

2 commits

  • The last patch reduced our interrupt-suppression region to one address,
    so simplify the code somewhat.

    Also, remove the obsolete undefined instruction ranges and the comment
    which refers to lguest_guest.S instead of head_32.S.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • There is a proposed change to the miscdevice's behaviour on open(). Currently
    file->private_data stays NULL, but only because we don't have an open-entry in
    struct file_operations.

    This may change so that private_data, more consistently, is always set to
    struct miscdevice, not only *if* the driver has its own open() routine and
    fops-entry, see https://lkml.org/lkml/2014/12/4/939 and commit
    94e4fe2cab3d43b3ba7c3f721743006a8c9d913a

    In short: If we rely on file->private_data being NULL, we should ensure
    it is NULL ourselves.

    Signed-off-by: Martin Kepplinger
    Signed-off-by: Rusty Russell

    Martin Kepplinger
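
    The defensive pattern described above can be sketched outside the
    kernel. Everything here is a stand-in: struct file_model and the two
    functions are hypothetical names, and using a non-NULL private_data to
    mean "already initialized" is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct file that matters here. */
struct file_model {
    void *private_data;
};

/*
 * An explicit open handler: whatever the framework stored in
 * private_data before calling us, reset it to NULL ourselves.
 */
static int dev_open_model(struct file_model *file)
{
    file->private_data = NULL;
    return 0;
}

/*
 * Later code treats a non-NULL private_data as "already initialized",
 * which is only safe because open() guaranteed the NULL starting state.
 */
static int dev_initialize_model(struct file_model *file, void *state)
{
    if (file->private_data)
        return -EBUSY;
    file->private_data = state;
    return 0;
}
```

    Without the explicit reset in the open handler, a framework that
    pre-populates private_data would make the first initialization attempt
    fail with -EBUSY.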
     

19 Feb, 2015

1 commit

  • Pull virtio updates from Rusty Russell:
    "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

    On top of that is the major rework of lguest, to use PCI and virtio
    1.0, to double-check the implementation.

    Then comes the inevitable fixes and cleanups from that work"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
    virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
    virtio_net: unconditionally define struct virtio_net_hdr_v1.
    tools/lguest: don't use legacy definitions for net device in example launcher.
    virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
    tools/lguest: use common error macros in the example launcher.
    tools/lguest: give virtqueues names for better error messages
    tools/lguest: more documentation and checking of virtio 1.0 compliance.
    lguest: don't look in console features to find emerg_wr.
    tools/lguest: don't start devices until DRIVER_OK status set.
    tools/lguest: handle indirect partway through chain.
    tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
    tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
    tools/lguest: rename virtio_pci_cfg_cap field to match spec.
    tools/lguest: fix features_accepted logic in example launcher.
    tools/lguest: handle device reset correctly in example launcher.
    virtual: Documentation: simplify and generalize paravirt_ops.txt
    lguest: remove NOTIFY call and eventfd facility.
    lguest: remove NOTIFY facility from demonstration launcher.
    lguest: use the PCI console device's emerg_wr for early boot messages.
    lguest: always put console in PCI slot #1.
    ...

    Linus Torvalds
     

11 Feb, 2015

8 commits


04 Feb, 2015

1 commit

  • CR4 manipulation was split, seemingly at random, between direct
    (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately,
    the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
    which only a small subset of users actually wanted.

    This patch replaces all cr4 access in functions that don't leave cr4
    exactly the way they found it with new helpers cr4_set_bits,
    cr4_clear_bits, and cr4_set_bits_and_update_boot.

    Signed-off-by: Andy Lutomirski
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrea Arcangeli
    Cc: Vince Weaver
    Cc: "hillf.zj"
    Cc: Valdis Kletnieks
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Kees Cook
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
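
    The split between the helpers can be modelled in plain C. This is a
    userspace sketch, not the kernel implementation: fake_cr4 and
    fake_boot_cr4_features are hypothetical variables standing in for the
    register and the boot-time state, and the _model suffix marks every
    function as illustrative.

```c
/* Stand-ins for the real control register and the boot-time feature mask. */
static unsigned long fake_cr4;
static unsigned long fake_boot_cr4_features;

static unsigned long read_cr4_model(void)
{
    return fake_cr4;
}

static void write_cr4_model(unsigned long val)
{
    fake_cr4 = val;
}

/* Set bits in the register only; boot state is left untouched. */
static void cr4_set_bits_model(unsigned long mask)
{
    unsigned long cr4 = read_cr4_model();

    if ((cr4 | mask) != cr4)
        write_cr4_model(cr4 | mask);
}

/* Clear bits in the register only; boot state is left untouched. */
static void cr4_clear_bits_model(unsigned long mask)
{
    unsigned long cr4 = read_cr4_model();

    if ((cr4 & ~mask) != cr4)
        write_cr4_model(cr4 & ~mask);
}

/* For the few callers that also want the boot-time state updated. */
static void cr4_set_bits_and_update_boot_model(unsigned long mask)
{
    fake_boot_cr4_features |= mask;
    cr4_set_bits_model(mask);
}
```

    The point of the separation is that most callers use the first two
    helpers and never touch boot state; only callers that opt in through
    the third helper do.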
     

09 Dec, 2014

4 commits

  • This will make it easy for transports to validate features and return
    failure.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • At this point, no transports set any of the high 32 feature bits.
    Since transports generally can't (yet) cope with such bits, add BUG_ON
    checks to make sure they are not set by mistake.

    Based on rproc patch by Rusty.

    Signed-off-by: Rusty Russell
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Hildenbrand
    Reviewed-by: Cornelia Huck

    Michael S. Tsirkin
     
  • Change u32 to u64, and use BIT_ULL and 1ULL everywhere.

    Note: transports are unchanged, and only set the low 32 bits.
    This guarantees that no transport sets e.g. VERSION_1
    by mistake without proper support.

    Based on patch by Rusty.

    Signed-off-by: Rusty Russell
    Signed-off-by: Cornelia Huck
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: David Hildenbrand
    Reviewed-by: Cornelia Huck

    Michael S. Tsirkin
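
    A minimal sketch of 64-bit feature masks follows, assuming a
    hypothetical struct device_model in place of struct virtio_device. The
    BIT_ULL definition here mirrors the shape of the kernel macro, and
    FEATURE_VERSION_1 is an illustrative name for feature bit 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1ULL keeps the shift 64 bits wide; a plain (1 << 32) on int is undefined. */
#define BIT_ULL(nr) (1ULL << (nr))

#define FEATURE_VERSION_1 32 /* illustrative name for feature bit 32 */

/* Hypothetical device with a 64-bit feature mask. */
struct device_model {
    uint64_t features;
};

static bool has_feature(const struct device_model *dev, unsigned int bit)
{
    return (dev->features & BIT_ULL(bit)) != 0;
}

/*
 * True when no bit above bit 31 is set, i.e. a transport that only
 * understands 32 feature bits can carry this mask without losing any.
 */
static bool fits_legacy_transport(uint64_t features)
{
    return (features >> 32) == 0;
}
```

    A transport limited to 32 bits could assert fits_legacy_transport()
    before handing the mask on, which is the role the BUG_ON checks play in
    the neighbouring commit.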
     
  • It seemed like a good idea to use bitmap for features
    in struct virtio_device, but it's actually a pain,
    and seems to become even more painful when we get more
    than 32 feature bits. Just change it to a u32 for now.

    Based on patch by Rusty.

    Suggested-by: David Hildenbrand
    Signed-off-by: Rusty Russell
    Signed-off-by: Cornelia Huck
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck

    Michael S. Tsirkin
     

07 Aug, 2014

1 commit

  • Currently map_vm_area() takes (struct page *** pages) as third argument,
    and after mapping, it moves (*pages) to point to (*pages +
    nr_mapped_pages).

    It looks like this kind of increment is useless to its caller these
    days. The callers don't care about the increments and actually they're
    trying to avoid this by passing another copy to map_vm_area().

    The caller can always guarantee all the pages can be mapped into vm_area
    as specified in first argument and the caller only cares about whether
    map_vm_area() fails or not.

    This patch cleans up the pointer movement in map_vm_area() and updates
    its callers accordingly.

    Signed-off-by: WANG Chao
    Cc: Zhang Yanfei
    Acked-by: Greg Kroah-Hartman
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Chao
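
    The difference between the two calling conventions can be shown with a
    toy model. struct page_model, map_area_old() and map_area_new() are
    hypothetical names; neither function maps anything, they only
    reproduce the pointer handling the message describes.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct page_model {
    int id;
};

/*
 * Old style: takes a pointer to the caller's cursor and advances it
 * past every page it consumed, as a side effect.
 */
static int map_area_old(size_t nr, struct page_model ***pages)
{
    for (size_t i = 0; i < nr; i++) {
        if (**pages == NULL)
            return -1;
        /* ... map **pages here ... */
        (*pages)++;
    }
    return 0;
}

/*
 * New style: takes the array directly and leaves the caller's pointer
 * alone; the caller only learns success or failure.
 */
static int map_area_new(size_t nr, struct page_model **pages)
{
    for (size_t i = 0; i < nr; i++) {
        if (pages[i] == NULL)
            return -1;
        /* ... map pages[i] here ... */
    }
    return 0;
}
```

    With the old style, a caller that still needs its array afterwards has
    to pass the address of a throwaway copy of the pointer; with the new
    style it passes the array itself.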
     

08 Apr, 2014

1 commit


07 Nov, 2013

1 commit


29 Oct, 2013

1 commit

  • Currently a host kick error is silently ignored and not reflected in
    the virtqueue of a particular virtio device.

    Changing the notify API for guest->host notification seems to be one
    prerequisite in order to be able to handle such errors in the context
    where the kick is triggered.

    This patch changes the notify API. The notify function must return a
    bool return value. It returns false if the host notification failed.

    Signed-off-by: Heinz Graalfs
    Signed-off-by: Rusty Russell

    Heinz Graalfs
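
    A sketch of what a bool-returning notify callback makes possible,
    using a hypothetical struct vq_model rather than the real virtqueue.
    Marking the queue broken after a failed notification is an assumption
    of this sketch; it is one possible policy, not something the message
    above specifies.

```c
#include <stdbool.h>

/* Hypothetical queue: notify reports whether the host was reached. */
struct vq_model {
    bool (*notify)(struct vq_model *vq);
    bool broken;
};

/* Example callbacks standing in for a transport's notification hook. */
static bool notify_ok(struct vq_model *vq)
{
    (void)vq;
    return true;
}

static bool notify_fail(struct vq_model *vq)
{
    (void)vq;
    return false;
}

/*
 * Kick the host and propagate failure to the caller, so the error is
 * handled in the context where the kick was triggered.
 */
static bool vq_kick_model(struct vq_model *vq)
{
    if (vq->broken)
        return false;
    if (!vq->notify(vq)) {
        vq->broken = true; /* assumed policy: stop using a failed queue */
        return false;
    }
    return true;
}
```

    With a void callback the failure in notify_fail() would be invisible
    to vq_kick_model(); the bool return value is what lets the caller
    react.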
     

06 Sep, 2013

2 commits


05 Jul, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "The usual stuff from trivial tree"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    treewide: relase -> release
    Documentation/cgroups/memory.txt: fix stat file documentation
    sysctl/net.txt: delete reference to obsolete 2.4.x kernel
    spinlock_api_smp.h: fix preprocessor comments
    treewide: Fix typo in printk
    doc: device tree: clarify stuff in usage-model.txt.
    open firmware: "/aliasas" -> "/aliases"
    md: bcache: Fixed a typo with the word 'arithmetic'
    irq/generic-chip: fix a few kernel-doc entries
    frv: Convert use of typedef ctl_table to struct ctl_table
    sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
    doc: clk: Fix incorrect wording
    Documentation/arm/IXP4xx fix a typo
    Documentation/networking/ieee802154 fix a typo
    Documentation/DocBook/media/v4l fix a typo
    Documentation/video4linux/si476x.txt fix a typo
    Documentation/virtual/kvm/api.txt fix a typo
    Documentation/early-userspace/README fix a typo
    Documentation/video4linux/soc-camera.txt fix a typo
    lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
    ...

    Linus Torvalds