01 Aug, 2010

6 commits


09 Jun, 2010

2 commits

  • The containing function is called from several places. At one of them, in
    the function __sigp_stop, the spin lock &fi->lock is held.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @gfp exists@
    identifier fn;
    position p;
    @@

    fn(...) {
    ... when != spin_unlock
    when any
    GFP_KERNEL@p
    ... when any
    }

    @locked@
    identifier gfp.fn;
    @@

    spin_lock(...)
    ... when != spin_unlock
    fn(...)

    @depends on locked@
    position gfp.p;
    @@

    - GFP_KERNEL@p
    + GFP_ATOMIC
    //

    Signed-off-by: Julia Lawall
    Acked-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky

    Julia Lawall
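
The reasoning behind the semantic patch above can be modelled in plain userspace C. This is a minimal sketch only: every `mock_*` name, `inject_stop` and `stop_under_lock` are hypothetical stand-ins rather than kernel APIs, and the mock allocator simply fails where the real kernel would risk sleeping with a spin lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel flags; not the real API. */
enum mock_gfp { MOCK_GFP_KERNEL, MOCK_GFP_ATOMIC };

static int mock_lock_depth; /* > 0 while a mock spin lock is held */

static void mock_spin_lock(void)
{
    mock_lock_depth++;
}

static void mock_spin_unlock(void)
{
    assert(mock_lock_depth > 0);
    mock_lock_depth--;
}

/* A sleeping allocation is only legal when no spin lock is held. */
static void *mock_kzalloc(size_t size, enum mock_gfp flags)
{
    if (flags == MOCK_GFP_KERNEL && mock_lock_depth > 0)
        return NULL; /* the kernel might sleep here, which is a bug */
    return calloc(1, size);
}

/* Models the containing function, which is called from several places. */
static void *inject_stop(enum mock_gfp flags)
{
    return mock_kzalloc(64, flags);
}

/* Models the one call site that holds the lock across the call. */
static bool stop_under_lock(enum mock_gfp flags)
{
    void *p;
    bool ok;

    mock_spin_lock();
    p = inject_stop(flags);
    mock_spin_unlock();

    ok = (p != NULL);
    free(p);
    return ok;
}
```

In this model `stop_under_lock(MOCK_GFP_ATOMIC)` succeeds while `stop_under_lock(MOCK_GFP_KERNEL)` is rejected, which is the situation the patch avoids by switching the flag.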
     
  • Add missing GFP flag to memory allocations. The part in cio only
    changes a comment.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

27 May, 2010

1 commit

  • This config option enables or disables three single instructions
    which aren't expensive. This is too fine-grained.
    Besides that, everybody who uses kvm would enable it anyway in order
    to debug performance problems.
    Just remove it.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

22 May, 2010

1 commit

  • * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits)
    KVM: x86: Add missing locking to arch specific vcpu ioctls
    KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls
    KVM: MMU: Segregate shadow pages with different cr0.wp
    KVM: x86: Check LMA bit before set_efer
    KVM: Don't allow lmsw to clear cr0.pe
    KVM: Add cpuid.txt file
    KVM: x86: Tell the guest we'll warn it about tsc stability
    x86, paravirt: don't compute pvclock adjustments if we trust the tsc
    x86: KVM guest: Try using new kvm clock msrs
    KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
    KVM: x86: add new KVMCLOCK cpuid feature
    KVM: x86: change msr numbers for kvmclock
    x86, paravirt: Add a global synchronization point for pvclock
    x86, paravirt: Enable pvclock flags in vcpu_time_info structure
    KVM: x86: Inject #GP with the right rip on efer writes
    KVM: SVM: Don't allow nested guest to VMMCALL into host
    KVM: x86: Fix exception reinjection forced to true
    KVM: Fix wallclock version writing race
    KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
    KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
    ...

    Linus Torvalds
     

19 May, 2010

1 commit


17 May, 2010

3 commits

  • The RCU/SRCU APIs have already changed to support proving RCU usage.

    I got the following dmesg with PROVE_RCU=y because we used the incorrect API.
    This patch converts rcu_dereference() to srcu_dereference() or a related API.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by qemu-system-x86/8550:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm]
    #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

    stack backtrace:
    Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
    [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
    [] __kvm_set_memory_region+0x636/0x6e2 [kvm]
    [] kvm_set_memory_region+0x37/0x50 [kvm]
    [] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
    [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
    [] ? unlock_page+0x27/0x2c
    [] ? __do_fault+0x3a9/0x3e1
    [] kvm_vm_ioctl+0x364/0x38d [kvm]
    [] ? up_read+0x23/0x3d
    [] vfs_ioctl+0x32/0xa6
    [] do_vfs_ioctl+0x495/0x4db
    [] ? fget_light+0xc2/0x241
    [] ? do_sys_open+0x104/0x116
    [] ? retint_swapgs+0xe/0x13
    [] sys_ioctl+0x47/0x6a
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Avi Kivity

    Lai Jiangshan
     
  • This patch fixes a possible memory leak in kvm_arch_vcpu_create()
    under s390, which would happen when kvm_arch_vcpu_create() fails.

    Signed-off-by: Wei Yongjun
    Acked-by: Carsten Otte
    Cc: stable@kernel.org
    Signed-off-by: Avi Kivity

    Wei Yongjun
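
A leak of this kind usually comes from an error path that returns without undoing earlier allocations. The sketch below shows the common unwind pattern in userspace C; `mock_vcpu`, `mock_vcpu_create` and the allocation counter are hypothetical and do not reproduce the actual s390 code.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs; /* counts outstanding allocations for the demo */

static void *tracked_alloc(size_t n)
{
    void *p = calloc(1, n);

    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (!p)
        return;
    assert(live_allocs > 0);
    free(p);
    live_allocs--;
}

struct mock_vcpu {
    void *sie_block; /* second allocation that must not leak */
};

/* fail_init simulates a late initialisation failure. */
static struct mock_vcpu *mock_vcpu_create(int fail_init)
{
    struct mock_vcpu *vcpu = tracked_alloc(sizeof(*vcpu));

    if (!vcpu)
        goto out;
    vcpu->sie_block = tracked_alloc(4096);
    if (!vcpu->sie_block)
        goto out_free_cpu;
    if (fail_init)
        goto out_free_sie_block; /* without this label the block leaks */
    return vcpu;

out_free_sie_block:
    tracked_free(vcpu->sie_block);
out_free_cpu:
    tracked_free(vcpu);
out:
    return NULL;
}

static void mock_vcpu_destroy(struct mock_vcpu *vcpu)
{
    tracked_free(vcpu->sie_block);
    tracked_free(vcpu);
}
```

Each label releases exactly what was allocated before the failing step, so a failed create leaves `live_allocs` at zero.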
     
  • Use the SPP instruction to set a tag on entry to / exit of the virtual
    machine context. This allows the cpu measurement facility to distinguish
    the samples from the host and the different guests.

    Signed-off-by: Carsten Otte

    Carsten Otte
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

06 Mar, 2010

1 commit

  • * 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (145 commits)
    KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
    KVM: VMX: Update instruction length on intercepted BP
    KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
    KVM: Fix segment descriptor loading
    KVM: Fix load_guest_segment_descriptor() to inject page fault
    KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
    KVM: Convert kvm->requests_lock to raw_spinlock_t
    KVM: Convert i8254/i8259 locks to raw_spinlocks
    KVM: x86 emulator: disallow opcode 82 in 64-bit mode
    KVM: x86 emulator: code style cleanup
    KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
    KVM: x86 emulator: Add LOCK prefix validity checking
    KVM: x86 emulator: Check CPL level during privilege instruction emulation
    KVM: x86 emulator: Fix popf emulation
    KVM: x86 emulator: Check IOPL level during io instruction emulation
    KVM: x86 emulator: fix memory access during x86 emulation
    KVM: x86 emulator: Add Virtual-8086 mode of emulation
    KVM: x86 emulator: Add group9 instruction decoding
    KVM: x86 emulator: Add group8 instruction decoding
    KVM: do not store wqh in irqfd
    ...

    Trivial conflicts in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     

01 Mar, 2010

4 commits


27 Feb, 2010

3 commits

  • Use asm offsets to make sure the offset defines to struct _lowcore and
    its layout don't get out of sync.
    Also add a BUILD_BUG_ON() which checks that the size of the structure
    is sane.
    While at it, change those sites which use odd casts to access
    the current lowcore. These should use S390_lowcore instead.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
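
The idea of tying offset constants to the structure layout can be sketched with compile-time assertions. Everything here is an assumption for illustration: `struct mock_lowcore`, `MOCK_LC_SAVE_AREA` and `MOCK_BUILD_BUG_ON` are hypothetical, the layout is heavily simplified, and the kernel generates such constants from the structure at build time rather than checking a hand-written value as this sketch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified layout; not the real struct _lowcore. */
struct mock_lowcore {
    uint8_t  pad_0x0000[0x0200];
    uint64_t save_area[16];               /* expected at 0x0200 */
    uint8_t  pad_0x0280[0x2000 - 0x0280];
};

/* Offset constant as assembly code would use it. */
#define MOCK_LC_SAVE_AREA 0x0200

/* Compile-time check in the spirit of BUILD_BUG_ON(); needs C11. */
#define MOCK_BUILD_BUG_ON(cond) \
    _Static_assert(!(cond), "BUILD_BUG_ON: " #cond)

/* The build fails if the constant and the layout drift apart. */
MOCK_BUILD_BUG_ON(offsetof(struct mock_lowcore, save_area) != MOCK_LC_SAVE_AREA);
/* The build fails if the structure is not exactly 8k in size. */
MOCK_BUILD_BUG_ON(sizeof(struct mock_lowcore) != 8192);

/* C-side accessor; uses the member name instead of a cast plus offset. */
static uint64_t *mock_save_area(struct mock_lowcore *lc)
{
    assert(lc != NULL);
    return lc->save_area;
}
```

If a field is inserted before `save_area`, or the padding is miscalculated, the translation unit no longer compiles, so the mismatch is caught before anything runs.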
     
  • ENOTSUPP is not supposed to leak to userspace, so let's just use
    EOPNOTSUPP everywhere.
    Doesn't fix a bug, but makes future reviews easier.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Remove support to be able to dump 31 bit systems with a 64 bit dumper.
    This is mostly useless since no distro ships 31 bit kernels together
    with a 64 bit dumper.
    We also get rid of a bit of hacky code.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

17 Feb, 2010

1 commit


25 Jan, 2010

1 commit

  • kvm_handle_sie_intercept uses a jump table to get the intercept handler
    for a SIE intercept. Static code analysis revealed a potential problem:
    the intercept_funcs jump table was defined to contain (0x48 >> 2) entries,
    but we only checked for code > 0x48 which would cause an off-by-one
    array overflow if code == 0x48.

    Use the compiler and ARRAY_SIZE to automatically set the limits.

    Cc: stable@kernel.org
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Marcelo Tosatti

    Christian Borntraeger
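
The off-by-one and the ARRAY_SIZE-based fix can be illustrated as follows. This is a sketch under assumptions: the handler functions, the table contents and `dispatch` are hypothetical, and only the `0x48` limit and the `code >> 2` indexing come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef int (*intercept_handler_t)(void);

static int handle_noop(void)        { return 0; }
static int handle_instruction(void) { return 1; }

/* Old scheme: a table declared with 0x48 >> 2 entries, indices 0..17. */
enum { OLD_TABLE_ENTRIES = 0x48 >> 2 };

/* True when a code passes the old "code > 0x48" check but indexes past the end. */
static int old_check_overflows(uint8_t code)
{
    int passes_old_check = !(code & 3) && !(code > 0x48);

    return passes_old_check && (code >> 2) >= OLD_TABLE_ENTRIES;
}

/* New scheme: the compiler sizes the table from its initialisers (19 slots). */
static const intercept_handler_t intercept_funcs[] = {
    [0x00 >> 2] = handle_noop,
    [0x04 >> 2] = handle_instruction,
    [0x48 >> 2] = handle_noop,
};

static int dispatch(uint8_t code)
{
    size_t idx = (size_t)code >> 2;

    if ((code & 3) || idx >= ARRAY_SIZE(intercept_funcs))
        return -EOPNOTSUPP;
    assert(idx < ARRAY_SIZE(intercept_funcs));
    if (!intercept_funcs[idx])
        return -EOPNOTSUPP; /* slot without a handler */
    return intercept_funcs[idx]();
}
```

Because the bound is derived from the table itself, adding or removing an entry can no longer leave the check and the array size out of step.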
     

15 Jan, 2010

1 commit

  • What it is: vhost net is a character device that can be used to reduce
    the number of system calls involved in virtio networking.
    Existing virtio net code is used in the guest without modification.

    There's similarity with vringfd, with some differences and reduced scope
    - uses eventfd for signalling
    - structures can be moved around in memory at any time (good for
    migration, bug work-arounds in userspace)
    - write logging is supported (good for migration)
    - support memory table and not just an offset (needed for kvm)

    Common virtio-related code has been put in a separate file vhost.c and
    can be made into a separate module if/when more backends appear. I used
    Rusty's lguest.c as the source for developing this part: this supplied
    me with witty comments I wouldn't be able to write myself.

    What it is not: vhost net is not a bus, and not a generic new system
    call. No assumptions are made on how guest performs hypercalls.
    Userspace hypervisors are supported as well as kvm.

    How it works: Basically, we connect virtio frontend (configured by
    userspace) to a backend. The backend could be a network device, or a tap
    device. Backend is also configured by userspace, including vlan/mac
    etc.

    Status: This works for me, and I haven't seen any crashes.
    Compared to userspace, people reported improved latency (as I save up to
    4 system calls per packet), as well as better bandwidth and CPU
    utilization.

    Features that I plan to look at in the future:
    - mergeable buffers
    - zero copy
    - scalability tuning: figure out the best threading model to use

    Note on RCU usage (this is also documented in vhost.h, near
    private_pointer which is the value protected by this variant of RCU):
    what is happening is that the rcu_dereference() is being used in a
    workqueue item. The role of rcu_read_lock() is taken on by the start of
    execution of the workqueue item, of rcu_read_unlock() by the end of
    execution of the workqueue item, and of synchronize_rcu() by
    flush_workqueue()/flush_work(). In the future we might need to apply
    some gcc attribute or sparse annotation to the function passed to
    INIT_WORK(). Paul's ack below is for this RCU usage.

    (Includes fixes by Alan Cox, David L Stevens and Chris Wright)

    Acked-by: Rusty Russell
    Acked-by: Arnd Bergmann
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     

10 Dec, 2009

1 commit

  • * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (72 commits)
    [S390] 3215/3270 console: remove wrong comment
    [S390] dasd: remove BKL from extended error reporting code
    [S390] vmlogrdr: remove BKL
    [S390] vmur: remove BKL
    [S390] zcrypt: remove BKL
    [S390] 3270: remove BKL
    [S390] vmwatchdog: remove lock_kernel() from open() function
    [S390] monwriter: remove lock_kernel() from open() function
    [S390] monreader: remove lock_kernel() from open() function
    [S390] s390: remove unused nfsd #includes
    [S390] ftrace: build ftrace.o when CONFIG_FTRACE_SYSCALLS is set for s390
    [S390] etr/stp: put correct per cpu variable
    [S390] tty3270: move keyboard compat ioctls
    [S390] sclp: improve servicability setting
    [S390] s390: use change recording override for kernel mapping
    [S390] MAINTAINERS: Add s390 drivers block
    [S390] use generic sockios.h header file
    [S390] use generic termbits.h header file
    [S390] smp: remove unused typedef and defines
    [S390] cmm: free pages on hibernate.
    ...

    Linus Torvalds
     

07 Dec, 2009

1 commit

  • Introduce user_mode to replace the two variables switch_amode and
    s390_noexec. There are three valid combinations of the old values:
    1) switch_amode == 0 && s390_noexec == 0
    2) switch_amode == 1 && s390_noexec == 0
    3) switch_amode == 1 && s390_noexec == 1
    They get replaced by
    1) user_mode == HOME_SPACE_MODE
    2) user_mode == PRIMARY_SPACE_MODE
    3) user_mode == SECONDARY_SPACE_MODE
    The new kernel parameter user_mode=[primary,secondary,home] lets
    you choose the address space mode the user space processes should
    use. In addition the CONFIG_S390_SWITCH_AMODE config option
    is removed.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
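
The mapping from the two old variables to the single new one can be written down directly. In this sketch the enum's numeric values and both helper functions are assumptions made for illustration; only the three mode names and the accepted parameter strings come from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Numeric values are an assumption; only the names come from the text. */
enum user_mode_t {
    HOME_SPACE_MODE,
    PRIMARY_SPACE_MODE,
    SECONDARY_SPACE_MODE,
};

/* Map the two old flags to the new value; -1 marks the invalid combination. */
static int user_mode_from_old(int switch_amode, int s390_noexec)
{
    if (!switch_amode)
        return s390_noexec ? -1 : HOME_SPACE_MODE;
    return s390_noexec ? SECONDARY_SPACE_MODE : PRIMARY_SPACE_MODE;
}

/* Hypothetical parser for the user_mode= argument; -1 on unknown input. */
static int parse_user_mode(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "home") == 0)
        return HOME_SPACE_MODE;
    if (strcmp(arg, "primary") == 0)
        return PRIMARY_SPACE_MODE;
    if (strcmp(arg, "secondary") == 0)
        return SECONDARY_SPACE_MODE;
    return -1;
}
```

A single three-valued variable cannot express the fourth, invalid flag combination (`switch_amode == 0 && s390_noexec == 1`), which is one benefit of the replacement.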
     

03 Dec, 2009

4 commits

  • This patch corrects the checking of the new address for the prefix register.
    On s390, the prefix register is used to address the cpu's lowcore (address
    0...8k). This check is supposed to verify that the memory is readable and
    present.
    copy_from_guest is a helper function that can be used to read from guest
    memory. It applies prefixing, adds the start address of the guest memory in
    user space, and then calls copy_from_user. The previous code was obviously
    broken for two reasons:
    - prefixing should not be applied here. The current prefix register is
    going to be updated soon, and the address we're looking for will be
    0..8k after we've updated the register
    - we're adding the guest origin (gmsor) twice: once in subject code
    and once in copy_from_guest

    With kuli, we did not hit this problem because (a) we were lucky with
    previous prefix register content, and (b) our guest memory was mmaped
    very low into user address space.

    Cc: stable@kernel.org
    Signed-off-by: Carsten Otte
    Reported-by: Alexander Graf
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch moves the s390 processor status word into the base kvm_run
    struct and keeps it up to date on all userspace exits.

    This breaks the userspace ABI; however, there are no applications
    in the wild using it. A capability check is provided so users can
    verify that the updated API exists.

    Cc: stable@kernel.org
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • X86 CPUs need to have some magic happening to enable the virtualization
    extensions on them. This magic can result in unpleasant results for
    users, like blocking other VMMs from working (vmx) or using invalid TLB
    entries (svm).

    Currently KVM activates virtualization when the respective kernel module
    is loaded. This blocks us from autoloading KVM modules without breaking
    other VMMs.

    To circumvent this problem at least a bit, this patch introduces
    on-demand activation of virtualization. This means that
    virtualization is instead enabled on creation of the first virtual
    machine and disabled on destruction of the last one.

    So using this, KVM can be easily autoloaded, while keeping other
    hypervisors usable.

    Signed-off-by: Alexander Graf
    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Alexander Graf
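
On-demand activation amounts to reference counting the live virtual machines. The following single-threaded sketch uses hypothetical `mock_*` names and a boolean in place of the real per-CPU hardware state; real code would also need locking around the counter, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

static int  mock_vm_count;     /* number of live virtual machines */
static bool mock_virt_enabled; /* stands in for the hardware switch */

static void mock_hardware_enable(void)
{
    mock_virt_enabled = true;
}

static void mock_hardware_disable(void)
{
    mock_virt_enabled = false;
}

/* The first VM turns virtualization on; later VMs only bump the count. */
static void mock_create_vm(void)
{
    if (mock_vm_count++ == 0)
        mock_hardware_enable();
}

/* The last VM to go away turns virtualization off again. */
static void mock_destroy_vm(void)
{
    assert(mock_vm_count > 0);
    if (--mock_vm_count == 0)
        mock_hardware_disable();
}
```

With this scheme merely loading the module leaves the hardware untouched, so another hypervisor stays usable until the first VM is actually created.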
     
  • Not the incorrect -EINVAL.

    Signed-off-by: Avi Kivity

    Avi Kivity
     

04 Oct, 2009

1 commit

  • commit 628eb9b8a8f3
    KVM: s390: streamline memslot handling

    introduced kvm_s390_vcpu_get_memsize. This broke guests >=4G, since this
    function returned an int.

    This patch changes the return value to a long.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Christian Borntraeger
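
The truncation is easy to reproduce in isolation. In this sketch `int32_t` and `int64_t` model `int` and `long` on a 64-bit target, and both function names are hypothetical; the narrowing conversion is implementation-defined in C, and the wrap-around noted in the comment is what GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Models a function returning int: the upper 32 bits of the size are lost. */
static int32_t memsize_as_int(uint64_t bytes)
{
    return (int32_t)bytes; /* wraps modulo 2^32 on GCC and Clang */
}

/* Models the fixed function returning long on a 64-bit target. */
static int64_t memsize_as_long(uint64_t bytes)
{
    assert(bytes <= (uint64_t)INT64_MAX);
    return (int64_t)bytes;
}
```

For a 4G guest, `memsize_as_int(4ULL << 30)` yields 0 on those compilers while `memsize_as_long(4ULL << 30)` keeps the full value, which matches the breakage described for guests of 4G and above.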
     

21 Sep, 2009

1 commit


10 Sep, 2009

6 commits

  • Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
    interface between general code and arch code. kvm_arch_vcpu_runnable()
    checks for interrupts instead.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Avi Kivity

    Gleb Natapov
     
  • Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls.

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • [christian: remove unused variables on s390]

    Signed-off-by: Gleb Natapov
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Gleb Natapov
     
  • This patch relocates the variables kvm-s390 uses to track guest mem addr/size.
    As discussed, dropping the variables at struct kvm_arch level allows us to use the
    common vcpu->request based mechanism to reload guest memory, e.g. when it changes
    via set_memory_region.

    The kick mechanism introduced in this series is used to ensure running vcpus
    leave guest state to catch the update.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Avi Kivity

    Christian Ehrhardt
     
  • If signal pending is true we exit without updating kvm_run; userspace
    currently just does nothing and jumps to kvm_run again.
    Since we did not set an exit_reason we might end up with a random one
    (whatever the last exit was). Therefore it was possible to e.g. jump to
    the psw position the last real interruption set.
    Setting the INTR exit reason ensures that no old psw data is swapped
    in on reentry.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Avi Kivity

    Christian Ehrhardt
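
The stale exit reason can be modelled with a small shared structure. This is only a sketch: `struct mock_run`, the `MOCK_EXIT_*` constants and `mock_vcpu_run` are hypothetical, and the numeric values are placeholders rather than the real KVM exit codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder exit codes; the values are assumptions for the demo. */
enum {
    MOCK_EXIT_INTR       = 10,
    MOCK_EXIT_S390_SIEIC = 13,
};

/* Stands in for the structure shared with userspace. */
struct mock_run {
    int exit_reason;
};

/* One trip through the run loop; a pending signal ends it early. */
static int mock_vcpu_run(struct mock_run *run, int signal_pending)
{
    assert(run != NULL);

    if (signal_pending) {
        /* Without this store, userspace would see the previous reason. */
        run->exit_reason = MOCK_EXIT_INTR;
        return -EINTR;
    }

    run->exit_reason = MOCK_EXIT_S390_SIEIC;
    return 0;
}
```

After a normal exit followed by a signal-interrupted one, `exit_reason` reads as the interrupt code rather than the leftover intercept code, so userspace has no reason to act on old psw data.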
     
  • To ensure vcpus come out of guest context in certain cases, this patch adds an
    s390-specific way to kick them out of guest context. Currently it kicks them
    out to rerun the vcpu_run path in the s390 code, but the mechanism itself is
    expandable and with a new flag we could also add e.g. kicks to userspace etc.

    Signed-off-by: Christian Ehrhardt
    Signed-off-by: Avi Kivity

    Christian Ehrhardt