27 Apr, 2008

14 commits

  • So userspace can save/restore the mpstate during migration.

    [avi: export the #define constants describing the value]
    [christian: add s390 stubs]
    [avi: ditto for ia64]

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
    ignored, possibly resulting in hangs.

    Also make sure that atomic_inc and waitqueue_active tests happen in the
    specified order, otherwise the following race is open:

    CPU0                                       CPU1
                                               if (waitqueue_active(wq))
    add_wait_queue()
    if (!atomic_read(pit_timer->pending))
        schedule()
                                               atomic_inc(pit_timer->pending)

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Temporarily rename this function to avoid merge conflicts and/or
    dependencies. This function will be removed as soon as git-s390
    and kvm.git are finally upstream.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Heiko Carstens
     
  • kvm_arch_vcpu_ioctl_run currently incorrectly always returns 0.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Heiko Carstens
     
  • This patch adds functionality to detect if the kernel runs under the KVM
    hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
    allows drivers to skip device detection if the system runs non-virtualized.
    We also define a preferred console to avoid having the ttyS0, which is a line
    mode only console.

    Signed-off-by: Christian Borntraeger
    Acked-by: Martin Schwidefsky
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch adds the virtualization submenu and the kvm option to the kernel
    config. It also defines HAVE_KVM for 64bit kernels.

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • This patch introduces interpretation of some diagnose instruction intercepts.
    Diagnose is our classic architected way of doing a hypercall. This patch
    features the following diagnose codes:
    - vm storage size, that tells the guest about its memory layout
    - time slice end, which is used by the guest to indicate that it waits
      for a lock and thus cannot use up its time slice in a useful way
    - ipl functions, which a guest can use to reset and reboot itself

    In order to implement ipl functions, we also introduce an exit reason that
    causes userspace to perform various resets on the virtual machine. All resets
    are described in the principles of operation book, except KVM_S390_RESET_IPL
    which causes a reboot of the machine.

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • This patch introduces in-kernel handling of _some_ sigp interprocessor
    signals (similar to ipi).
    kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
    handlers depending on the operation requested:
    - sigp sense tries to retrieve information such as existence or running state
      of the remote cpu
    - sigp emergency sends an external interrupt to the remote cpu
    - sigp stop stops a remote cpu
    - sigp stop store status stops a remote cpu, and stores its entire internal
      state to the cpu's lowcore
    - sigp set arch sets the architecture mode of the remote cpu. setting to
      ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
      denied, all others are passed to userland
    - sigp set prefix sets the prefix register of a remote cpu

    For implementation of this, the stop intercept indication starts to get reused
    on purpose: a set of action bits defines what to do once a cpu gets stopped:
    ACTION_STOP_ON_STOP really stops the cpu when a stop intercept is recognized
    ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
    recognized

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
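
The action bits described in the commit above could be consumed roughly as follows when a stop intercept is recognized. This is a sketch under assumptions: only the two ACTION_* names come from the commit message. The bit values, the struct and the function name are hypothetical, and so is the choice to clear each bit once it has been acted on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit values are assumed for this sketch; only the names are from the patch. */
#define ACTION_STORE_ON_STOP (1u << 0)
#define ACTION_STOP_ON_STOP  (1u << 1)

/* What the stop intercept handler should do (hypothetical type). */
struct stop_result {
    bool store_status;  /* store the cpu status to lowcore */
    bool stop_cpu;      /* really stop the cpu */
};

/* Evaluate and consume the pending action bits of one cpu. */
static struct stop_result on_stop_intercept(unsigned int *action_bits)
{
    struct stop_result r = { false, false };

    assert(action_bits != NULL);
    if (*action_bits & ACTION_STORE_ON_STOP) {
        r.store_status = true;
        *action_bits &= ~ACTION_STORE_ON_STOP;  /* assumption: one-shot */
    }
    if (*action_bits & ACTION_STOP_ON_STOP) {
        r.stop_cpu = true;
        *action_bits &= ~ACTION_STOP_ON_STOP;   /* assumption: one-shot */
    }
    return r;
}
```

Under this model, sigp stop would set only ACTION_STOP_ON_STOP on the remote cpu, and sigp stop store status would set both bits.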
     
  • This patch introduces in-kernel handling of some intercepts for privileged
    instructions:

    handle_set_prefix() sets the prefix register of the local cpu
    handle_store_prefix() stores the content of the prefix register to memory
    handle_store_cpu_address() stores the cpu number of the current cpu to memory
    handle_skey() just decrements the instruction address and retries
    handle_stsch() delivers condition code 3 "operation not supported"
    handle_chsc() same here
    handle_stfl() stores the facility list which contains the
    capabilities of the cpu
    handle_stidp() stores cpu type/model/revision and such
    handle_stsi() stores information about the system topology

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • This patch contains the s390 interrupt subsystem (similar to in kernel apic)
    including timer interrupts (similar to in-kernel-pit) and enabled wait
    (similar to in kernel hlt).

    In order to achieve that, this patch also introduces intercept handling
    for instruction intercepts, and it implements load control instructions.

    This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
    the vm file descriptors and the vcpu file descriptors. In case this ioctl is
    issued against a vm file descriptor, the interrupt is considered floating.
    Floating interrupts may be delivered to any virtual cpu in the configuration.

    The following interrupts are supported:
    SIGP STOP - interprocessor signal that stops a remote cpu
    SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
    (stopped) remote cpu
    INT EMERGENCY - interprocessor interrupt, usually used to signal need_resched
    and for smp_call_function() in the guest.
    PROGRAM INT - exception during program execution such as page fault, illegal
    instruction and friends
    RESTART - interprocessor signal that starts a stopped cpu
    INT VIRTIO - floating interrupt for virtio signalisation
    INT SERVICE - floating interrupt for signalisations from the system
    service processor

    struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
    an interrupt, also carries parameter data for interrupts along with the interrupt
    type. Interrupts on s390 usually have a state that represents the current
    operation, or identifies which device has caused the interruption on s390.

    kvm_s390_handle_wait() does handle waitpsw in two flavors: in case of a
    disabled wait (that is, disabled for interrupts), we exit to userspace. In case
    of an enabled wait we set up a timer that equals the cpu clock comparator value
    and sleep on a wait queue.

    [christian: change virtio interrupt to 0x2603]

    Acked-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Carsten Otte
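
The two wait flavors handled by kvm_s390_handle_wait() can be told apart from the guest PSW mask. The sketch below is not the patch's code: the enum and the function classify_wait() are hypothetical, and treating the I/O, external and machine-check mask bits as "enabled for interrupts" is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* s390 PSW mask bits (64-bit PSW); bit positions follow the architecture. */
#define PSW_MASK_IO     0x0200000000000000ULL
#define PSW_MASK_EXT    0x0100000000000000ULL
#define PSW_MASK_MCHECK 0x0004000000000000ULL
#define PSW_MASK_WAIT   0x0002000000000000ULL

static_assert((PSW_MASK_WAIT & (PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK)) == 0,
              "the wait bit must not overlap the interrupt mask bits");

enum wait_action {
    WAIT_EXIT_TO_USERSPACE,  /* disabled wait: nothing can wake the cpu */
    WAIT_SLEEP_UNTIL_CKC     /* enabled wait: arm a timer and sleep */
};

/* Decide how to handle a guest that loaded a wait PSW. */
static enum wait_action classify_wait(uint64_t psw_mask)
{
    const uint64_t irq_bits = PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK;

    return (psw_mask & irq_bits) ? WAIT_SLEEP_UNTIL_CKC
                                 : WAIT_EXIT_TO_USERSPACE;
}
```

In the enabled case the real handler would then set up a timer for the clock comparator value and sleep on a wait queue, as described in the commit message.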
     
  • This patch introduces handling of sie intercepts in three flavors: Intercepts
    are either handled completely in-kernel by kvm_handle_sie_intercept(),
    or passed to userspace with corresponding data in struct kvm_run in case
    kvm_handle_sie_intercept() returns -ENOTSUPP.
    In case of partial execution in kernel with the need of userspace support,
    kvm_handle_sie_intercept() may choose to set up struct kvm_run and return
    -EREMOTE.

    The trivial intercept reasons are handled in this patch:
    handle_noop() just does nothing for intercepts that don't require our support
    at all
    handle_stop() is called when a cpu enters stopped state, and it drops out to
    userland after updating our vcpu state
    handle_validity() faults in the cpu lowcore if needed, or passes the request
    to userland

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
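
The three flavors of intercept handling in the commit above map onto return codes of kvm_handle_sie_intercept(). A caller could interpret them as sketched below. The enum and the function classify_intercept_rc() are hypothetical, and treating 0 as "handled in-kernel, re-enter the guest" is an assumption. Only -ENOTSUPP and -EREMOTE are taken from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* ENOTSUPP is a kernel-internal errno (524) that userspace <errno.h> lacks. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

static_assert(ENOTSUPP != EREMOTE, "the two return codes must be distinguishable");

enum next_step {
    REENTER_GUEST,       /* handled completely in-kernel (assumed rc == 0) */
    EXIT_FILL_KVM_RUN,   /* -ENOTSUPP: caller fills struct kvm_run, then exits */
    EXIT_KVM_RUN_READY,  /* -EREMOTE: handler already set up struct kvm_run */
    PROPAGATE_ERROR      /* anything else is passed up unchanged */
};

/* Map the handler's return code to what the run loop does next. */
static enum next_step classify_intercept_rc(int rc)
{
    switch (rc) {
    case 0:
        return REENTER_GUEST;
    case -ENOTSUPP:
        return EXIT_FILL_KVM_RUN;
    case -EREMOTE:
        return EXIT_KVM_RUN_READY;
    default:
        return PROPAGATE_ERROR;
    }
}
```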
     
  • This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
    (aka s390x, mainframe) architecture. It uses the mainframe's virtualization
    instruction SIE to run virtual machines with up to 64 virtual CPUs each.
    This port is only usable on 64bit host kernels, and can only run 64bit guest
    kernels. However, running 31bit applications in guest userspace is possible.

    The following source files are introduced by this patch:
    arch/s390/kvm/kvm-s390.c      similar to arch/x86/kvm/x86.c, this implements all
                                  arch callbacks for kvm. __vcpu_run calls back into
                                  sie64a to enter the guest machine context
    arch/s390/kvm/sie64a.S        assembler function sie64a, which enters guest
                                  context via SIE, and switches world before and after that
    include/asm-s390/kvm_host.h   contains all vital data structures needed to run
                                  virtual machines on the mainframe
    include/asm-s390/kvm.h        defines kvm_regs and friends for user access to
                                  guest register content
    arch/s390/kvm/gaccess.h       functions similar to uaccess to access guest memory
    arch/s390/kvm/kvm-s390.h      header file for kvm-s390 internals, extended by
                                  later patches

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Heiko Carstens
     
  • The SIE instruction on s390 uses the 2nd half of the page table page to
    virtualize the storage keys of a guest. This patch offers the s390_enable_sie
    function, which reorganizes the page tables of a single-threaded process to
    reserve space in the page table:
    s390_enable_sie makes sure that the process is single threaded and then uses
    dup_mm to create a new mm with reorganized page tables. The old mm is freed
    and the process now has a page status extended field after every page table.

    Code that wants to exploit pgstes should SELECT CONFIG_PGSTE.

    This patch has a small common code hit, namely making dup_mm non-static.

    Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's
    review feedback. Now we do have the prototype for dup_mm in
    include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now
    call task_lock() to prevent race against ptrace modification of mm_users.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Carsten Otte
    Acked-by: Andrew Morton
    Signed-off-by: Avi Kivity

    Carsten Otte
     

22 Apr, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    [HWRNG] omap: Minor updates
    [CRYPTO] kconfig: Ordering cleanup
    [CRYPTO] all: Clean up init()/fini()
    [CRYPTO] padlock-aes: Use generic setkey function
    [CRYPTO] aes: Export generic setkey
    [CRYPTO] api: Make the crypto subsystem fully modular
    [CRYPTO] cts: Add CTS mode required for Kerberos AES support
    [CRYPTO] lrw: Replace all adds to big endians variables with be*_add_cpu
    [CRYPTO] tcrypt: Change the XTEA test vectors
    [CRYPTO] tcrypt: Shrink the tcrypt module
    [CRYPTO] tcrypt: Change the usage of the test vectors
    [CRYPTO] api: Constify function pointer tables
    [CRYPTO] aes-x86-32: Remove unused return code
    [CRYPTO] tcrypt: Shrink speed templates
    [CRYPTO] tcrypt: Group common speed templates
    [CRYPTO] sha512: Rename sha512 to sha512_generic
    [CRYPTO] sha384: Hardware acceleration for s390
    [CRYPTO] sha512: Hardware acceleration for s390
    [CRYPTO] s390: Generic sha_update and sha_final
    [CRYPTO] api: Switch to proc_create()

    Linus Torvalds
     

21 Apr, 2008

3 commits


19 Apr, 2008

2 commits

  • None of these files use any of the functionality promised by
    asm/semaphore.h. It's possible that they rely on it dragging in some
    unrelated header file, but I can't build all these files, so we'll have
    to fix any build failures as they come up.

    Signed-off-by: Matthew Wilcox

    Matthew Wilcox
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
    [NET]: Fix and allocate less memory for ->priv'less netdevices
    [IPV6]: Fix dangling references on error in fib6_add().
    [NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
    [PKT_SCHED]: Fix datalen check in tcf_simp_init().
    [INET]: Uninline the __inet_inherit_port call.
    [INET]: Drop the inet_inherit_port() call.
    SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
    [netdrvr] forcedeth: internal simplifications; changelog removal
    phylib: factor out get_phy_id from within get_phy_device
    PHY: add BCM5464 support to broadcom PHY driver
    cxgb3: Fix __must_check warning with dev_dbg.
    tc35815: Statistics cleanup
    natsemi: fix MMIO for PPC 44x platforms
    [TIPC]: Cleanup of TIPC reference table code
    [TIPC]: Optimized initialization of TIPC reference table
    [TIPC]: Remove inlining of reference table locking routines
    e1000: convert uint16_t style integers to u16
    ixgb: convert uint16_t style integers to u16
    sb1000.c: make const arrays static
    sb1000.c: stop inlining largish static functions
    ...

    Linus Torvalds
     

18 Apr, 2008

1 commit


17 Apr, 2008

17 commits

  • Semaphores are no longer performance-critical, so a generic C
    implementation is better for maintainability, debuggability and
    extensibility. Thanks to Peter Zijlstra for fixing the lockdep
    warning. Thanks to Harvey Harrison for pointing out that the
    unlikely() was unnecessary.

    Signed-off-by: Matthew Wilcox
    Acked-by: Ingo Molnar

    Matthew Wilcox
     
  • Move the function that prints the segment warning messages found in the
    monreader driver and the dcssblk driver to the extmem base code.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Martin Schwidefsky
     
  • Newer s390 models have a breaking-event-address-recording register.
    Each time an instruction causes a break in the sequential instruction
    execution, the address is saved in that hardware register. On a program
    interrupt the address is copied to the lowcore address 272-279, which
    makes it software accessible.

    This patch changes the program check handler and the stack overflow
    checker to copy the value into the pt_regs argument.
    The oops output is enhanced to show the last known breaking address.
    It might give additional information if the stack trace is corrupted.

    The feature is only available on 64 bit.

    The new oops output looks like:

    [---------snip----------]
    Modules linked in: vmcp sunrpc qeth_l2 dm_mod qeth ccwgroup
    CPU: 2 Not tainted 2.6.24zlive-host #8
    Process modprobe (pid: 4788, task: 00000000bf3d8718, ksp: 00000000b2b0b8e0)
    Krnl PSW : 0704200180000000 000003e000020028 (vmcp_init+0x28/0xe4 [vmcp])
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
    Krnl GPRS: 0000000004000002 000003e000020000 0000000000000000 0000000000000001
    000000000015734c ffffffffffffffff 000003e0000b3b00 0000000000000000
    000003e00007ca30 00000000b5bb5d40 00000000b5bb5800 000003e0000b3b00
    000003e0000a2000 00000000003ecf50 00000000b2b0bd50 00000000b2b0bcb0
    Krnl Code: 000003e000020018: c0c000040ff4 larl %r12,3e0000a2000
    000003e00002001e: e3e0f0000024 stg %r14,0(%r15)
    000003e000020024: a7f40001 brc 15,3e000020026
    >000003e000020028: e310c0100004 lg %r1,16(%r12)
    000003e00002002e: c020000413dc larl %r2,3e0000a27e6
    000003e000020034: c0a00004aee6 larl %r10,3e0000b5e00
    000003e00002003a: a7490001 lghi %r4,1
    000003e00002003e: a75900f0 lghi %r5,240
    Call Trace:
    ([] blocking_notifier_call_chain+0x2c/0x40)
    [] sys_init_module+0x19d8/0x1b08
    [] sysc_noemu+0x10/0x16
    [] 0x2000011cda2
    Last Breaking-Event-Address:
    [] vmcp_init+0x24/0xe4 [vmcp]
    [---------snip----------]

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Christian Borntraeger
     
  • The current uaccess page table walk code assumes at a few places that
    any access is a user space access. This is not correct if somebody
    has issued a set_fs(KERNEL_DS) in advance.
    Add code which checks which address space we are in and with this make
    sure we access the correct address space. This way we also get rid of
    the dirty
        if (!current->mm)
            return -EFAULT;
    hack in futex_atomic_cmpxchg_pt.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • The most notable part of this commit is the new local header file entry.h
    which contains all the function declarations of functions that get only
    called from asm code or are arch internal. That way we can avoid extern
    declarations in C files.
    This is more or less the same as what was done for sparc64.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • This way we get rid of s390's NO_IDLE_HZ and use the generic dynticks
    variant instead. In addition we get high resolution timers for free.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Remove the program check generating monitor calls and use function
    calls instead. There is no real advantage in using monitor calls,
    but they do make debugging harder, because of all the program checks
    they generate.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Johannes Weiner
     
  • Signed-off-by: Ursula Braun
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Ursula Braun
     
  • The new function supports setting of permissions for the debugfs files
    created by the debug feature. In addition to that, the function provides
    uid and gid as parameters for future use. Currently only root is allowed
    for uid and gid.

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Michael Holzheu
     
  • Not very helpful when code dies in "init".
    See also http://lkml.org/lkml/2008/3/26/557 .

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Add get_clock_xt to read an 8 byte clock value using store clock
    extended (STCKE) and use get_clock_xt for sched_clock. STCKE should
    be faster than STCK on newer machines.

    Signed-off-by: Jan Glauber
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Jan Glauber
     
  • If vertical cpu polarization is active then the hypervisor will
    dispatch certain cpus for a longer time than other cpus for maximum
    performance. For example if a guest would have three virtual cpus,
    each of them with a share of 33 percent, then in case of vertical
    cpu polarization all of the processing time would be combined to a
    single cpu which would run all the time, while the other two cpus
    would get nearly no cpu time.

    There are three different types of vertical cpus: high, medium and
    low. Low cpus hardly get any real cpu time, while high cpus get a
    full real cpu. Medium cpus get something in between.

    In order to switch between the two possible modes (default is
    horizontal) a 0 for horizontal polarization or a 1 for vertical
    polarization must be written to the dispatching sysfs attribute:

    /sys/devices/system/cpu/dispatching

    The polarization of each single cpu can be figured out by the
    polarization sysfs attribute of each cpu:

    /sys/devices/system/cpu/cpuX/polarization

    horizontal, vertical:high, vertical:medium, vertical:low or unknown.

    When switching polarization the polarization attribute may contain
    the value unknown until the configuration change is done and the
    kernel has figured out the new polarization of each cpu.

    Note that running a system with different types of vertical cpus may
    result in significant performance regressions. If possible only one
    type of vertical cpus should be used. All other cpus should be
    offlined.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
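
A userspace tool reading the per-cpu polarization attribute has to map the attribute text to a state. The sketch below handles the five values listed in the commit above. The enum and the function parse_polarization() are hypothetical names, and treating unrecognized text as unknown is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum polarization {
    POL_UNKNOWN,
    POL_HORIZONTAL,
    POL_VERTICAL_LOW,
    POL_VERTICAL_MEDIUM,
    POL_VERTICAL_HIGH
};

/*
 * Parse the text read from /sys/devices/system/cpu/cpuX/polarization.
 * A trailing newline, as sysfs attributes usually carry one, is ignored.
 */
static enum polarization parse_polarization(const char *text)
{
    static const struct {
        const char *name;
        enum polarization value;
    } table[] = {
        { "horizontal",      POL_HORIZONTAL },
        { "vertical:high",   POL_VERTICAL_HIGH },
        { "vertical:medium", POL_VERTICAL_MEDIUM },
        { "vertical:low",    POL_VERTICAL_LOW },
        { "unknown",         POL_UNKNOWN },
    };
    size_t len;

    assert(text != NULL);
    len = strcspn(text, "\n");  /* length without the trailing newline */
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strlen(table[i].name) == len &&
            strncmp(text, table[i].name, len) == 0)
            return table[i].value;
    }
    return POL_UNKNOWN;  /* assumption: unrecognized text counts as unknown */
}
```

Since the attribute may read unknown while a polarization switch is still in progress, a caller would typically re-read it until a definite value shows up.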
     
  • Add s390 backend so we can give the scheduler some hints about the
    cpu topology.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Make stfle visible so other code can call this.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • sys_sigreturn and sys_rt_sigreturn don't take any arguments. So luckily
    this resulted only in unneeded instead of incorrect code.
    But still this clearly shows why one should not put extern declarations
    in C files (will be fixed with a larger sparse patch).

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • This is just a port of 83bd01024b1fdfc41d9b758e5669e80fca72df66
    "x86: protect against sigaltstack wraparound".

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     

04 Apr, 2008

1 commit


21 Mar, 2008

1 commit

  • a0c1e9073ef7428a14309cba010633a6cd6719ea "futex: runtime enable pi and
    robust functionality" introduces a test whether futex in atomic stuff
    works or not.
    It does that by writing to address 0 of the kernel address space. This
    will crash on older machines where addressing mode switching is enabled
    but where the mvcos instruction is not available. Page table walking is
    done by hand and therefore the code tries to access current->mm which
    is NULL.
    Therefore add an extra check, so we survive the early test.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens