15 May, 2008

4 commits


07 May, 2008

4 commits

  • Use the existing arch_alloc_page/arch_free_page callbacks to do
    the guest page state transitions between stable and unused.

    Acked-by: Rik van Riel
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • This removes redundant arch code for generic ptrace requests
    already handled by ptrace_request and compat_ptrace_request.
    It simplifies things to just have the standard entry points,
    and use the generic compat_sys_ptrace.

    Signed-off-by: Roland McGrath
    Signed-off-by: Martin Schwidefsky

    Roland McGrath
     
  • From: Martin Schwidefsky

    This patch fixes a bug with cpu bound guests on kvm-s390. Sometimes it
    was impossible to deliver a signal to a spinning guest. We used
    preemption as a circumvention. The preemption notifiers called
    vcpu_load, which checked for pending signals and triggered a host
    intercept. But even with preemption, a sigkill was not delivered
    immediately.

    This patch changes the low level host interrupt handler to check for the
    SIE instruction, if TIF_WORK is set. In that case we change the
    instruction pointer of the return PSW to rerun the vcpu_run loop. The kvm
    code sees an intercept reason 0 if that happens. This patch adds accounting
    for these types of intercept as well.

    The advantages:
    - works with and without preemption
    - signals are delivered immediately
    - much better host latencies without preemption

    Acked-by: Carsten Otte
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
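
The return-PSW fixup described in this commit can be pictured with a small user-space sketch. It is only an illustration under stated assumptions: `demo_psw`, `demo_layout`, and `demo_fixup_return` are hypothetical names, and the real handler is low-level kernel entry code, not C like this.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the return PSW: only the address part. */
struct demo_psw {
    unsigned long addr;
};

/* Hypothetical addresses: where the guest-entry (SIE) instruction sits and
 * where execution must resume to run the vcpu_run loop again. */
struct demo_layout {
    unsigned long sie_insn;
    unsigned long rerun_entry;
};

/* If work is pending and the interrupt hit the SIE instruction, redirect the
 * return address so the run loop is re-entered. Returns true when the PSW
 * was changed. */
static bool demo_fixup_return(struct demo_psw *psw, bool work_pending,
                              const struct demo_layout *layout)
{
    if (!work_pending || psw->addr != layout->sie_insn)
        return false;
    psw->addr = layout->rerun_entry;
    return true;
}
```

In this sketch the check costs nothing unless work is pending, which matches the message's point that the scheme works with and without preemption.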
     
  • On return from syscall or interrupt, we have to check if we return to
    userspace (likely) and if there is work to do (less likely) to decide
    if we handle the work. We can optimize this check: we first check for
    the less likely work case and then check for userspace.

    This patch is also a preparation for an additional patch, that fixes a bug
    in KVM dealing with cpu bound guests.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
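
The reordering described in this commit can be sketched in plain C. This is a minimal illustration, assuming a hypothetical `demo_exit_state` structure; the actual change lives in the assembler entry code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state examined on the exit path. */
struct demo_exit_state {
    bool returning_to_user;  /* usually true */
    bool work_pending;       /* rarely true */
};

/* Old order: test the likely condition first, then the rare one. */
static bool demo_must_handle_work_old(const struct demo_exit_state *s)
{
    if (!s->returning_to_user)
        return false;
    return s->work_pending;
}

/* New order: test the rarely-true condition first, so the common
 * "no work" case is decided after a single test. */
static bool demo_must_handle_work_new(const struct demo_exit_state *s)
{
    if (!s->work_pending)
        return false;
    return s->returning_to_user;
}
```

Both functions return the same result for every input; only the number of tests on the common path differs.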
     

04 May, 2008

1 commit

  • This replaces the duplicated arch-specific versions of "sys_pipe()" with
    one unified implementation. This removes almost 250 lines of duplicated
    code.

    It's marked __weak, so that *if* an architecture wants to override the
    default implementation it can do so by simply having its own replacement
    version, since many architectures use alternate calling conventions for
    the 'pipe()' system call for legacy reasons (i.e. traditional UNIX
    implementations often return the two file descriptors in registers).

    I still haven't changed the cris version even though Linus says the BKL
    isn't needed. The arch maintainer can easily do it if there are really
    no obstacles.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
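
The weak-symbol mechanism mentioned in this commit can be illustrated outside the kernel. The sketch below uses the GCC/Clang `weak` attribute directly (the kernel's `__weak` is a macro around it); `demo_pipe_impl` and the descriptor numbers are hypothetical.

```c
/* Generic default, marked weak: if another object file in the link defines
 * a non-weak demo_pipe_impl(), the linker picks that one instead. */
__attribute__((weak)) int demo_pipe_impl(int fds[2])
{
    /* Placeholder descriptor numbers, for illustration only. */
    fds[0] = 3;
    fds[1] = 4;
    return 0;
}
```

An architecture-specific file would simply define `int demo_pipe_impl(int fds[2])` without the attribute; no `#ifdef` is needed in the generic code.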
     

30 Apr, 2008

17 commits

  • * 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
    [S390] Update default configuration.
    [S390] use generic sys_ptrace
    [S390] Remove self ptrace IEEE_IP hack.
    [S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP
    [S390] System z large page support.
    [S390] Convert machine feature detection code to C.
    [S390] vmemmap: use clear_table to initialise page tables.
    [S390] Move stfl to system.h and delete duplicated version.
    [S390] uaccess_mvcos: #ifdef config dependent code.
    [S390] cpu topology: Fix possible deadlock.
    [S390] Add topology_core_siblings to topology.h
    [S390] cio: Make isc handling more robust.
    [S390] remove -traditional
    [S390] Automatically detect added cpus.
    [S390] smp: Fix locking order.
    [S390] Add missing ifndef/define to include/asm-s390/sysinfo.h.
    [S390] Move show_regs to traps.c.
    [S390] cio: Use strict_strtoul() for attributes.

    Linus Torvalds
     
  • TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks. Those low
    bits are scarce, and are all used up now. Renumber TIF_RESTORE_SIGMASK to
    free one up.

    Signed-off-by: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
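
As a rough illustration of the renumbering described in this commit: a flag that is not part of the work mask can use any free bit, leaving the low bits for flags that the exit path tests. The bit numbers and names below are hypothetical and do not reflect any architecture's real values.

```c
/* Hypothetical bit numbers, chosen only for this sketch. */
#define DEMO_TIF_SIGPENDING       2
#define DEMO_TIF_NEED_RESCHED     3
#define DEMO_TIF_RESTORE_SIGMASK  20  /* moved out of the scarce low range */

#define DEMO_BIT(nr) (1UL << (nr))

/* Only flags in this mask make the exit path do extra work. */
#define DEMO_TIF_WORK_MASK \
    (DEMO_BIT(DEMO_TIF_SIGPENDING) | DEMO_BIT(DEMO_TIF_NEED_RESCHED))

static int demo_exit_work_pending(unsigned long flags)
{
    return (flags & DEMO_TIF_WORK_MASK) != 0;
}
```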
     
  • Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • After the PT_IEEE_IP hack has been removed s390 can now use
    the common code sys_ptrace function.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
    The self referential PT_IEEE_IP ptrace peek & poke calls have been
    broken for the last 6 years. For peek the code always returns 0
    instead of the last ieee fault and for poke the code does nothing.
    Since nobody noticed, the code seems to be superfluous. So let's
    remove it.

    Cc: Christoph Hellwig
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Convert s390 to SPARSEMEM and SPARSEMEM_VMEMMAP. We do a select
    of SPARSEMEM_VMEMMAP since it is configurable. This is because
    SPARSEMEM without SPARSEMEM_VMEMMAP gives us a hell of broken
    include dependencies that I don't want to fix.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • This adds hugetlbfs support on System z, using both hardware large page
    support if available and software large page emulation on older hardware.
    Shared (large) page tables are implemented in software emulation mode,
    by using page->index of the first tail page from a compound large page
    to store page table information.

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Gerald Schaefer
     
  • From: Heiko Carstens
    From: Carsten Otte

    This lets us use defines for the magic bits in machine flags instead
    of using plain numbers all over the place.
    In addition on newer machines features/facilities are indicated by the
    result of the stfl instruction. So we use these bits instead of trying
    to execute new instructions and check whether we get an exception or
    not.
    Also the mvpg instruction is always available when in zArch mode,
    whereas the idte instruction is only available in zArch mode. This
    results in some minor optimizations.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
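
The idea of deriving named machine flags from the stfl result, rather than probing instructions, might look roughly like the sketch below. The flag names and facility numbers are hypothetical; the only architectural fact relied on is that facility bits are numbered from the leftmost (most significant) bit.

```c
#include <stdint.h>

/* Hypothetical machine flags, named instead of plain numbers. */
#define DEMO_MACHINE_FLAG_FEATURE_A (1UL << 0)
#define DEMO_MACHINE_FLAG_FEATURE_B (1UL << 1)

/* Hypothetical facility bit numbers (not real facility assignments). */
#define DEMO_FACILITY_FEATURE_A 3
#define DEMO_FACILITY_FEATURE_B 7

/* Test facility bit nr in a 32-bit facility list; bit 0 is the MSB. */
static int demo_test_facility(uint32_t list, unsigned int nr)
{
    if (nr > 31)
        return 0;
    return (int)((list >> (31 - nr)) & 1U);
}

/* Translate facility bits into machine flags once, at detection time. */
static unsigned long demo_detect_machine_flags(uint32_t list)
{
    unsigned long flags = 0;

    if (demo_test_facility(list, DEMO_FACILITY_FEATURE_A))
        flags |= DEMO_MACHINE_FLAG_FEATURE_A;
    if (demo_test_facility(list, DEMO_FACILITY_FEATURE_B))
        flags |= DEMO_MACHINE_FLAG_FEATURE_B;
    return flags;
}
```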
     
  • Always use clear_table to initialise page tables. The overlapping
    memcpy is just a leftover of a previous version that wasn't fully
    converted to clear_table.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • arch/s390/lib/uaccess_mvcos.c:166:
    warning: 'strnlen_user_mvcos' defined but not used
    arch/s390/lib/uaccess_mvcos.c:186:
    warning: 'strncpy_from_user_mvcos' defined but not used

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • When we get a notification that cpu topology changed, we schedule a
    work struct which just calls arch_reinit_sched_domains. This function
    in turn calls get_online_cpus() which results in the lockdep warning
    below.

    After all it turned out that it's not legal to call get_online_cpus()
    from the context of a multi-threaded work queue.
    It could deadlock this way:

    process 0 (events/cpu-x):
    -> run_workqueue
    -> removes my work_struct from the work queue
    -> calls work_struct->fn
    -> get_online_cpus()
    -> locks on cpu_hotplug.lock since process 1 below is doing cpu hotplug

    process 1:
    -> cpu_down (for cpu-x)
    -> cpu_hotplug_begin (holds cpu_hotplug.lock now)
    -> cpu-x dead
    -> notifier_call_chain with CPU_DEAD
    -> cleanup_workqueue_thread
    -> flush_cpu_workqueue (succeeds)
    -> kthread_stop for events/cpu-x
    -> now kthread_stop waits for my work_struct to complete from within
    process 0. -> dead.

    A single threaded workqueue wouldn't have such problems, however there is
    no such common queue available and it's not worth creating one for the
    very rare calls to arch_reinit_sched_domains.

    So we just create a kernel thread from our work struct which calls
    arch_reinit_sched_domains and are done with it.

    Thanks to Oleg Nesterov and Peter Zijlstra for helping me figure out
    that this isn't a false positive lockdep warning:

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.25-03562-g3dc5063-dirty #12
    -------------------------------------------------------
    events/3/14 is trying to acquire lock:
    (&cpu_hotplug.lock){--..}, at: [] get_online_cpus+0x50/0x78

    but task is already holding lock:
    (topology_work){--..}, at: [] run_workqueue+0x106/0x278

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (topology_work){--..}:
    [] __lock_acquire+0x1010/0x111c
    [] lock_acquire+0xc0/0xf8
    [] run_workqueue+0x170/0x278
    [] worker_thread+0x8c/0xf0
    [] kthread+0x68/0xa0
    [] kernel_thread_starter+0x6/0xc
    [] kernel_thread_starter+0x0/0xc

    -> #1 (events){--..}:
    [] __lock_acquire+0x1010/0x111c
    [] lock_acquire+0xc0/0xf8
    [] cleanup_workqueue_thread+0x60/0xa8
    [] workqueue_cpu_callback+0xbc/0x170
    [] notifier_call_chain+0x5c/0xa4
    [] __raw_notifier_call_chain+0x26/0x38
    [] raw_notifier_call_chain+0x2e/0x40
    [] cpu_down+0x228/0x31c
    [] store_online+0x64/0xb8
    [] sysdev_store+0x48/0x58
    [] sysfs_write_file+0x126/0x1c0
    [] vfs_write+0xb0/0x15c
    [] sys_write+0x56/0x88
    [] sys32_write+0x34/0x4c
    [] sysc_noemu+0x10/0x16
    [] 0x77f3f186

    -> #0 (&cpu_hotplug.lock){--..}:
    [] __lock_acquire+0xe20/0x111c
    [] lock_acquire+0xc0/0xf8
    [] mutex_lock_nested+0xd0/0x364
    [] get_online_cpus+0x50/0x78
    [] arch_reinit_sched_domains+0x26/0x58
    [] topology_work_fn+0x26/0x34
    [] run_workqueue+0x176/0x278
    [] worker_thread+0x8c/0xf0
    [] kthread+0x68/0xa0
    [] kernel_thread_starter+0x6/0xc
    [] kernel_thread_starter+0x0/0xc

    other info that might help us debug this:

    2 locks held by events/3/14:
    #0: (events){--..}, at: [] run_workqueue+0x106/0x278
    #1: (topology_work){--..}, at: [] run_workqueue+0x106/0x278

    stack backtrace:
    CPU: 3 Not tainted 2.6.25-03562-g3dc5063-dirty #12
    Process events/3 (pid: 14, task: 000000002fb04038, ksp: 000000002fb0bd70)
    0400000000000000 000000002fb0ba40 0000000000000002 0000000000000000
    000000002fb0bae0 000000002fb0ba58 000000002fb0ba58 0000000000016488
    0000000000000000 000000002fb0bd70 0000000000000000 0000000000000000
    000000002fb0ba40 000000000000000c 000000002fb0ba40 000000002fb0bab0
    00000000003c99e0 0000000000016488 000000002fb0ba40 000000002fb0ba90
    Call Trace:
    ([] show_trace+0x138/0x158)
    [] show_stack+0xc6/0xf8
    [] dump_stack+0xb0/0xc0
    [] print_circular_bug_tail+0xa2/0xb4
    [] __lock_acquire+0xe20/0x111c
    [] lock_acquire+0xc0/0xf8
    [] mutex_lock_nested+0xd0/0x364
    [] get_online_cpus+0x50/0x78
    [] arch_reinit_sched_domains+0x26/0x58
    [] topology_work_fn+0x26/0x34
    [] run_workqueue+0x176/0x278
    [] worker_thread+0x8c/0xf0
    [] kthread+0x68/0xa0
    [] kernel_thread_starter+0x6/0xc
    [] kernel_thread_starter+0x0/0xc
    INFO: lockdep is turned off.

    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • This exposes the core siblings to user space via sysfs.

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Signed-off-by: Mathieu Desnoyers
    CC: Sam Ravnborg
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Mathieu Desnoyers
     
  • Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    In some smp sysfs store attributes get_online_cpus() may block on
    cpu_hotplug.lock while we already hold smp_cpu_state_mutex. Since the
    locking order on cpu hotplug via arch_update_cpu_topology is the
    inverse, this might lead to deadlocks.
    So make sure the locking order is always the same.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
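
The rule this commit enforces, always taking the two locks in the same order, can be shown with a user-space sketch using pthread mutexes. The lock and function names are hypothetical stand-ins, and the chosen order (hotplug lock first) is an assumption made for the example.

```c
#include <pthread.h>

/* Hypothetical stand-ins for cpu_hotplug.lock and smp_cpu_state_mutex. */
static pthread_mutex_t demo_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_state_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_cpu_state;

/* Every path goes through this helper, so the order is always
 * hotplug lock first, state mutex second. */
static void demo_update_state(int value)
{
    pthread_mutex_lock(&demo_hotplug_lock);
    pthread_mutex_lock(&demo_state_mutex);
    demo_cpu_state = value;
    pthread_mutex_unlock(&demo_state_mutex);
    pthread_mutex_unlock(&demo_hotplug_lock);
}

/* Two callers that previously might have nested the locks differently. */
static void demo_sysfs_store(int value)    { demo_update_state(value); }
static void demo_topology_update(int value) { demo_update_state(value); }
```

With a single order, two threads can never each hold one lock while waiting for the other.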
     
  • This is where it should be and we can get rid of some externs
    and a static inline function.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

29 Apr, 2008

3 commits

  • New version that does not preserve the marker. Arch maintainers indicate
    that the marker functionality is not needed anymore.

    Note you may simplify the s390 asm-offsets.c code further if you use the
    OFFSET() macro instead of the DEFINE. See kbuild.h

    Signed-off-by: Christoph Lameter
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • s390 has a strange marker in DEFINE. Undefine the DEFINE from kbuild.h and
    define it the way s390 wants it to preserve things as they were.

    May be good if the arch maintainer could go over this and check if this
    workaround is really necessary.

    Signed-off-by: Christoph Lameter
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Add a proper prototype for __do_softirq() in include/linux/interrupt.h

    Signed-off-by: Adrian Bunk
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

27 Apr, 2008

11 commits

  • So userspace can save/restore the mpstate during migration.

    [avi: export the #define constants describing the value]
    [christian: add s390 stubs]
    [avi: ditto for ia64]

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
     
  • Timers that fire between guest hlt and vcpu_block's add_wait_queue() are
    ignored, possibly resulting in hangs.

    Also make sure that atomic_inc and waitqueue_active tests happen in the
    specified order, otherwise the following race is open:

    CPU0                                     CPU1
                                             if (waitqueue_active(wq))
    add_wait_queue()
    if (!atomic_read(pit_timer->pending))
        schedule()
                                             atomic_inc(pit_timer->pending)

    Signed-off-by: Marcelo Tosatti
    Signed-off-by: Avi Kivity

    Marcelo Tosatti
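
The ordering requirement in this commit can be modelled with C11 atomics. This is only a sketch with hypothetical names (`demo_vcpu`, `demo_timer_fire`, `demo_vcpu_should_sleep`); the kernel uses wait queues and its own atomic primitives. The point is that the timer side publishes the pending count before looking for a sleeper, and the vcpu side registers as a sleeper before checking the count.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one pending counter, one "is on wait queue" flag. */
struct demo_vcpu {
    atomic_int  pending;
    atomic_bool on_waitqueue;
};

/* Timer side: increment first, then check for a sleeper.
 * Returns true if a wakeup has to be sent. */
static bool demo_timer_fire(struct demo_vcpu *v)
{
    atomic_fetch_add(&v->pending, 1);
    return atomic_load(&v->on_waitqueue);
}

/* vcpu side: join the wait queue first, then check for pending timers.
 * Returns true if the vcpu should go to sleep. */
static bool demo_vcpu_should_sleep(struct demo_vcpu *v)
{
    atomic_store(&v->on_waitqueue, true);
    if (atomic_load(&v->pending) > 0) {
        atomic_store(&v->on_waitqueue, false);
        return false;
    }
    return true;
}
```

With the default sequentially consistent operations, at least one side observes the other: either the vcpu sees the pending timer and does not sleep, or the timer sees the sleeper and wakes it.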
     
  • Signed-off-by: Avi Kivity

    Avi Kivity
     
  • Temporarily rename this function to avoid merge conflicts and/or
    dependencies. This function will be removed as soon as git-s390
    and kvm.git are finally upstream.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Heiko Carstens
     
  • kvm_arch_vcpu_ioctl_run currently incorrectly always returns 0.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Heiko Carstens
     
  • This patch adds functionality to detect if the kernel runs under the KVM
    hypervisor. A macro MACHINE_IS_KVM is exported for device drivers. This
    allows drivers to skip device detection if the system runs non-virtualized.
    We also define a preferred console to avoid having the ttyS0, which is a line
    mode only console.

    Signed-off-by: Christian Borntraeger
    Acked-by: Martin Schwidefsky
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Carsten Otte
     
  • This patch adds the virtualization submenu and the kvm option to the kernel
    config. It also defines HAVE_KVM for 64bit kernels.

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • This patch introduces interpretation of some diagnose instruction intercepts.
    Diagnose is our classic architected way of doing a hypercall. This patch
    features the following diagnose codes:
    - vm storage size, that tells the guest about its memory layout
    - time slice end, which is used by the guest to indicate that it waits
    for a lock and thus cannot use up its time slice in a useful way
    - ipl functions, which a guest can use to reset and reboot itself

    In order to implement ipl functions, we also introduce an exit reason that
    causes userspace to perform various resets on the virtual machine. All resets
    are described in the principles of operation book, except KVM_S390_RESET_IPL
    which causes a reboot of the machine.

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
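
The handling described in this commit amounts to a dispatch on the diagnose code. The sketch below shows that shape only; the enum values are placeholders rather than the architected diagnose numbers, and all names are hypothetical.

```c
/* Placeholder identifiers; NOT the real diagnose code numbers. */
enum demo_diag_code {
    DEMO_DIAG_VM_STORAGE_SIZE = 1,
    DEMO_DIAG_TIME_SLICE_END  = 2,
    DEMO_DIAG_IPL_FUNCTIONS   = 3
};

enum demo_diag_result {
    DEMO_DIAG_HANDLED_IN_KERNEL,
    DEMO_DIAG_EXIT_TO_USERSPACE,  /* e.g. so userspace can reset the VM */
    DEMO_DIAG_NOT_HANDLED
};

static enum demo_diag_result demo_handle_diagnose(unsigned int code)
{
    switch (code) {
    case DEMO_DIAG_VM_STORAGE_SIZE:
        return DEMO_DIAG_HANDLED_IN_KERNEL;  /* report memory layout */
    case DEMO_DIAG_TIME_SLICE_END:
        return DEMO_DIAG_HANDLED_IN_KERNEL;  /* give up the time slice */
    case DEMO_DIAG_IPL_FUNCTIONS:
        return DEMO_DIAG_EXIT_TO_USERSPACE;  /* resets are done by userspace */
    default:
        return DEMO_DIAG_NOT_HANDLED;
    }
}
```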
     
  • This patch introduces in-kernel handling of _some_ sigp interprocessor
    signals (similar to ipi).
    kvm_s390_handle_sigp() decodes the sigp instruction and calls individual
    handlers depending on the operation requested:
    - sigp sense tries to retrieve information such as existence or running state
    of the remote cpu
    - sigp emergency sends an external interrupt to the remote cpu
    - sigp stop stops a remote cpu
    - sigp stop store status stops a remote cpu, and stores its entire internal
    state to the cpus lowcore
    - sigp set arch sets the architecture mode of the remote cpu. setting to
    ESAME (s390x 64bit) is accepted, setting to ESA/S390 (s390, 31 or 24 bit) is
    denied, all others are passed to userland
    - sigp set prefix sets the prefix register of a remote cpu

    For implementation of this, the stop intercept indication starts to get reused
    on purpose: a set of action bits defines what to do once a cpu gets stopped:
    ACTION_STOP_ON_STOP really stops the cpu when a stop intercept is recognized
    ACTION_STORE_ON_STOP stores the cpu status to lowcore when a stop intercept is
    recognized

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
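
The action-bit scheme from this commit can be sketched as follows. The bit values, the `demo_cpu` structure, and the clearing of the bits after handling are assumptions made for the example; only the two action names come from the commit message.

```c
/* Hypothetical bit values for the two actions named in the commit. */
#define DEMO_ACTION_STOP_ON_STOP  0x1u
#define DEMO_ACTION_STORE_ON_STOP 0x2u

/* Hypothetical per-cpu state. */
struct demo_cpu {
    unsigned int action_bits;
    int stopped;
    int status_stored;
};

/* sigp stop: request that the cpu really stops at the next stop intercept. */
static void demo_sigp_stop(struct demo_cpu *cpu)
{
    cpu->action_bits |= DEMO_ACTION_STOP_ON_STOP;
}

/* sigp stop and store status: additionally request a status store. */
static void demo_sigp_stop_store_status(struct demo_cpu *cpu)
{
    cpu->action_bits |= DEMO_ACTION_STOP_ON_STOP | DEMO_ACTION_STORE_ON_STOP;
}

/* Stop intercept: carry out whatever actions were requested. */
static void demo_handle_stop_intercept(struct demo_cpu *cpu)
{
    if (cpu->action_bits & DEMO_ACTION_STORE_ON_STOP)
        cpu->status_stored = 1;  /* stands in for storing status to lowcore */
    if (cpu->action_bits & DEMO_ACTION_STOP_ON_STOP)
        cpu->stopped = 1;
    cpu->action_bits = 0;        /* assumption: requests are one-shot */
}
```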
     
  • This patch introduces in-kernel handling of some intercepts for privileged
    instructions:

    handle_set_prefix() sets the prefix register of the local cpu
    handle_store_prefix() stores the content of the prefix register to memory
    handle_store_cpu_address() stores the cpu number of the current cpu to memory
    handle_skey() just decrements the instruction address and retries
    handle_stsch() delivers condition code 3 "operation not supported"
    handle_chsc() same here
    handle_stfl() stores the facility list which contains the
    capabilities of the cpu
    handle_stidp() stores cpu type/model/revision and such
    handle_stsi() stores information about the system topology

    Acked-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Avi Kivity

    Christian Borntraeger
     
  • This patch contains the s390 interrupt subsystem (similar to in kernel apic)
    including timer interrupts (similar to in-kernel-pit) and enabled wait
    (similar to in kernel hlt).

    In order to achieve that, this patch also introduces intercept handling
    for instruction intercepts, and it implements load control instructions.

    This patch introduces an ioctl KVM_S390_INTERRUPT which is valid for both
    the vm file descriptors and the vcpu file descriptors. In case this ioctl is
    issued against a vm file descriptor, the interrupt is considered floating.
    Floating interrupts may be delivered to any virtual cpu in the configuration.

    The following interrupts are supported:
    SIGP STOP - interprocessor signal that stops a remote cpu
    SIGP SET PREFIX - interprocessor signal that sets the prefix register of a
    (stopped) remote cpu
    INT EMERGENCY - interprocessor interrupt, usually used to signal need_resched
    and for smp_call_function() in the guest.
    PROGRAM INT - exception during program execution such as page fault, illegal
    instruction and friends
    RESTART - interprocessor signal that starts a stopped cpu
    INT VIRTIO - floating interrupt for virtio signalisation
    INT SERVICE - floating interrupt for signalisations from the system
    service processor

    struct kvm_s390_interrupt, which is submitted as ioctl parameter when injecting
    an interrupt, also carries parameter data for interrupts along with the interrupt
    type. Interrupts on s390 usually have a state that represents the current
    operation, or identifies which device has caused the interruption on s390.

    kvm_s390_handle_wait() handles waitpsw in two flavors: in case of a
    disabled wait (that is, disabled for interrupts), we exit to userspace. In case
    of an enabled wait we set up a timer that equals the cpu clock comparator value
    and sleep on a wait queue.

    [christian: change virtio interrupt to 0x2603]

    Acked-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Carsten Otte
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Avi Kivity

    Carsten Otte
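
The two wait flavors described for kvm_s390_handle_wait() can be sketched as a small decision function. Everything here is hypothetical (`demo_wait_plan`, `demo_handle_wait`, the tick units); it only mirrors the rule in the text: a disabled wait exits to userspace, an enabled wait sleeps until the clock comparator value is reached.

```c
#include <stdint.h>

enum demo_wait_action {
    DEMO_WAIT_EXIT_TO_USERSPACE,  /* disabled wait: nothing can wake the cpu */
    DEMO_WAIT_SLEEP               /* enabled wait: sleep on a wait queue */
};

struct demo_wait_plan {
    enum demo_wait_action action;
    uint64_t sleep_ticks;         /* timer length, in arbitrary clock units */
};

static struct demo_wait_plan demo_handle_wait(int interrupts_enabled,
                                              uint64_t now,
                                              uint64_t clock_comparator)
{
    struct demo_wait_plan plan = { DEMO_WAIT_EXIT_TO_USERSPACE, 0 };

    if (!interrupts_enabled)
        return plan;

    plan.action = DEMO_WAIT_SLEEP;
    /* Sleep until the comparator value; if it already passed, do not wait. */
    plan.sleep_ticks = clock_comparator > now ? clock_comparator - now : 0;
    return plan;
}
```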