27 Dec, 2011

29 commits

  • Add s390x specific parts to kdump kernel documentation.

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
     
  • Currently the vmalloc_start address (or better end of real memory) for s390x
    is obtained by makedumpfile using vmlist.addr symbol, which is not correct.
    The correct vmalloc_start address can be obtained using 'high_memory' symbol.

    This patch adds the high_memory symbol to vmcoreinfo.

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
     
  • The 'expires' value of a ccw request defines how long the device
    driver should wait for a response from the device after the request has
    been submitted to the channel subsystem. After the expiration time
    (e.g. 30 seconds) the waiting request will be cancelled and started
    again. This protects the DASD devices from being blocked by errors
    that cause the answering I/O interrupt to be lost.

    In case of error recovery requests, this 'expires' value used to be
    set to 0, so in case of a lost interrupt, such a recovery request
    would never expire and block the device. To prevent this kind of
    problem, all recovery requests need to have an expires value > 0 as
    well. If not specified otherwise, this should be the same expires
    value as for the original request.

    Signed-off-by: Stefan Weinhuber
    Signed-off-by: Martin Schwidefsky

    Stefan Weinhuber
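
The rule described in the commit above (a recovery request inherits the timeout of the original request unless told otherwise) can be sketched as follows. This is a minimal illustration, not the DASD driver's code: `struct ccw_req`, its fields and `erp_req_init()` are hypothetical names chosen for the example.

```c
/* Hypothetical, simplified stand-in for a DASD channel request. */
struct ccw_req {
	unsigned long expires;	/* timeout in seconds; 0 means "never expires" */
	struct ccw_req *refers;	/* original request that this one recovers */
};

/*
 * Initialise a recovery request for 'orig'. If the caller passes no
 * explicit timeout (0), inherit the timeout of the original request so
 * that a lost interrupt cannot block the device forever.
 */
static void erp_req_init(struct ccw_req *erp, struct ccw_req *orig,
			 unsigned long expires)
{
	erp->refers = orig;
	erp->expires = expires ? expires : orig->expires;
}
```

With an original request that expires after 30 seconds, `erp_req_init(&erp, &orig, 0)` leaves `erp.expires` at 30, while an explicit non-zero argument overrides it.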
     
  • The panic function will first print the panic message to the console,
    then stop additional cpus with smp_send_stop and finally call the
    function on the panic notifier list.
    In case of an I/O based console the panic message will cause I/O to
    be started and a function on the panic notifier list will wait for the
    completion of the I/O. That does not work if an I/O completion interrupt
    has already been delivered to a cpu that is then stopped by smp_send_stop.
    To break this cyclic dependency add code to smp_send_stop that gives
    the additional cpu the opportunity to complete outstanding interrupts.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Call generic IPC demultiplexer instead of having a nearly identical
    s390 variant. Also make sure that native and compat handling now have
    the same behaviour.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Return EINVAL instead of EFAULT for invalid input parameter.

    Signed-off-by: Holger Dengler
    Signed-off-by: Martin Schwidefsky

    Holger Dengler
     
  • Fix length checking of the expected reply and remove re-adjustment of
    expected control block length.

    Signed-off-by: Holger Dengler
    Signed-off-by: Martin Schwidefsky

    Holger Dengler
     
  • Move the program interruption code and the translation exception identifier
    to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
    first level interrupt handler in entry[64].S store the two values. That
    makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
    and to reduce the number of arguments to a lot of functions. Finally
    un-inline do_trap. Overall this saves 5812 bytes in the .text section of
    the 64 bit kernel.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Remove last traces of our kerntypes patch which was always an addon
    patch which never got upstream. Somehow a few bits got upstream
    anyway.
    Since kerntypes aren't used anymore and lcrash isn't maintained (for
    s390 at least) remove the last traces of kerntypes that somehow went
    upstream. Also remove the documentation that mentions lcrash.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Increase cpu topology change poll frequency if a change is anticipated.
    Otherwise a user might be confused by having to wait up to a minute
    to see a change that should be visible immediately.
    However, there is no guarantee that the change will happen during the
    time frame in which the poll frequency is increased.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
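
A rough sketch of the polling idea in the commit above: poll slowly by default, and switch to a short interval for a limited number of polls once a change is expected. All names are hypothetical, and the 100 ms fast interval is an assumed value for illustration; only the one-minute upper bound comes from the commit message.

```c
#define POLL_NORMAL_MS 60000	/* "up to a minute", per the commit message */
#define POLL_FAST_MS   100	/* assumed value, for illustration only */

/* Hypothetical poll state: number of fast polls still to be done. */
struct topo_poll {
	int fast_polls_left;
};

/* Called when a topology change is anticipated. */
static void topo_expect_change(struct topo_poll *p, int polls)
{
	p->fast_polls_left = polls;
}

/* Return the delay until the next poll, consuming one fast poll if any. */
static unsigned int topo_next_delay_ms(struct topo_poll *p)
{
	if (p->fast_polls_left > 0) {
		p->fast_polls_left--;
		return POLL_FAST_MS;
	}
	return POLL_NORMAL_MS;
}
```

Because the fast window is bounded, a change that arrives after the budget is used up is only picked up at the normal interval, which matches the caveat in the commit message.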
     
  • Another round of cleanup for entry[64].S, in particular the program check
    handler looks more reasonable now. The code size for the 31 bit kernel
    has been reduced by 616 byte and by 528 byte for the 64 bit version.
    Even better the code is a bit faster as well.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Only add subdirectories of arch/s390 to kbuild if their respective
    config option is selected.

    Signed-off-by: Jan Glauber
    Signed-off-by: Martin Schwidefsky

    Jan Glauber
     
  • There is no reason for the cpu-measurement-facility host id constant to
    reside in the lowcore where space is precious. Use an entry in the literal
    pool in HANDLE_SIE_INTERCEPT and a stack slot in sie64a.
    While we are at it replace the id -1 with 0 to indicate host execution.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Cleanup z10 topology handling. This adds some more code but hopefully
    the result is more readable and easier to maintain.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • This patch disables the check for MACHINE_IS_VM when initializing the
    pfault infrastructure. The code checks for successful completion of
    diag 258 anyway, so it is safe to attempt initialization on LPAR as well.
    This is needed to use pfault on kvm.

    Signed-off-by: Carsten Otte
    Signed-off-by: Martin Schwidefsky

    Carsten Otte
     
  • drivers/s390/cio/qdio_setup.c:24:32:
    warning: non-ANSI function declaration of function 'qdio_allocate_aob'

    While at it also simplify the function.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Remove all ifdefs from topology code and also only compile it for the
    CONFIG_SCHED_BOOK case. The new code selects SCHED_MC if SCHED_BOOK is
    selected. SCHED_MC without SCHED_BOOK is not possible anymore.
    Furthermore various sysfs attributes are not available anymore for the
    !SCHED_BOOK case. In particular all attributes that correspond to
    CPU polarization.
    But since all real world kernels have SCHED_BOOK selected anyway this
    doesn't matter too much.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Currently, when smp_switch_to_ipl_cpu() is done, the backchain in the dump
    analysis tool crash looks like the following:

    #0 [1f746e70] __machine_kexec at 11dd92
    #1 [1f746eb8] smp_restart_cpu at 11820e
    #0 [00907eb0] cpu_idle at 10602e
    #1 [00907ef8] start_kernel at 979a08

    It would be good to see the registers of the interrupted function.
    To achieve this, the backchain on the new stack has to be set to zero.
    This looks then like the following:

    #0 [1f746e70] __machine_kexec at 11dd8e
    #1 [1f746eb8] smp_restart_cpu at 11820a
    PSW: 0706000180000000 00000000005c6fe6 (vtime_stop_cpu+134)
    GPRS: 0000000000000000 00000000005c6fe6 0000000001ad0228 0000000001ad0248
    0000000000907f08 0000000001ad0b40 0000000000979344 0000000000000000
    00000000009c0000 00000000009c0010 00000000009ab024 0000000001ad0200
    0000000001ad0238 00000000005cc9d8 000000000010602e 0000000000907e68
    #0 [00907eb0] cpu_idle at 10602e
    #1 [00907ef8] start_kernel at 979a08

    In addition to this, now also the correct PSW is stored in the pt_regs
    structure that is located at the start of the panic stack.

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
     
  • The kernel address space of a 64 bit kernel currently uses a three level
    page table and the vmemmap array has a fixed address and a fixed maximum
    size. A three level page table is good enough for systems with less than
    3.8TB of memory, for bigger systems four page table levels need to be
    used. Each page table level costs a bit of performance, use 3 levels for
    normal systems and 4 levels only for the really big systems.
    To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
    maximum of 64TB of memory.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
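
The 64TB figure in the commit above follows directly from the bit count: 46 physical address bits cover 2^46 bytes, and 2^46 / 2^40 = 64 TiB. A small check of that arithmetic, with helper names that are illustrative only:

```c
#include <stdint.h>

#define MAX_PHYSMEM_BITS 46	/* value named in the commit message */

/* Number of bytes addressable with 'bits' physical address bits. */
static uint64_t physmem_limit_bytes(unsigned int bits)
{
	return (uint64_t)1 << bits;
}

/* Convert bytes to whole TiB (1 TiB = 2^40 bytes). */
static uint64_t bytes_to_tib(uint64_t bytes)
{
	return bytes >> 40;
}
```

`bytes_to_tib(physmem_limit_bytes(MAX_PHYSMEM_BITS))` evaluates to 64.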
     
  • Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
     
  • This patch makes the create_mem_hole() function more readable and
    fixes some minor bugs (e.g. off-by-one problems).

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
     
  • commit cc772456ac9b460693492b3a3d89e8c81eda5874
    [S390] fix list corruption in gmap reverse mapping

    added a potential deadlock:

    BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
    in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
    3 locks held by qemu-system-s39/1108:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x3a/0x6c [kvm]
    #1: (&mm->mmap_sem){++++++}, at: [] gmap_map_segment+0x9c/0x298
    #2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [] gmap_map_segment+0xb4/0x298
    CPU: 0 Not tainted 3.1.3 #45
    Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
    00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
    00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
    0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
    000000000000000d 000000000000000c 00000004fd597988 0000000000000000
    0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
    Call Trace:
    ([] show_trace+0xee/0x144)
    [] __might_sleep+0x12a/0x158
    [] __alloc_pages_nodemask+0x224/0xadc
    [] gmap_alloc_table+0x46/0x114
    [] gmap_map_segment+0x268/0x298
    [] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
    [] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
    [] kvm_set_memory_region+0x4c/0x6c [kvm]
    [] kvm_vm_ioctl+0x14a/0x314 [kvm]
    [] do_vfs_ioctl+0x94/0x588
    [] SyS_ioctl+0x94/0xac
    [] sysc_noemu+0x22/0x28
    [] 0x3fffcd5e7ca
    3 locks held by qemu-system-s39/1108:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x3a/0x6c [kvm]
    #1: (&mm->mmap_sem){++++++}, at: [] gmap_map_segment+0x9c/0x298
    #2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [] gmap_map_segment+0xb4/0x298

    Fix this by freeing the lock on the alloc path. This is ok, since the
    gmap table is never freed until we call gmap_free, so the table we are
    walking cannot go.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
     
  • The current code in setup_boot_command_line() uses a heuristic to
    detect an EBCDIC command line. It checks if any of the bytes in
    the command line has bit one (0x80) set. In that case it is assumed
    that we have an EBCDIC string and the complete command line is
    converted.

    On s390 there are cases where the boot loader provides a kernel
    command line that is NULL terminated, but has random data after
    the NULL termination. In that case, setup_boot_command_line()
    might misinterpret an ASCII string for an EBCDIC string. A
    subsequent string conversion can then damage the ASCII string.

    This patch solves the problem by checking for NULL termination.
    If no EBCDIC character has been found before the NULL
    termination is reached, we now assume that we have an ASCII
    string.

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
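
The detection rule from the commit above can be sketched in a few lines. This is an illustration under assumptions, not the code in setup_boot_command_line(): the function name and signature are made up for the example. The key point is that scanning stops at the first NUL byte, so random data after the terminator cannot trigger a conversion.

```c
#include <stddef.h>

/*
 * Return 1 if the buffer looks like an EBCDIC command line, 0 if it
 * looks like ASCII. Only bytes before the first NUL terminator are
 * inspected. The name and signature are illustrative.
 */
static int cmdline_looks_ebcdic(const unsigned char *buf, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {
		if (buf[i] == 0)
			return 0;	/* terminator reached: treat as ASCII */
		if (buf[i] & 0x80)
			return 1;	/* high bit set before the terminator */
	}
	return 0;
}
```

An ASCII string such as "root" followed by a NUL and then bytes like 0xC1 is classified as ASCII, whereas the old heuristic, which looked at the whole buffer, would have treated it as EBCDIC.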
     
  • Mask the extint_code parameter of the smp external interrupt handler
    to get the interruption code. Otherwise emergency call interrupts
    erroneously might be accounted as emergency signal interrupts.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
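
To illustrate why the masking in the commit above matters, the sketch below classifies an external interrupt by its interruption code. It assumes that the code sits in the low 16 bits of the parameter and that 0x1201 and 0x1202 denote emergency signal and external call respectively; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum ext_kind { EXT_EMERGENCY_SIGNAL, EXT_EXTERNAL_CALL, EXT_OTHER };

/*
 * Classify an external interrupt. The upper half of the parameter is
 * assumed to carry unrelated data, so it is masked off before comparing.
 */
static enum ext_kind classify_ext_int(uint32_t ext_int_code)
{
	uint16_t code = ext_int_code & 0xffff;

	if (code == 0x1201)
		return EXT_EMERGENCY_SIGNAL;	/* assumed code value */
	if (code == 0x1202)
		return EXT_EXTERNAL_CALL;	/* assumed code value */
	return EXT_OTHER;
}
```

Comparing the unmasked value `0x00031202` against `0x1202` would fail, and a simple if/else on that comparison would then count the interrupt under the wrong category.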
     
  • RC6 fails again.

    > I found my system freeze mostly during starting up X and KDE. Sometimes it
    > works for some minutes, sometimes it freezes immediatly. When the freeze
    > happens, everything is dead (even the reset button does not work, I need to
    > power cycle).

    > I disabled RC6, and my system runs wonderfully.

    > The system is a Z68 Pro board with Sandybridge i5-2500K processor, 8
    > GB of RAM and UEFI firmware.

    Reported-by: Kai Krakow
    Signed-off-by: Keith Packard
    Signed-off-by: Linus Torvalds

    Keith Packard
     
  • Semaphores still cause problems on some machines:

    > From Udo Steinberg:
    >
    > With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
    > text scroll in an xterm, such as when extracting a tar archive. Such as this
    > one (note the timestamps):
    >
    > I can reproduce it fairly easily with something
    > as simple as:
    >
    > while true; do dmesg; done

    This patch turns them off on SNB while leaving them on for IVB.

    Reported-by: Udo Steinberg
    Cc: Daniel Vetter
    Cc: Eugeni Dodonov
    Signed-off-by: Keith Packard
    Signed-off-by: Linus Torvalds

    Keith Packard
     
  • * 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: PPC: e500: include linux/export.h
    KVM: PPC: fix kvmppc_start_thread() for CONFIG_SMP=N
    KVM: PPC: protect use of kvmppc_h_pr
    KVM: PPC: move compute_tlbie_rb to book3s_64 common header
    KVM: Don't automatically expose the TSC deadline timer in cpuid
    KVM: Device assignment permission checks
    KVM: Remove ability to assign a device without iommu support
    KVM: x86: Prevent starting PIT timers in the absence of irqchip support

    Linus Torvalds
     
  • post 3.2-rc7 pull request

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
    MAINTAINERS: firewire git URL update

    Linus Torvalds
     
  • Bruce Fields notes that commit 778fc546f749 ("locks: fix tracking of
    inprogress lease breaks") introduced a possible error pointer
    dereference on failure to allocate memory. locks_conflict() will
    dereference the passed-in new lease lock structure that may be an error pointer.

    This means an open (without O_NONBLOCK set) on a file with a lease
    applied (generally only done when Samba or nfsd (with v4) is running)
    could crash if a kmalloc() fails.

    So instead of playing games with IS_ERR() all over the place, just
    check the allocation failure early. That makes the code more
    straightforward, and avoids this possible bad pointer dereference.

    Based-on-patch-by: J. Bruce Fields
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
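
The "check the allocation failure early" pattern from the commit above can be shown in a userspace sketch. The helpers below imitate the kernel's error-pointer convention; `struct lease`, `lease_alloc()` and `break_lease_sketch()` are hypothetical and much simpler than the real locking code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct lease {
	int type;
};

/* Hypothetical allocator that returns an error pointer, never NULL. */
static struct lease *lease_alloc(int type, int simulate_oom)
{
	struct lease *l = simulate_oom ? NULL : malloc(sizeof(*l));

	if (!l)
		return err_ptr(-ENOMEM);
	l->type = type;
	return l;
}

/* Check the allocation once, up front, before anything dereferences it. */
static long break_lease_sketch(int type, int simulate_oom)
{
	struct lease *new_fl = lease_alloc(type, simulate_oom);
	long ret;

	if (is_err(new_fl))
		return ptr_err(new_fl);

	ret = new_fl->type;	/* safe: new_fl is a real pointer here */
	free(new_fl);
	return ret;
}
```

Once the early return is in place, no later code path has to remember that the pointer might encode an error.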
     

26 Dec, 2011

6 commits

  • This is required for THIS_MODULE. We recently stopped acquiring
    it via some other header.

    Signed-off-by: Scott Wood
    Signed-off-by: Alexander Graf

    Scott Wood
     
  • Currently kvmppc_start_thread() tries to wake other SMT threads via
    xics_wake_cpu(). Unfortunately xics_wake_cpu only exists when
    CONFIG_SMP=Y so when compiling with CONFIG_SMP=N we get:

    arch/powerpc/kvm/built-in.o: In function `.kvmppc_start_thread':
    book3s_hv.c:(.text+0xa1e0): undefined reference to `.xics_wake_cpu'

    The following should be fine since kvmppc_start_thread() shouldn't be
    called to start non-zero threads when SMP=N since threads_per_core=1.

    Signed-off-by: Michael Neuling
    Signed-off-by: Alexander Graf

    Michael Neuling
     
  • kvmppc_h_pr is only available if CONFIG_KVM_BOOK3S_64_PR.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Alexander Graf

    Andreas Schwab
     
  • compute_tlbie_rb is only used on ppc64 and cannot be compiled on ppc32.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Alexander Graf

    Andreas Schwab
     
  • Unlike all of the other cpuid bits, the TSC deadline timer bit is set
    unconditionally, regardless of what userspace wants.

    This is broken in several ways:
    - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
    deadline timer feature, a guest that uses the feature will break
    - live migration to older host kernels that don't support the TSC deadline
    timer will cause the feature to be pulled from under the guest's feet,
    breaking it
    - guests that are broken wrt the feature will fail.

    Fix by not enabling the feature automatically; instead report it to userspace.
    Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
    will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
    KVM_GET_SUPPORTED_CPUID.

    Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

    [avi: add the KVM_CAP + documentation]

    Reported-by: Alexey Zaytsev
    Tested-by: Alexey Zaytsev
    Signed-off-by: Jan Kiszka
    Signed-off-by: Avi Kivity

    Jan Kiszka
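
From the userspace side, the change above means the TSC deadline bit has to be enabled explicitly. The sketch below shows one possible decision function. It assumes a VMM that relies on the in-kernel irqchip rather than emulating the timer itself, and that the result of querying KVM_CAP_TSC_DEADLINE_TIMER is passed in as a flag; the function name and parameters are hypothetical. The bit position (CPUID.01H:ECX bit 24) is the architectural one.

```c
#include <stdint.h>

#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)	/* CPUID.01H:ECX bit 24 */

/*
 * Compute the guest's CPUID.01H:ECX value. The bit is never enabled
 * implicitly: it is set only if userspace wants it, the host reports
 * the capability, and the in-kernel irqchip is in use.
 */
static uint32_t guest_cpuid_ecx(uint32_t ecx, int want_tsc_deadline,
				int cap_tsc_deadline, int in_kernel_irqchip)
{
	ecx &= ~CPUID_1_ECX_TSC_DEADLINE;
	if (want_tsc_deadline && cap_tsc_deadline && in_kernel_irqchip)
		ecx |= CPUID_1_ECX_TSC_DEADLINE;
	return ecx;
}
```

Clearing the bit first keeps the default conservative, which also covers migration targets that do not report the capability.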
     
  • Only allow KVM device assignment to attach to devices which:

    - Are not bridges
    - Have BAR resources (assume others are special devices)
    - The user has permissions to use

    Assigning a bridge is a configuration error, it's not supported, and
    typically doesn't result in the behavior the user is expecting anyway.
    Devices without BAR resources are typically chipset components that
    also don't have host drivers. We don't want users to hold such devices
    captive or cause system problems by fencing them off into an iommu
    domain. We determine "permission to use" by testing whether the user
    has access to the PCI sysfs resource files. By default a normal user
    will not have access to these files, so it provides a good indication
    that an administration agent has granted the user access to the device.

    [Yang Bai: add missing #include]
    [avi: fix comment style]

    Signed-off-by: Alex Williamson
    Signed-off-by: Yang Bai
    Signed-off-by: Marcelo Tosatti

    Alex Williamson
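
The three conditions listed in the commit above amount to a simple predicate. The sketch below uses a hypothetical `struct pci_dev_info` that summarises the facts the checks need; it does not reflect the kernel's data structures or how the kernel actually tests sysfs permissions.

```c
#include <stdbool.h>

/* Hypothetical summary of what the checks need to know about a device. */
struct pci_dev_info {
	bool is_bridge;		/* device is a bridge */
	int num_bars;		/* number of populated BAR resources */
	bool user_can_access;	/* user may access the sysfs resource files */
};

/* Return true only if all three conditions from the commit hold. */
static bool may_assign(const struct pci_dev_info *dev)
{
	if (dev->is_bridge)
		return false;	/* assigning a bridge is a configuration error */
	if (dev->num_bars == 0)
		return false;	/* likely a chipset component */
	return dev->user_can_access;
}
```

Ordering the checks this way rejects structurally unsuitable devices before the permission question is considered at all.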
     

25 Dec, 2011

4 commits


24 Dec, 2011

1 commit