27 Dec, 2011

4 commits

  • Move the program interruption code and the translation exception identifier
    to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
    first level interrupt handler in entry[64].S store the two values. That
    makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
    and to reduce the number of arguments to a lot of functions. Finally
    un-inline do_trap. Overall this saves 5812 bytes in the .text section of
    the 64 bit kernel.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
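
A rough userspace sketch of the idea, not the kernel code: only the field names int_code and int_parm_long come from the commit message. The structure name, the helper names and the assumption that the low 12 bits of the identifier carry flags are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for pt_regs; only int_code and int_parm_long
 * are named in the commit message, everything else is hypothetical. */
struct fake_pt_regs {
	uint64_t gprs[16];
	uint32_t int_code;       /* program interruption code */
	uint64_t int_parm_long;  /* translation exception identifier */
};

/* Models the first level handler storing both values exactly once. */
static void first_level_handler(struct fake_pt_regs *regs,
				uint32_t code, uint64_t teid)
{
	regs->int_code = code;
	regs->int_parm_long = teid;
}

/* A handler now takes only regs instead of (regs, code, teid). */
static uint64_t fault_address(const struct fake_pt_regs *regs)
{
	assert(regs != 0);
	/* Assumption for this sketch: the low 12 bits hold flags. */
	return regs->int_parm_long & ~UINT64_C(0xfff);
}
```

Because both values travel with the register structure, later handlers can drop the extra arguments, which is where the reduction in function arguments comes from.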
     
  • This patch disables the check for MACHINE_IS_VM when initializing the
    pfault infrastructure. The code checks for successful completion of
    diag 258 anyway, so it is safe to try initialization on LPAR as well.
    This is needed to use pfault on kvm.

    Signed-off-by: Carsten Otte
    Signed-off-by: Martin Schwidefsky

    Carsten Otte
     
  • The kernel address space of a 64 bit kernel currently uses a three level
    page table and the vmemmap array has a fixed address and a fixed maximum
    size. A three level page table is good enough for systems with less than
    3.8TB of memory; bigger systems need four page table levels. Each page
    table level costs a bit of performance, so use 3 levels for normal
    systems and 4 levels only for the really big systems.
    To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
    maximum of 64TB of memory.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
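
The arithmetic behind the 64TB figure can be checked with a small sketch. MAX_PHYSMEM_BITS and its value are taken from the commit message; the level-selection helper and its threshold parameter are hypothetical and only illustrate the "3 levels unless the machine is really big" rule.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PHYSMEM_BITS 46  /* value from the commit message */

/* Largest physical memory size addressable with MAX_PHYSMEM_BITS, in TiB. */
static uint64_t max_physmem_tib(void)
{
	return (UINT64_C(1) << MAX_PHYSMEM_BITS) >> 40;
}

/* Hypothetical helper: pick the page table depth from the memory size.
 * The limit is a parameter; the commit cites roughly 3.8TB. */
static int kernel_pgtable_levels(uint64_t mem_bytes, uint64_t three_level_limit)
{
	assert(three_level_limit > 0);
	return mem_bytes < three_level_limit ? 3 : 4;
}
```

With 46 bits, 2^46 bytes divided by 2^40 bytes per TiB gives 64, which matches the stated maximum.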
     
  • commit cc772456ac9b460693492b3a3d89e8c81eda5874
    [S390] fix list corruption in gmap reverse mapping

    added a potential deadlock:

    BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
    in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
    3 locks held by qemu-system-s39/1108:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x3a/0x6c [kvm]
    #1: (&mm->mmap_sem){++++++}, at: [] gmap_map_segment+0x9c/0x298
    #2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [] gmap_map_segment+0xb4/0x298
    CPU: 0 Not tainted 3.1.3 #45
    Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
    00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
    00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
    0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
    000000000000000d 000000000000000c 00000004fd597988 0000000000000000
    0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
    Call Trace:
    ([] show_trace+0xee/0x144)
    [] __might_sleep+0x12a/0x158
    [] __alloc_pages_nodemask+0x224/0xadc
    [] gmap_alloc_table+0x46/0x114
    [] gmap_map_segment+0x268/0x298
    [] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
    [] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
    [] kvm_set_memory_region+0x4c/0x6c [kvm]
    [] kvm_vm_ioctl+0x14a/0x314 [kvm]
    [] do_vfs_ioctl+0x94/0x588
    [] SyS_ioctl+0x94/0xac
    [] sysc_noemu+0x22/0x28
    [] 0x3fffcd5e7ca
    3 locks held by qemu-system-s39/1108:
    #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x3a/0x6c [kvm]
    #1: (&mm->mmap_sem){++++++}, at: [] gmap_map_segment+0x9c/0x298
    #2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [] gmap_map_segment+0xb4/0x298

    Fix this by freeing the lock on the alloc path. This is ok, since the
    gmap table is never freed until we call gmap_free, so the table we are
    walking cannot go.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
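
The pattern described above, releasing the lock around an allocation that may sleep, might look roughly like this in a userspace model. All names here are hypothetical, the lock is a toy that only records whether it is held, and the re-check after re-acquiring the lock is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy lock that only records whether it is held. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

/* Stand-in for an allocation that may sleep: must not run under the lock. */
static void *sleeping_alloc(const struct toy_lock *l, size_t size)
{
	assert(!l->held);
	return calloc(1, size);
}

/* Called with the lock held; returns with the lock held.
 * Fills *slot if it is still empty after the allocation. */
static int populate_slot(struct toy_lock *l, void **slot, size_t size)
{
	void *table;

	if (*slot)
		return 0;
	toy_lock_release(l);		/* drop the lock on the alloc path */
	table = sleeping_alloc(l, size);
	toy_lock_acquire(l);
	if (!table)
		return -1;
	if (*slot)			/* populated by someone else meanwhile */
		free(table);
	else
		*slot = table;
	return 0;
}
```

Dropping the lock is only safe because, as the message notes, the table being walked is never freed before gmap_free is called.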
     

14 Nov, 2011

1 commit

  • Ignore completion interrupts if the initial interrupt hasn't been
    received and the addressed task is not running. This case can only
    happen if a leftover (pending) completion interrupt gets delivered
    that wasn't removed by the PFAULT CANCEL operation during cpu
    hotplug.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
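
As a sketch of the decision this commit describes: only the "ignore" branch is taken from the message. The other two branches, the enum and the function name are assumptions about the surrounding logic and are included just to make the example complete.

```c
#include <assert.h>

enum completion_action {
	COMPLETION_WAKE_TASK,    /* initial interrupt seen: wake the waiter */
	COMPLETION_MARK_EARLY,   /* completion arrived before the initial one */
	COMPLETION_IGNORE        /* leftover interrupt: drop it */
};

/* Hypothetical classifier for an incoming completion interrupt. */
static enum completion_action
classify_completion(int initial_seen, int task_running)
{
	assert(initial_seen == 0 || initial_seen == 1);
	assert(task_running == 0 || task_running == 1);
	if (initial_seen)
		return COMPLETION_WAKE_TASK;
	if (task_running)
		return COMPLETION_MARK_EARLY;
	/* No initial interrupt and the task is not running: ignore. */
	return COMPLETION_IGNORE;
}
```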
     

07 Nov, 2011

1 commit

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of
    net: inet_timewait_sock doesnt need
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     

03 Nov, 2011

3 commits

  • This avoids duplicating the function in every arch gup_fast.

    Signed-off-by: Andrea Arcangeli
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • s390 didn't return 0 in that case. If it's rolling back the *nr pointer
    it should also return zero to avoid adding pages to the array at the
    wrong offset.

    Signed-off-by: Andrea Arcangeli
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
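
The rule "roll back *nr and return zero together" can be illustrated with a toy collector. This is not the gup code: the function name, the integer array standing in for the page array and the fail_at parameter that simulates a failed re-check are all hypothetical.

```c
#include <assert.h>

/* Hypothetical model: collect 'count' items into 'out' starting at *nr.
 * 'fail_at' simulates a re-check failing at that index (negative: never). */
static int collect_range(int *out, int *nr, int first, int count, int fail_at)
{
	int start = *nr;
	int i;

	assert(out && nr);
	for (i = 0; i < count; i++) {
		if (i == fail_at) {
			*nr = start;	/* roll back everything added here */
			return 0;	/* and report failure to the caller */
		}
		out[(*nr)++] = first + i;
	}
	return 1;
}
```

If the function rolled back *nr but still reported success, the caller would continue as if the entries were present and append later ones at the wrong offset.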
     
  • Up to this point the code assumed old refcounting for hugepages (pre-thp).
    This updates the code directly to the thp mapcount tail page refcounting.

    Signed-off-by: Andrea Arcangeli
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

01 Nov, 2011

1 commit

  • Fix several compile errors on s390 caused by splitting module.h.

    Some include additions [e.g. qdio_setup.c, zfcp_qdio.c] are in
    anticipation of pending changes queued for s390 that increase
    the modular use footprint.

    [PG: added additional obvious changes since Heiko's original patch]

    Signed-off-by: Heiko Carstens
    Signed-off-by: Paul Gortmaker

    Heiko Carstens
     

30 Oct, 2011

10 commits


26 Sep, 2011

1 commit


20 Sep, 2011

1 commit

  • 598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address
    spaces for kvm guest images) changed kvm to use a separate address
    space for kvm guests. This address space was switched in __vcpu_run.
    In some cases (preemption, page fault) there is the possibility that
    this address space switch is lost.
    The typical symptom was a huge amount of validity intercepts or
    random guest addressing exceptions.
    Fix this by doing the switch in sie_loop and sie_exit and saving the
    address space in the gmap structure itself. Also use the preempt
    notifier.

    Signed-off-by: Christian Borntraeger
    Acked-by: Avi Kivity
    Signed-off-by: Heiko Carstens

    Christian Borntraeger
     

03 Aug, 2011

2 commits

  • With this patch a new S390 shutdown trigger "restart" is added. If under
    z/VM "system restart" is entered or under the HMC the "PSW restart" button
    is pressed, the PSW located at 0 (31 bit) or 0x1a0 (64 bit) is loaded.
    Now we execute do_restart() that processes the restart action that is
    defined under /sys/firmware/shutdown_actions/on_restart. Currently the
    following actions are possible: reipl (default), stop, vmcmd, dump, and
    dump_reipl.

    Signed-off-by: Michael Holzheu
    Signed-off-by: Heiko Carstens

    Michael Holzheu
     
  • Fix the following compile warning for !CONFIG_PGSTE:

    CC arch/s390/mm/pgtable.o
    arch/s390/mm/pgtable.c: In function ‘page_table_alloc_pgste’:
    arch/s390/mm/pgtable.c:531:1: warning: no return statement in function returning non-void [-Wreturn-type]

    Signed-off-by: Jan Glauber
    Signed-off-by: Heiko Carstens

    Jan Glauber
     

24 Jul, 2011

1 commit

  • Add code that allows KVM to control the virtual memory layout that
    is seen by a guest. The guest address space uses a second page table
    that shares the last level pte-tables with the process page table.
    If a page is unmapped from the process page table it is automatically
    unmapped from the guest page table as well.

    The guest address space mapping starts out empty, KVM can map any
    individual 1MB segments from the process virtual memory to any 1MB
    aligned location in the guest virtual memory. If a target segment in
    the process virtual memory does not exist or is unmapped while a
    guest mapping exists the desired target address is stored as an
    invalid segment table entry in the guest page table.
    The population of the guest page table is fault driven.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
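
The 1MB alignment requirement can be expressed as a short check. The 1MB segment size is from the commit message; the macro name, the function names and the rule that a zero length is rejected are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_SIZE (UINT64_C(1) << 20)  /* 1MB, per the commit message */

/* Hypothetical validity check: source, target and length must all be
 * multiples of the segment size, and the length must be non-zero. */
static int segment_mapping_ok(uint64_t from, uint64_t to, uint64_t len)
{
	if (len == 0)
		return 0;
	return ((from | to | len) & (SEGMENT_SIZE - 1)) == 0;
}

/* Number of 1MB segments covered by a valid mapping. */
static uint64_t segment_count(uint64_t len)
{
	assert((len & (SEGMENT_SIZE - 1)) == 0);
	return len / SEGMENT_SIZE;
}
```

OR-ing the three values before masking tests all of them for alignment in a single comparison.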
     

01 Jul, 2011

1 commit

  • The nmi parameter indicated whether we could do wakeups from the current
    context; if not, we would set some state and self-IPI and let the
    resulting interrupt do the wakeup.

    For the various event classes:

    - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
    - tracepoint: nmi=0; since tracepoint could be from NMI context.
    - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

    As one can see, there is very little nmi=1 usage, and the down-side of
    not using it is that on some platforms some software events can have a
    jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

    The up-side however is that we can remove the nmi parameter and save a
    bunch of conditionals in fast paths.

    Signed-off-by: Peter Zijlstra
    Cc: Michael Cree
    Cc: Will Deacon
    Cc: Deng-Cheng Zhu
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: David S. Miller
    Cc: Frederic Weisbecker
    Cc: Jason Wessel
    Cc: Don Zickus
    Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

06 Jun, 2011

1 commit

  • Replace the s390 specific rcu page-table freeing code with the
    generic variant. This requires duplicating the definition of
    struct mmu_table_batch, as s390 does not use the generic tlb flush
    code.

    While we are at it, remove the restriction that page table fragments
    cannot be reused after a single fragment has been freed with rcu,
    and split out allocation and freeing of page tables with pgstes.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

29 May, 2011

2 commits

  • Quite a few functions that get called from the tlb gather code require
    preemption to be disabled. So disable preemption inside of the called
    functions instead.
    The only drawback is that rcu_table_freelist_finish() isn't necessarily
    called on the cpu(s) that filled the free lists. So we may see a delay until
    we finally see an rcu callback. However, over time this shouldn't matter.

    So we get rid of lots of "BUG: using smp_processor_id() in preemptible"
    messages.

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
    perf: Fix SIGIO handling
    perf top: Don't stop if no kernel symtab is found
    perf top: Handle kptr_restrict
    perf top: Remove unused macro
    perf events: initialize fd array to -1 instead of 0
    perf tools: Make sure kptr_restrict warnings fit 80 col terms
    perf tools: Fix build on older systems
    perf symbols: Handle /proc/sys/kernel/kptr_restrict
    perf: Remove duplicate headers
    ftrace: Add internal recursive checks
    tracing: Update btrfs's tracepoints to use u64 interface
    tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
    ftrace: Set ops->flag to enabled even on static function tracing
    tracing: Have event with function tracer check error return
    ftrace: Have ftrace_startup() return failure code
    jump_label: Check entries limit in __jump_label_update
    ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
    scripts/tags.sh: Add magic for trace-events for etags too
    scripts/tags.sh: Fix ctags for DEFINE_EVENT()
    x86/ftrace: Fix compiler warning in ftrace.c
    ...

    Linus Torvalds
     

27 May, 2011

1 commit


26 May, 2011

7 commits

  • Add ZONE_DMA to 31-bit config again. The performance gain is minimal
    and hardly anybody cares anymore about a 31-bit kernel.
    So add ZONE_DMA again to help with SLAB_CACHE_DMA removal for
    !CONFIG_ZONE_DMA configurations.

    Acked-by: David Rientjes
    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • s390 arch backend for d065bd81 "mm: retry page fault when blocking on
    disk transfer".

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • If e.g. copy_from_user() generates a page fault and the kernel runs
    into an OOM situation the system might lock up.
    If the OOM killer sends a SIGKILL to the current process it can't
    handle it since it is stuck in a copy_from_user() - page fault loop.

    Fix this by adding the same fix as other architectures have.

    E.g. the x86 variant f86268 "x86/mm: Handle mm_fault_error() in kernel
    space"

    Signed-off-by: Heiko Carstens

    Heiko Carstens
     
  • Merge irq.c and s390_ext.c into irq.c. That way all external interrupt
    related functions are together.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Interrupt sources like pfault, sclp, dasd_diag and virtio all use the
    service signal external interrupt subclass mask in control register 0
    to enable and disable the corresponding interrupt.
    Because no reference counting is implemented, each subsystem thinks it
    is the only user of the subclass and sets and clears the bit as it likes.
    As a result, unloading the dasd diag module under z/VM causes both
    sclp and pfault interrupts to be masked. Sooner or later the system
    locks up.
    Fix this by introducing a new way to set (register) and clear
    (unregister) the service signal subclass mask bit in cr0.
    Also convert all drivers.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
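
A minimal sketch of the reference counting idea, assuming a single shared mask bit. The function and variable names are hypothetical and no locking is shown; the point is only that the bit is set by the first user and cleared by the last one.

```c
#include <assert.h>

static int subclass_users;      /* number of registered users */
static int subclass_bit_set;    /* models the mask bit in cr0 */

/* Hypothetical register: only the first user sets the bit. */
static void subclass_register(void)
{
	if (subclass_users++ == 0)
		subclass_bit_set = 1;
}

/* Hypothetical unregister: only the last user clears the bit. */
static void subclass_unregister(void)
{
	assert(subclass_users > 0);
	if (--subclass_users == 0)
		subclass_bit_set = 0;
}
```

With this scheme, one subsystem going away no longer masks the interrupt for the others that still depend on it.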
     
  • Always enable the service signal subclass mask bit in cr0, if pfault
    is available. That way we use the normal cpu hotplug way to propagate
    the subclass mask bit in cr0 instead of open coding it.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • The functions probe_kernel_write() and probe_kernel_read() do not modify
    the src pointer. Allow const pointers to be passed in without the need
    for a typecast.

    Acked-by: Mike Frysinger
    Acked-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/1305824936.1465.4.camel@gandalf.stny.rr.com

    Steven Rostedt
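
The effect of const-qualifying the source can be shown with a userspace stand-in. The function name and the memcpy body are assumptions made for this sketch; it only mirrors the shape of a copy routine whose source argument is declared const.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in with a const-qualified source pointer.
 * The name and the memcpy body are assumptions of this sketch. */
static long copy_probe_sketch(void *dst, const void *src, size_t size)
{
	assert(dst && src);
	memcpy(dst, src, size);
	return 0;
}
```

A caller holding a `const char` buffer can pass it directly as src, where a non-const parameter would have required a cast.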
     

25 May, 2011

1 commit

  • Fold all the mmu_gather rework patches into one for submission

    Signed-off-by: Peter Zijlstra
    Reported-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

23 May, 2011

2 commits