27 Dec, 2011
4 commits
-
Move the program interruption code and the translation exception identifier
to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
first level interrupt handler in entry[64].S store the two values. That
makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
and to reduce the number of arguments to a lot of functions. Finally
un-inline do_trap. Overall this saves 5812 bytes in the .text section of
the 64 bit kernel.
Signed-off-by: Martin Schwidefsky
-
This patch disables the check for MACHINE_IS_VM when initializing the
pfault infrastructure. The code checks for successful completion of
diag 258 anyway, thus it's safe to try initialization on LPAR as well.
This is needed to use pfault on kvm.
Signed-off-by: Carsten Otte
Signed-off-by: Martin Schwidefsky
-
The kernel address space of a 64 bit kernel currently uses a three level
page table and the vmemmap array has a fixed address and a fixed maximum
size. A three level page table is good enough for systems with less than
3.8TB of memory; for bigger systems four page table levels need to be
used. Each page table level costs a bit of performance, so use 3 levels for
normal systems and 4 levels only for the really big systems.
To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
maximum of 64TB of memory.
Signed-off-by: Martin Schwidefsky
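The sizes in this message can be checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the level widths (4K pages, 256-entry page tables, 2048-entry segment and region tables) are an assumption about the s390 64 bit layout that the message itself does not spell out.

```c
#include <stdint.h>

/* Hypothetical helper: address bits covered by a page table with the
 * given number of levels. Assumes a 1MB segment (12 page offset bits
 * plus 8 page table index bits) and 11 index bits for every level
 * above the page table. */
static inline unsigned int span_bits(unsigned int levels)
{
	return 12 + 8 + 11 * (levels - 1);
}

/* Bytes addressable with the given number of address bits (bits < 64). */
static inline uint64_t limit_bytes(unsigned int bits)
{
	return UINT64_C(1) << bits;
}

/* Convert a byte count to whole terabytes (2^40 bytes). */
static inline uint64_t bytes_to_tb(uint64_t bytes)
{
	return bytes >> 40;
}
```

Under these assumptions three levels span 42 bits (4TB of address space, some of which is needed for things other than memory), four levels span 53 bits, and MAX_PHYSMEM_BITS of 46 corresponds to 64TB.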
-
commit cc772456ac9b460693492b3a3d89e8c81eda5874
[S390] fix list corruption in gmap reverse mapping
added a potential deadlock:
BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
3 locks held by qemu-system-s39/1108:
#0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x3a/0x6c [kvm]
#1: (&mm->mmap_sem){++++++}, at: [] gmap_map_segment+0x9c/0x298
#2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [] gmap_map_segment+0xb4/0x298
CPU: 0 Not tainted 3.1.3 #45
Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
000000000000000d 000000000000000c 00000004fd597988 0000000000000000
0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
Call Trace:
([] show_trace+0xee/0x144)
[] __might_sleep+0x12a/0x158
[] __alloc_pages_nodemask+0x224/0xadc
[] gmap_alloc_table+0x46/0x114
[] gmap_map_segment+0x268/0x298
[] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
[] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
[] kvm_set_memory_region+0x4c/0x6c [kvm]
[] kvm_vm_ioctl+0x14a/0x314 [kvm]
[] do_vfs_ioctl+0x94/0x588
[] SyS_ioctl+0x94/0xac
[] sysc_noemu+0x22/0x28
[] 0x3fffcd5e7ca
3 locks held by qemu-system-s39/1108:
#0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x3a/0x6c [kvm]
#1: (&mm->mmap_sem){++++++}, at: [] gmap_map_segment+0x9c/0x298
#2: (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [] gmap_map_segment+0xb4/0x298
Fix this by dropping the lock on the alloc path. This is ok, since the
gmap table is never freed until we call gmap_free, so the table we are
walking cannot go away.
Signed-off-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
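The general shape of such a fix is to drop the spinlock around the allocation, retake it, and re-check the slot before installing the new table. The following is a user space sketch of that pattern, not the kernel code: the toy lock stands in for mm->page_table_lock, calloc() stands in for the page allocator, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy spinlock standing in for mm->page_table_lock (hypothetical). */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/*
 * Called with table_lock held, returns with table_lock held.
 * Installs a new zeroed table in *slot if the slot is still empty.
 */
static int alloc_table_locked(void **slot, size_t size)
{
	void *new;

	table_lock_release();	/* the allocation may sleep */
	new = calloc(1, size);
	table_lock_acquire();
	if (!new)
		return -1;
	if (*slot == NULL)
		*slot = new;	/* still empty: install the new table */
	else
		free(new);	/* somebody else was faster: drop ours */
	return 0;
}
```

The re-check after retaking the lock is what makes dropping it safe: the caller's walk stays valid because existing tables are not freed in the meantime, and a concurrently installed table is simply kept.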
14 Nov, 2011
1 commit
-
Ignore completion interrupts if the initial interrupt hasn't been
received and the addressed task is not running. This case can only
happen if a leftover (pending) completion interrupt, which wasn't
removed by the PFAULT CANCEL operation during cpu hotplug, gets
delivered.
Signed-off-by: Heiko Carstens
Signed-off-by: Martin Schwidefsky
07 Nov, 2011
1 commit
-
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of
net: inet_timewait_sock doesnt need
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
03 Nov, 2011
3 commits
-
This avoids duplicating the function in every arch gup_fast.
Signed-off-by: Andrea Arcangeli
Cc: Peter Zijlstra
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Rik van Riel
Cc: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Benjamin Herrenschmidt
Cc: David Gibson
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
s390 didn't return 0 in that case. If it's rolling back the *nr pointer it
should also return zero to avoid adding pages to the array at the wrong
offset.
Signed-off-by: Andrea Arcangeli
Cc: Peter Zijlstra
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Rik van Riel
Cc: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Benjamin Herrenschmidt
Cc: David Gibson
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.
Signed-off-by: Andrea Arcangeli
Cc: Peter Zijlstra
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Rik van Riel
Cc: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Benjamin Herrenschmidt
Cc: David Gibson
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Nov, 2011
1 commit
-
Fix several compile errors on s390 caused by splitting module.h.
Some include additions [e.g. qdio_setup.c, zfcp_qdio.c] are in
anticipation of pending changes queued for s390 that increase
the modular use footprint.
[PG: added additional obvious changes since Heiko's original patch]
Signed-off-by: Heiko Carstens
Signed-off-by: Paul Gortmaker
30 Oct, 2011
10 commits
-
Add prototypes and includes for functions used in different modules.
Signed-off-by: Martin Schwidefsky
-
Linux on System z uses a ballooner based on diagnose 0x10 (also known
as collaborative memory management). This patch implements diagnose
0x10 on the guest address space.
Signed-off-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky
-
gmap_fault needs to walk the guest page table. However, parts of
that may change if some other thread does munmap. In that case
gmap_unmap_notifier will also unmap the corresponding parts from
the guest page table. We need to take mmap_sem in order to serialize
these operations.
do_exception now calls __gmap_fault, which is not exported to modules,
with mmap_sem held. The exported function, which is called
from KVM, now takes mmap_sem.
Reported-by: Heiko Carstens
Signed-off-by: Carsten Otte
Signed-off-by: Martin Schwidefsky
-
This introduces locking via mm->page_table_lock to protect
the rmap list for guest mappings from being corrupted by concurrent
operations.
Signed-off-by: Carsten Otte
Signed-off-by: Martin Schwidefsky
-
Fix possible deadlock reported by lockdep:
qemu-system-s39/2963 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: gmap_alloc_table+0x9c/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: gmap_map_segment+0xa6/0x27c
Actually gmap_alloc_table is only called from gmap_map_segment with
mmap_sem held, thus it's safe to simply remove the inner lock.
Signed-off-by: Carsten Otte
Signed-off-by: Martin Schwidefsky
-
Split out addressing mode bits from PSW_BASE_BITS, rename PSW_BASE_BITS
to PSW_MASK_BASE, get rid of psw_user32_bits, remove unused function
enabled_wait(), introduce PSW_MASK_USER, and drop PSW_MASK_MERGE macros.
Change psw_kernel_bits / psw_user_bits to contain only the bits that
are always set in the respective mode.
Signed-off-by: Martin Schwidefsky
-
An instruction with an address right below the address limit for the
current addressing mode will wrap. The instruction restart logic in
the protection fault handler and the signal code needs to follow the
wrapping rules to find the correct instruction address.
Signed-off-by: Martin Schwidefsky
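The wrapping rule can be illustrated by rewinding an instruction address modulo the addressing mode limit. This is a sketch only: the enum and function names are hypothetical, and it assumes the usual 24, 31 and 64 bit addressing modes rather than reproducing the kernel's own helper.

```c
#include <stdint.h>

/* Hypothetical representation of the current addressing mode. */
enum amode { AMODE_24, AMODE_31, AMODE_64 };

/* Mask of valid address bits for the given addressing mode. */
static uint64_t amode_mask(enum amode mode)
{
	switch (mode) {
	case AMODE_24:
		return UINT64_C(0x00ffffff);
	case AMODE_31:
		return UINT64_C(0x7fffffff);
	default:
		return UINT64_MAX;
	}
}

/*
 * Step back ilen bytes from the address that follows an instruction.
 * The subtraction may underflow; masking makes it wrap to the top of
 * the address range of the current mode instead.
 */
static uint64_t rewind_insn(uint64_t next_addr, unsigned int ilen,
			    enum amode mode)
{
	return (next_addr - ilen) & amode_mask(mode);
}
```

For example, a 4 byte instruction whose successor address is 2 in 31 bit mode started at 0x7ffffffe, just below the address limit.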
-
This patch provides the architecture specific part of the s390 kdump
support.
Signed-off-by: Michael Holzheu
Signed-off-by: Martin Schwidefsky
-
Add access function for real memory needed by s390 kdump backend.
Signed-off-by: Michael Holzheu
Signed-off-by: Martin Schwidefsky
-
The rcu page table free code uses a couple of bits in the page table
pointer passed to tlb_remove_table to discern the different page table
types. __tlb_remove_table extracts the type with an incorrect mask which
leads to memory leaks. The correct mask is ((FRAG_MASK << 4) | FRAG_MASK).
Cc: stable@kernel.org
Signed-off-by: Martin Schwidefsky
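The effect of a too narrow mask can be shown with a small pointer tagging sketch. The FRAG_MASK value and the helper names below are assumptions chosen for illustration, and the table address is assumed to be aligned so that the tag bits are free.

```c
#include <stdint.h>

#define FRAG_MASK 0x03u	/* assumption: illustrative value */
#define TYPE_MASK ((FRAG_MASK << 4) | FRAG_MASK)

/* Store the type bits in the low bits of an aligned table address. */
static uintptr_t tag_table(uintptr_t table, unsigned int type)
{
	return table | (type & TYPE_MASK);
}

/* Extract the type bits with the full mask. */
static unsigned int table_type(uintptr_t tagged)
{
	return (unsigned int)(tagged & TYPE_MASK);
}

/* Recover the table address by clearing all type bits. */
static uintptr_t table_addr(uintptr_t tagged)
{
	return tagged & ~(uintptr_t)TYPE_MASK;
}
```

A type stored only in the upper group of bits, such as 0x20, reads back as 0 when extracted with FRAG_MASK alone, so the free path would treat the table as a different type.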
26 Sep, 2011
1 commit
-
If gmap_unmap_segment figures that the segment was not mapped in the
first place, it needs to up mmap_sem on exit.
Cc:
Signed-off-by: Carsten Otte
Signed-off-by: Martin Schwidefsky
20 Sep, 2011
1 commit
-
598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address
spaces for kvm guest images) changed kvm to use a separate address
space for kvm guests. This address space was switched in __vcpu_run.
In some cases (preemption, page fault) there is the possibility that
this address space switch is lost.
The typical symptom was a huge number of validity intercepts or
random guest addressing exceptions.
Fix this by doing the switch in sie_loop and sie_exit and saving the
address space in the gmap structure itself. Also use the preempt
notifier.
Signed-off-by: Christian Borntraeger
Acked-by: Avi Kivity
Signed-off-by: Heiko Carstens
03 Aug, 2011
2 commits
-
With this patch a new S390 shutdown trigger "restart" is added. If under
z/VM "system restart" is entered or under the HMC the "PSW restart" button
is pressed, the PSW located at 0 (31 bit) or 0x1a0 (64 bit) is loaded.
Now we execute do_restart() that processes the restart action that is
defined under /sys/firmware/shutdown_actions/on_restart. Currently the
following actions are possible: reipl (default), stop, vmcmd, dump, and
dump_reipl.
Signed-off-by: Michael Holzheu
Signed-off-by: Heiko Carstens
-
Fix the following compile warning for !CONFIG_PGSTE:
CC arch/s390/mm/pgtable.o
arch/s390/mm/pgtable.c: In function ‘page_table_alloc_pgste’:
arch/s390/mm/pgtable.c:531:1: warning: no return statement in function returning non-void [-Wreturn-type]
Signed-off-by: Jan Glauber
Signed-off-by: Heiko Carstens
24 Jul, 2011
1 commit
-
Add code that allows KVM to control the virtual memory layout that
is seen by a guest. The guest address space uses a second page table
that shares the last level pte-tables with the process page table.
If a page is unmapped from the process page table it is automatically
unmapped from the guest page table as well.
The guest address space mapping starts out empty. KVM can map any
individual 1MB segment from the process virtual memory to any 1MB
aligned location in the guest virtual memory. If a target segment in
the process virtual memory does not exist or is unmapped while a
guest mapping exists the desired target address is stored as an
invalid segment table entry in the guest page table.
The population of the guest page table is fault driven.
Signed-off-by: Martin Schwidefsky
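The 1MB granularity described here implies that a mapping request is only meaningful if source, target and length are all multiples of the segment size. A minimal sketch of such a check follows; the function names and parameters are hypothetical and not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_SIZE (UINT64_C(1) << 20)	/* 1MB segment */

/*
 * Hypothetical check: "from" is the address in the process virtual
 * memory, "to" the address in the guest, "len" the length in bytes.
 * All three have to be multiples of the segment size.
 */
static bool segment_args_aligned(uint64_t from, uint64_t to, uint64_t len)
{
	return ((from | to | len) & (SEGMENT_SIZE - 1)) == 0;
}

/* Number of 1MB segments covered by an aligned length. */
static uint64_t segment_count(uint64_t len)
{
	return len >> 20;
}
```

OR-ing the three values before masking tests all of them with one comparison, since any stray low bit in any of them survives the OR.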
01 Jul, 2011
1 commit
-
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.
For the various event classes:
- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.
As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.
Signed-off-by: Peter Zijlstra
Cc: Michael Cree
Cc: Will Deacon
Cc: Deng-Cheng Zhu
Cc: Anton Blanchard
Cc: Eric B Munson
Cc: Heiko Carstens
Cc: Paul Mundt
Cc: David S. Miller
Cc: Frederic Weisbecker
Cc: Jason Wessel
Cc: Don Zickus
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar
06 Jun, 2011
1 commit
-
Replace the s390 specific rcu page-table freeing code with the
generic variant. This requires duplicating the definition of
struct mmu_table_batch as s390 does not use the generic tlb flush
code.
While we are at it remove the restriction that page table fragments
can not be reused after a single fragment has been freed with rcu
and split out allocation and freeing of page tables with pgstes.
Signed-off-by: Martin Schwidefsky
29 May, 2011
2 commits
-
Quite a few functions that get called from the tlb gather code require that
preemption must be disabled. So disable preemption inside of the called
functions instead.
The only drawback is that rcu_table_freelist_finish() doesn't necessarily get
called on the cpu(s) that filled the free lists. So we may see a delay, until
we finally see an rcu callback. However over time this shouldn't matter.
So we get rid of lots of "BUG: using smp_processor_id() in preemptible"
messages.
Signed-off-by: Heiko Carstens
-
…l/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
perf: Fix SIGIO handling
perf top: Don't stop if no kernel symtab is found
perf top: Handle kptr_restrict
perf top: Remove unused macro
perf events: initialize fd array to -1 instead of 0
perf tools: Make sure kptr_restrict warnings fit 80 col terms
perf tools: Fix build on older systems
perf symbols: Handle /proc/sys/kernel/kptr_restrict
perf: Remove duplicate headers
ftrace: Add internal recursive checks
tracing: Update btrfs's tracepoints to use u64 interface
tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
ftrace: Set ops->flag to enabled even on static function tracing
tracing: Have event with function tracer check error return
ftrace: Have ftrace_startup() return failure code
jump_label: Check entries limit in __jump_label_update
ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
scripts/tags.sh: Add magic for trace-events for etags too
scripts/tags.sh: Fix ctags for DEFINE_EVENT()
x86/ftrace: Fix compiler warning in ftrace.c
...
27 May, 2011
1 commit
-
…rostedt/linux-2.6-trace into perf/urgent
26 May, 2011
7 commits
-
Add ZONE_DMA to 31-bit config again. The performance gain is minimal
and hardly anybody cares anymore about a 31-bit kernel.
So add ZONE_DMA again to help with SLAB_CACHE_DMA removal for
!CONFIG_ZONE_DMA configurations.
Acked-by: David Rientjes
Signed-off-by: Heiko Carstens
-
s390 arch backend for d065bd81 "mm: retry page fault when blocking on
disk transfer".
Signed-off-by: Heiko Carstens
-
If e.g. copy_from_user() generates a page fault and the kernel runs
into an OOM situation the system might lock up.
If the OOM killer sends a SIGKILL to the current process it can't
handle it since it is stuck in a copy_from_user() - page fault loop.
Fix this by adding the same fix as other architectures have.
E.g. the x86 variant f86268 "x86/mm: Handle mm_fault_error() in kernel
space"
Signed-off-by: Heiko Carstens
-
Merge irq.c and s390_ext.c into irq.c. That way all external interrupt
related functions are together.
Signed-off-by: Heiko Carstens
Signed-off-by: Martin Schwidefsky
-
Interrupt sources like pfault, sclp, dasd_diag and virtio all use the
service signal external interrupt subclass mask in control register 0
to enable and disable the corresponding interrupt.
Because no reference counting is implemented each subsystem thinks it
is the only user of the subclass and sets and clears the bit like it wants.
This leads to the case that unloading the dasd diag module under z/VM
causes both sclp and pfault interrupts to be masked. The result will
be a locked up system sooner or later.
Fix this by introducing a new way to set (register) and clear
(unregister) the service signal subclass mask bit in cr0.
Also convert all drivers.
Signed-off-by: Heiko Carstens
Signed-off-by: Martin Schwidefsky
-
Always enable the service signal subclass mask bit in cr0, if pfault
is available. That way we use the normal cpu hotplug way to propagate
the subclass mask bit in cr0 instead of open coding it.
Signed-off-by: Heiko Carstens
Signed-off-by: Martin Schwidefsky
-
The functions probe_kernel_write() and probe_kernel_read() do not modify
the src pointer. Allow const pointers to be passed in without the need
of a typecast.
Acked-by: Mike Frysinger
Acked-by: Heiko Carstens
Acked-by: Martin Schwidefsky
Signed-off-by: Steven Rostedt
Link: http://lkml.kernel.org/r/1305824936.1465.4.camel@gandalf.stny.rr.com
25 May, 2011
1 commit
-
Fold all the mmu_gather rework patches into one for submission
Signed-off-by: Peter Zijlstra
Reported-by: Hugh Dickins
Cc: Benjamin Herrenschmidt
Cc: David Miller
Cc: Martin Schwidefsky
Cc: Russell King
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Richard Weinberger
Cc: Tony Luck
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Namhyung Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 May, 2011
2 commits
-
Rework the architecture page table functions to access the bits in the
page table extension array (pgste). There are a number of changes:
1) Fix missing pgste update if the attach_count for the mm is
-
Small code cleanup.
Signed-off-by: Heiko Carstens
Signed-off-by: Martin Schwidefsky