01 Feb, 2017

1 commit

  • commit 0d6da872d3e4a60f43c295386d7ff9a4cdcd57e9 upstream.

    The last pgtable rework silently disabled the CMMA unused state by
    setting a local pte variable (a function parameter) instead of
    propagating it back to the caller. Fix it (the by-value bug pattern
    is sketched below).

    Fixes: ebde765c0e85 ("s390/mm: uninline ptep_xxx functions from pgtable.h")
    Cc: Martin Schwidefsky
    Cc: Claudio Imbrenda
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Greg Kroah-Hartman

    Christian Borntraeger
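
    A minimal standalone C sketch of the by-value bug pattern described
    above; the _PAGE_UNUSED value and the helper names are illustrative,
    not the actual s390 pgtable code:

    #include <stdio.h>

    typedef unsigned long pte_t;
    #define _PAGE_UNUSED 0x080UL    /* illustrative bit value */

    /* Buggy: modifies a local copy; the caller never sees the change. */
    static void set_unused_buggy(pte_t pte)
    {
            pte |= _PAGE_UNUSED;
    }

    /* Fixed: takes a pointer so the change propagates to the caller. */
    static void set_unused_fixed(pte_t *ptep)
    {
            *ptep |= _PAGE_UNUSED;
    }

    int main(void)
    {
            pte_t pte = 0;

            set_unused_buggy(pte);
            printf("buggy: %#lx\n", pte);   /* still 0: the bit was lost */
            set_unused_fixed(&pte);
            printf("fixed: %#lx\n", pte);   /* 0x80: the bit stuck */
            return 0;
    }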
     

28 Oct, 2016

1 commit

  • Pull s390 fixes from Martin Schwidefsky:
    "A few more s390 patches for 4.9:
    - a fix for an overflow in the dasd driver reported by UBSAN
    - fix a regression and add hotplug memory to the zone movable again
    - add ignore defines for the pkey system calls
    - fix the output of the merged stack tracer
    - replace printk with pr_cont in arch/s390 where appropriate
    - remove the arch specific return_address function again
    - ignore reserved channel paths at boot time
    - add a missing hugetlb_bad_size call to the arch backend"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/mm: fix zone calculation in arch_add_memory()
    s390/dumpstack: use pr_cont within show_stack and die
    s390/dumpstack: get rid of return_address again
    s390/disassembler: use pr_cont where appropriate
    s390/dumpstack: use pr_cont where appropriate
    s390/dumpstack: restore reliable indicator for call traces
    s390/mm: use hugetlb_bad_size()
    s390/cio: don't register chpids in reserved state
    s390: ignore pkey system calls
    s390/dasd: avoid undefined behaviour

    Linus Torvalds
     

24 Oct, 2016

1 commit

  • Standby (hotplug) memory should be added to ZONE_MOVABLE on s390. After
    commit 199071f1 "s390/mm: make arch_add_memory() NUMA aware",
    arch_add_memory() used memblock_end_of_DRAM() to find out the end of
    ZONE_NORMAL and the beginning of ZONE_MOVABLE. However, commit 7f36e3e5
    "memory-hotplug: add hot-added memory ranges to memblock before allocate
    node_data for a node." moved the call of memblock_add_node() before
    the call of arch_add_memory() in add_memory_resource(), and thus changed
    the return value of memblock_end_of_DRAM() when called in
    arch_add_memory(). As a result, arch_add_memory() will think that all
    memory blocks should be added to ZONE_NORMAL.

    Fix this by changing the logic in arch_add_memory() so that it
    manually iterates over all zones of a given node to find out which
    zone a memory block should be added to (sketched after this entry).

    Reviewed-by: Heiko Carstens
    Signed-off-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Gerald Schaefer
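
    A hedged kernel-style sketch of the zone-walk idea described above;
    find_zone_for_block() is a hypothetical helper, and the real patch
    additionally splits the added range and calls the arch hotplug code
    per zone:

    /* Walk the zones of the target node and pick the one whose current
     * span covers the new memory block; standby memory that lies beyond
     * every populated zone goes to ZONE_MOVABLE.
     */
    static struct zone *find_zone_for_block(int nid, unsigned long start_pfn,
                                            unsigned long nr_pages)
    {
            struct pglist_data *pgdat = NODE_DATA(nid);
            struct zone *zone;
            int i;

            for (i = 0; i < MAX_NR_ZONES; i++) {
                    zone = pgdat->node_zones + i;
                    if (start_pfn >= zone->zone_start_pfn &&
                        start_pfn + nr_pages <= zone_end_pfn(zone))
                            return zone;
            }
            return pgdat->node_zones + ZONE_MOVABLE;
    }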
     

05 Oct, 2016

1 commit

  • Pull s390 updates from Martin Schwidefsky:
    "The new features and main improvements in this merge for v4.9

    - Support for the UBSAN sanitizer

    - Set HAVE_EFFICIENT_UNALIGNED_ACCESS; it improves the generated
    code in some places

    - Improvements for the in-kernel fpu code; in particular, the
    overhead for multiple consecutive in-kernel fpu users is reduced

    - Add a SIMD implementation for the RAID6 gen and xor operations

    - Add RAID6 recovery based on the XC instruction

    - The PCI DMA flush logic has been improved to increase the speed of
    the map / unmap operations

    - The time synchronization code has seen some updates

    And bug fixes all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
    s390/con3270: fix insufficient space padding
    s390/con3270: fix use of uninitialised data
    MAINTAINERS: update DASD maintainer
    s390/cio: fix accidental interrupt enabling during resume
    s390/dasd: add missing \n to end of dev_err messages
    s390/config: Enable config options for Docker
    s390/dasd: make query host access interruptible
    s390/dasd: fix panic during offline processing
    s390/dasd: fix hanging offline processing
    s390/pci_dma: improve lazy flush for unmap
    s390/pci_dma: split dma_update_trans
    s390/pci_dma: improve map_sg
    s390/pci_dma: simplify dma address calculation
    s390/pci_dma: remove dma address range check
    iommu/s390: simplify registration of I/O address translation parameters
    s390: migrate exception table users off module.h and onto extable.h
    s390: export header for CLP ioctl
    s390/vmur: fix irq pointer dereference in int handler
    s390/dasd: add missing KOBJ_CHANGE event for unformatted devices
    s390: enable UBSAN
    ...

    Linus Torvalds
     

20 Sep, 2016

2 commits

    These files were only including module.h for exception table
    related functions. We've now separated that content out into its
    own file, "extable.h", so move over to that and avoid all the
    extra header content in module.h that we don't really need to
    compile these files (the include change is sketched after this
    entry).

    The additions of uaccess.h are to deal with implicit includes like:

    arch/s390/kernel/traps.c: In function 'do_report_trap':
    arch/s390/kernel/traps.c:56:4: error: implicit declaration of function 'extable_fixup' [-Werror=implicit-function-declaration]
    arch/s390/kernel/traps.c: In function 'illegal_op':
    arch/s390/kernel/traps.c:173:3: error: implicit declaration of function 'get_user' [-Werror=implicit-function-declaration]

    Cc: Heiko Carstens
    Cc: linux-s390@vger.kernel.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Martin Schwidefsky

    Paul Gortmaker
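
    A minimal sketch of the include change described above; the comments
    on what each header provides follow the commit message and are not
    an exhaustive list:

    /* Before: module.h pulled in far more than these files need. */
    #include <linux/module.h>

    /* After: only the headers actually used. */
    #include <linux/extable.h>   /* exception table handling */
    #include <linux/uaccess.h>   /* get_user(), extable_fixup() */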
     
  • Install the callbacks via the state machine (a hedged sketch of the
    registration follows below).

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: linux-s390@vger.kernel.org
    Cc: Peter Zijlstra
    Cc: Heiko Carstens
    Cc: rt@linutronix.de
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/20160906170457.32393-18-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
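
    A hedged sketch of what such a conversion to the cpuhp state machine
    typically looks like; the callback names and the state name string
    are hypothetical, not the actual s390 patch:

    #include <linux/cpuhotplug.h>

    static int s390_foo_online(unsigned int cpu)
    {
            /* set up per-cpu state for @cpu */
            return 0;
    }

    static int s390_foo_offline(unsigned int cpu)
    {
            /* tear down per-cpu state for @cpu */
            return 0;
    }

    static int __init s390_foo_init(void)
    {
            /* One registration replaces the old CPU notifier: the state
             * machine runs the online callback for all present CPUs and
             * for every future hotplug event, in a well-defined order.
             */
            return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "s390/foo:online",
                                     s390_foo_online, s390_foo_offline);
    }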
     

10 Aug, 2016

1 commit

  • Both set_memory_ro() and set_memory_rw() will modify the page
    attributes of at least one page, even if the numpages parameter is
    zero.

    The author expected that calling these functions with numpages == zero
    would never happen. However, with the new 444d13ff10fb ("modules: add
    ro_after_init support") feature this happens frequently.

    Therefore do the right thing and make these two functions return
    gracefully if there is nothing to do (see the sketch after this entry).

    Fixes crashes on module load like this one:

    Unable to handle kernel pointer dereference in virtual kernel address space
    Failing address: 000003ff80008000 TEID: 000003ff80008407
    Fault in home space mode while using kernel ASCE.
    AS:0000000000d18007 R3:00000001e6aa4007 S:00000001e6a10800 P:00000001e34ee21d
    Oops: 0004 ilc:3 [#1] SMP
    Modules linked in: x_tables
    CPU: 10 PID: 1 Comm: systemd Not tainted 4.7.0-11895-g3fa9045 #4
    Hardware name: IBM 2964 N96 703 (LPAR)
    task: 00000001e9118000 task.stack: 00000001e9120000
    Krnl PSW : 0704e00180000000 00000000005677f8 (rb_erase+0xf0/0x4d0)
    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
    Krnl GPRS: 000003ff80008b20 000003ff80008b20 000003ff80008b70 0000000000b9d608
    000003ff80008b20 0000000000000000 00000001e9123e88 000003ff80008950
    00000001e485ab40 000003ff00000000 000003ff80008b00 00000001e4858480
    0000000100000000 000003ff80008b68 00000000001d5998 00000001e9123c28
    Krnl Code: 00000000005677e8: ec1801c3007c cgij %r1,0,8,567b6e
    00000000005677ee: e32010100020 cg %r2,16(%r1)
    #00000000005677f4: a78401c2 brc 8,567b78
    >00000000005677f8: e35010080024 stg %r5,8(%r1)
    00000000005677fe: ec5801af007c cgij %r5,0,8,567b5c
    0000000000567804: e30050000024 stg %r0,0(%r5)
    000000000056780a: ebacf0680004 lmg %r10,%r12,104(%r15)
    0000000000567810: 07fe bcr 15,%r14
    Call Trace:
    ([] __this_module+0x0/0xffffffffffffd700 [x_tables])
    ([] do_init_module+0x12c/0x220)
    ([] load_module+0x24e2/0x2b10)
    ([] SyS_finit_module+0xbe/0xd8)
    ([] system_call+0xd6/0x264)
    Last Breaking-Event-Address:
    [] rb_erase+0x12/0x4d0
    Kernel panic - not syncing: Fatal exception: panic_on_oops

    Reported-by: Christian Borntraeger
    Reported-and-tested-by: Sebastian Ott
    Fixes: e8a97e42dc98 ("s390/pageattr: allow kernel page table splitting")
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
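
    A sketch of the fix, not the verbatim patch; change_page_attr() and
    the SET_RO flag stand in for the arch-internal helper:

    int set_memory_ro(unsigned long addr, int numpages)
    {
            /* Bail out before touching any page table if there is
             * nothing to do; without this guard at least one page
             * would always be processed.
             */
            if (!numpages)
                    return 0;

            return change_page_attr(addr, numpages, SET_RO);
    }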
     

03 Aug, 2016

1 commit

  • Pull KVM updates from Paolo Bonzini:

    - ARM: GICv3 ITS emulation and various fixes. Removal of the
    old VGIC implementation.

    - s390: support for trapping software breakpoints, nested
    virtualization (vSIE), the STHYI opcode, initial extensions
    for CPU model support.

    - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
    of cleanups, preliminary to this and the upcoming support for
    hardware virtualization extensions.

    - x86: support for execute-only mappings in nested EPT; reduced
    vmexit latency for TSC deadline timer (by about 30%) on Intel
    hosts; support for more than 255 vCPUs.

    - PPC: bugfixes.

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
    KVM: PPC: Introduce KVM_CAP_PPC_HTM
    MIPS: Select HAVE_KVM for MIPS64_R{2,6}
    MIPS: KVM: Reset CP0_PageMask during host TLB flush
    MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
    MIPS: KVM: Sign extend MFC0/RDHWR results
    MIPS: KVM: Fix 64-bit big endian dynamic translation
    MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
    MIPS: KVM: Use 64-bit CP0_EBase when appropriate
    MIPS: KVM: Set CP0_Status.KX on MIPS64
    MIPS: KVM: Make entry code MIPS64 friendly
    MIPS: KVM: Use kmap instead of CKSEG0ADDR()
    MIPS: KVM: Use virt_to_phys() to get commpage PFN
    MIPS: Fix definition of KSEGX() for 64-bit
    KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
    kvm: x86: nVMX: maintain internal copy of current VMCS
    KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
    KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
    KVM: arm64: vgic-its: Simplify MAPI error handling
    KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
    KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
    ...

    Linus Torvalds
     

31 Jul, 2016

1 commit

  • The hugetlbfs pte/pmd conversion functions currently assume that the
    pmd bit layout is consistent with the pte layout, which is not really true.

    The SW read and write bits are encoded as the sequence "wr" in a pte, but
    in a pmd it is "rw". The hugetlbfs conversion assumes that the sequence
    is identical in both cases, which results in swapped read and write bits
    in the pmd. In practice this is not a problem, because those pmd bits are
    only relevant for THP pmds and not for hugetlbfs pmds. The hugetlbfs code
    works on (fake) ptes, and the converted pte bits are correct.

    There is another variation in pte/pmd encoding which affects dirty
    prot-none ptes/pmds. In this case, a pmd has both its HW read-only and
    invalid bit set, while it is only the invalid bit for a pte. This also has
    no effect in practice, but it should better be consistent.

    This patch fixes both inconsistencies by changing the SW read/write
    bit layout for pmds as well as the PAGE_NONE encoding for ptes. It
    also makes the hugetlbfs conversion functions more robust by
    introducing a move_set_bit() macro that uses the pte/pmd bit
    #defines instead of constant shifts (a hedged reconstruction follows
    below).

    Signed-off-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Gerald Schaefer
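
    A hedged reconstruction of the move_set_bit() idea; the exact macro
    in the patch may differ, but the point is to derive the shift counts
    from the existing single-bit #defines instead of hard-coding them:

    #include <linux/log2.h>

    /* Move the bit selected by single-bit mask @from to the position of
     * single-bit mask @to.
     */
    #define move_set_bit(x, from, to) \
            (((x) & (from)) >> ilog2(from) << ilog2(to))

    /* e.g. copy the SW write bit from its pte position to its pmd
     * position (mask names as in arch/s390 pgtable.h):
     *
     *   pmd |= move_set_bit(pte, _PAGE_WRITE, _SEGMENT_ENTRY_WRITE);
     */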
     

27 Jul, 2016

3 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2

    - most(?) of MM

    * emailed patches from Andrew Morton: (125 commits)
    thp: fix comments of __pmd_trans_huge_lock()
    cgroup: remove unnecessary 0 check from css_from_id()
    cgroup: fix idr leak for the first cgroup root
    mm: memcontrol: fix documentation for compound parameter
    mm: memcontrol: remove BUG_ON in uncharge_list
    mm: fix build warnings in <linux/compaction.h>
    mm, thp: convert from optimistic swapin collapsing to conservative
    mm, thp: fix comment inconsistency for swapin readahead functions
    thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
    shmem: split huge pages beyond i_size under memory pressure
    thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
    khugepaged: add support of collapse for tmpfs/shmem pages
    shmem: make shmem_inode_info::lock irq-safe
    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
    thp: extract khugepaged from mm/huge_memory.c
    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
    shmem: add huge pages support
    shmem: get_unmapped_area align huge page
    shmem: prepare huge= mount option and sysfs knob
    mm, rmap: account shmem thp pages
    ...

    Linus Torvalds
     
  • We always have vma->vm_mm around.

    Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Pull s390 updates from Martin Schwidefsky:
    "There are a couple of new things for s390 with this merge request:

    - a new scheduling domain "drawer" is added to reflect the unusual
    topology found on z13 machines. Performance tests showed up to 8
    percent gain with the additional domain.

    - the new crc-32 checksum crypto module uses the vector-galois-field
    multiply and sum SIMD instruction to speed up crc-32 and crc-32c.

    - proper __ro_after_init support, this requires RO_AFTER_INIT_DATA in
    the generic vmlinux.lds linker script definitions.

    - kcov instrumentation support. A prerequisite for that is the
    inline assembly basic block cleanup, which is the reason for the
    net/iucv/iucv.c change.

    - support for 2GB pages is added to the hugetlbfs backend.

    Then there are two removals:

    - the oprofile hardware sampling support is dead code and is removed.
    The oprofile user space uses the perf interface nowadays.

    - the ETR clock synchronization is removed, as it has been superseded
    by the STP clock synchronization. And it always has been
    "interesting" code...

    And the usual bug fixes and cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (82 commits)
    s390/pci: Delete an unnecessary check before the function call "pci_dev_put"
    s390/smp: clean up a condition
    s390/cio/chp : Remove deprecated create_singlethread_workqueue
    s390/chsc: improve channel path descriptor determination
    s390/chsc: sanitize fmt check for chp_desc determination
    s390/cio: make fmt1 channel path descriptor optional
    s390/chsc: fix ioctl CHSC_INFO_CU command
    s390/cio/device_ops: fix kernel doc
    s390/cio: allow to reset channel measurement block
    s390/console: Make preferred console handling more consistent
    s390/mm: fix gmap tlb flush issues
    s390/mm: add support for 2GB hugepages
    s390: have unique symbol for __switch_to address
    s390/cpuinfo: show maximum thread id
    s390/ptrace: clarify bits in the per_struct
    s390: stack address vs thread_info
    s390: remove pointless load within __switch_to
    s390: enable kcov support
    s390/cpumf: use basic block for ecctr inline assembly
    s390/hypfs: use basic block for diag inline assembly
    ...

    Linus Torvalds
     

13 Jul, 2016

1 commit

  • __tlb_flush_asce() should never be used if multiple asce belong to a mm.

    As this function changes the mm logic that determines whether local or
    global tlb flushes are needed, we might end up flushing only the gmap
    asce on all CPUs, while a follow-up mm asce flush will only flush on
    the local CPU, although that asce ran on multiple CPUs.

    The missing tlb flushes will provoke strange faults in user space and
    even low-address protection exceptions in user space, crashing the
    kernel.

    Fixes: 1b948d6caec4 ("s390/mm,tlb: optimize TLB flushing for zEC12")
    Cc: stable@vger.kernel.org # 3.15+
    Reported-by: Sascha Silbe
    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky

    David Hildenbrand
     

28 Jun, 2016

1 commit

  • Use only simple inline assemblies which consist of a single basic
    block if the register asm construct is being used.

    Otherwise gcc can generate broken code when the compiler option
    -fsanitize-coverage=trace-pc is used (the pattern is sketched after
    this entry).

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
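
    A hedged sketch of the single-basic-block pattern; the diag function
    and its 0x123 code are hypothetical, only the shape of the asm
    matters (no branches between setting up the register asm variable
    and consuming it):

    static inline int diag_sketch(unsigned long cmd)
    {
            register unsigned long r1 asm("1") = cmd;
            int rc;

            asm volatile(
                    "       diag    %1,0,0x123\n"   /* illustrative */
                    "       ipm     %0\n"           /* fetch cc */
                    "       srl     %0,28\n"
                    : "=d" (rc)
                    : "d" (r1)
                    : "cc");
            return rc;
    }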
     

25 Jun, 2016

1 commit

  • __GFP_REPEAT has rather weak semantics, but since its introduction
    around 2.6.12 it has been ignored for low-order allocations.

    page_table_alloc() then uses the flag for a single page allocation.
    This means the flag has never actually been useful here, because it
    only ever applies to PAGE_ALLOC_COSTLY requests (see the sketch after
    this entry).

    Link: http://lkml.kernel.org/r/1464599699-30131-14-git-send-email-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
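
    A minimal sketch of the resulting cleanup; pgtable_alloc_page_sketch()
    is a stand-in for the relevant allocation in page_table_alloc():

    #include <linux/gfp.h>

    static struct page *pgtable_alloc_page_sketch(void)
    {
            /* order-0 allocation: __GFP_REPEAT was a no-op here,
             * so dropping it does not change behaviour.
             */
            return alloc_page(GFP_KERNEL);  /* was GFP_KERNEL | __GFP_REPEAT */
    }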
     

20 Jun, 2016

18 commits

  • Nested virtualization will have to enable its own gmaps. The current
    code would enable the wrong gmap whenever a VCPU is scheduled out and
    back in, resulting in the wrong gmap being active.

    This patch reenables the last enabled gmap, therefore avoiding having to
    touch vcpu->arch.gmap when enabling a different gmap.

    Acked-by: Christian Borntraeger
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • Let's not fault in everything in read-write but limit it to read-only
    where possible.

    When restricting access rights, we already have the required protection
    level in our hands. When reading from guest 2 storage (gmap_read_table),
    it is obviously PROT_READ. When shadowing a pte, the required protection
    level is given via the guest 2 provided pte.

    Based on an initial patch by Martin Schwidefsky.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • It will be very helpful to have a mechanism to check without any locks
    if a given gmap shadow is still valid and matches the given properties.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • For nested virtualization, we want to know if we are handling a protection
    exception, because these can directly be forwarded to the guest without
    additional checks.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • We have no known user of real-space designation and only support it to
    be architecture compliant.

    Gmap shadows with real-space designation are never unshadowed
    automatically, as there is nothing to protect for the top level table.

    So let's simply limit the number of such shadows to one by removing
    existing ones on creation of another one.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • We can easily support real-space designation just like EDAT1 and EDAT2,
    so guest2 can provide guest3 with an asce that has the real-space
    control set.

    We simply have to allocate the biggest page table possible and fake all
    levels.

    There is no protection to consider. If we exceed guest memory, vsie code
    will inject an addressing exception (via program intercept). In the future,
    we could limit the fake table level to the gmap page table.

    As the top level page table can never go away, such gmap shadows will
    never get unshadowed; we'll have to come up with another way to limit
    the number of kept gmap shadows.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • If the guest is enabled for EDAT2, we can easily create shadows for
    guest2 -> guest3 provided tables that make use of EDAT2.

    If guest2 references a 2GB page, this memory looks contiguous to
    guest2, but it does not have to be so for us. Therefore we have to
    create fake segment and page tables.

    This works just like EDAT1 support, so page tables are removed when the
    parent table (r3t table entry) is changed.

    We don't have to care about:
    - ACCF-Validity Control in RTTE
    - Access-Control Bits in RTTE
    - Fetch-Protection Bit in RTTE
    - Common-Region Bit in RTTE

    Just like for EDAT1, all of these bits might be dropped, and there is
    no guarantee that they are active.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • If the guest is enabled for EDAT1, we can easily create shadows for
    guest2 -> guest3 provided tables that make use of EDAT1.

    If guest2 references a 1MB page, this memory looks contiguous to guest2,
    but it might not be so for us. Therefore we have to create fake page
    tables.

    We can easily add that to our existing infrastructure. The invalidation
    mechanism will make sure that fake page tables are removed when the parent
    table (sgt table entry) is changed.

    As EDAT1 also introduced protection on all page table levels, we have to
    also shadow these correctly.

    We don't have to care about:
    - ACCF-Validity Control in STE
    - Access-Control Bits in STE
    - Fetch-Protection Bit in STE
    - Common-Segment Bit in STE

    All of these bits might be dropped, and there is no guarantee that
    they are active ("unpredictable whether the CPU uses these bits",
    "may be used"). Without using EDAT1 in the shadow ourselves
    (STE-format control == 0), simply shadowing these bits would not be
    enough anyway; they would be ignored.

    Please note that we are using the "fake" flag to make this look
    consistent with further changes (EDAT2, real-space designation
    support) and to keep the shadow functions from handling fc=1 STEs.

    In the future, with huge pages in the host, gmap_shadow_pgt() could simply
    try to map a huge host page if "fake" is set to one and indicate via return
    value that no lower fake tables / shadow ptes are required.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • In preparation for EDAT1/EDAT2 support for gmap shadows, we have to store
    the requested edat level in the gmap shadow.

    The edat level used during shadow translation is a property of the gmap
    shadow. Depending on that level, the gmap shadow will look differently for
    the same guest tables. We have to store it internally in order to support
    it later.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • Before any thread is allowed to use a gmap_shadow, it has to be fully
    initialized. However, for invalidation to work properly, we have to
    register the new gmap_shadow before we protect the parent gmap table.

    Because locking is tricky, and we have to avoid duplicate gmaps, let's
    introduce an initialized field that signals other threads whether that
    gmap_shadow can already be used or whether they have to retry.

    Let's properly return errors using ERR_PTR() instead of simply
    returning NULL, so a caller can react properly to the error (see the
    sketch after this entry).

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
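
    A hedged sketch of the lookup pattern described above; the structure,
    field and helper names are placeholders, and the real code does the
    check under the appropriate lock:

    #include <linux/err.h>

    struct gmap_shadow_sketch {
            bool initialized;   /* set once the shadow is fully set up */
            /* ... */
    };

    static struct gmap_shadow_sketch *find_shadow_sketch(void);

    static struct gmap_shadow_sketch *get_shadow_sketch(void)
    {
            struct gmap_shadow_sketch *sg = find_shadow_sketch();

            if (!sg)
                    return ERR_PTR(-ENOMEM);   /* caller sees the reason */
            if (!READ_ONCE(sg->initialized))
                    return ERR_PTR(-EAGAIN);   /* not usable yet: retry */
            return sg;
    }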
     
  • We have to unlock sg->guest_table_lock in order to call
    gmap_protect_rmap(). If we sleep just before that call, another VCPU
    might pick up that shadowed page table (while it is not protected yet)
    and use it.

    In order to avoid these races, we have to introduce a third state -
    "origin set but still invalid" for an entry. This way, we can avoid
    another thread already using the entry before the table is fully protected.
    As soon as everything is set up, we can clear the invalid bit - if we
    had no race with the unshadowing code.

    Suggested-by: Martin Schwidefsky
    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • We really want to avoid manually handling protection for nested
    virtualization. By shadowing pages with the protection the guest asked us
    for, the SIE can handle most protection-related actions for us (e.g.
    special handling for MVPG) and we can directly forward protection
    exceptions to the guest.

    PTEs will now always be shadowed with the correct _PAGE_PROTECT flag.
    Unshadowing will take care of any guest changes to the parent PTE and
    any host changes to the host PTE. If the host PTE doesn't have the
    fitting access rights or is not available, we have to fix it up.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • For now, the tlb of a shadow gmap is only flushed when the parent is
    removed, not when the shadow itself is removed upfront. Therefore other
    shadow gmaps can reuse the tables without the tlb getting flushed.

    Fix this by simply flushing the tlb
    1. before the shadow tables are removed (analogous to other unshadow
       functions), and
    2. when the gmap is freed and therefore the top level pages are freed.

    Acked-by: Martin Schwidefsky
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     
  • For a nested KVM guest the outer KVM host needs to create shadow
    page tables for the nested guest. This patch adds the basic support
    to the guest address space (gmap) code.

    For each guest address space the inner KVM host creates, the first
    outer KVM host needs to create shadow page tables. The address space
    is identified by the ASCE loaded into the control register 1 at the
    time the inner SIE instruction for the second nested KVM guest is
    executed. The outer KVM host creates the shadow tables starting with
    the table identified by the ASCE on a on-demand basis. The outer KVM
    host will get repeated faults for all the shadow tables needed to
    run the second KVM guest.

    While a shadow page table for the second KVM guest is active, access
    to the origin region, segment and page tables needs to be restricted
    for the first KVM guest. For region, segment and page tables the first
    KVM guest may read the memory, but a write attempt has to lead to an
    unshadow. This is done using the page invalid and read-only bits in the
    page table of the first KVM guest. If the first guest re-accesses one of
    the origin pages of a shadow, it gets a fault and the affected parts of
    the shadow page table hierarchy need to be removed again.

    PGSTE tables don't have to be shadowed, as the interpretation assists
    can't deal with the invalid bits in the shadow pte being set differently
    than the original ones provided by the first KVM guest.

    Many bug fixes and improvements by David Hildenbrand.

    Reviewed-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger

    Martin Schwidefsky
     
  • Let's use a reference counter mechanism to control the lifetime of
    gmap structures. This will be needed for further changes related to
    gmap shadows (the get/put pattern is sketched after this entry).

    Reviewed-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger

    Martin Schwidefsky
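
    A hedged sketch of the get/put lifetime pattern; the field and helper
    names are placeholders for the actual gmap code:

    #include <linux/atomic.h>

    struct gmap_sketch {
            atomic_t ref_count;
            /* ... */
    };

    static void gmap_free_sketch(struct gmap_sketch *gmap);

    static struct gmap_sketch *gmap_get_sketch(struct gmap_sketch *gmap)
    {
            atomic_inc(&gmap->ref_count);
            return gmap;
    }

    static void gmap_put_sketch(struct gmap_sketch *gmap)
    {
            /* free only when the last reference is dropped */
            if (atomic_dec_and_test(&gmap->ref_count))
                    gmap_free_sketch(gmap);
    }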
     
  • The current gmap pte notifier forces a pte into a read-write state.
    If the pte is invalidated, the gmap notifier is called to inform KVM
    that the mapping will go away.

    Extend this approach to allow read-write, read-only and no-access
    as possible target states and call the pte notifier for any change
    to the pte.

    This mechanism is used to temporarily set specific access rights for
    a pte without doing the heavy work of a true mprotect call.

    Reviewed-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger

    Martin Schwidefsky
     
  • The gmap notifier list and the gmap list in the mm_struct change
    rarely. Use RCU to optimize the readers of these lists (sketched
    after this entry).

    Reviewed-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger

    Martin Schwidefsky
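
    A hedged sketch of the RCU-protected list pattern; the notifier
    structure and names are illustrative:

    #include <linux/list.h>
    #include <linux/rcupdate.h>
    #include <linux/spinlock.h>

    struct gmap_nb_sketch {
            struct list_head list;
            void (*notify)(unsigned long gaddr);
    };

    static LIST_HEAD(nb_list);
    static DEFINE_SPINLOCK(nb_lock);

    /* writers: rare, serialized by a lock */
    static void nb_register_sketch(struct gmap_nb_sketch *nb)
    {
            spin_lock(&nb_lock);
            list_add_tail_rcu(&nb->list, &nb_list);
            spin_unlock(&nb_lock);
    }

    /* readers: frequent, lockless under rcu_read_lock() */
    static void nb_call_all_sketch(unsigned long gaddr)
    {
            struct gmap_nb_sketch *nb;

            rcu_read_lock();
            list_for_each_entry_rcu(nb, &nb_list, list)
                    nb->notify(gaddr);
            rcu_read_unlock();
    }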
     
  • Pass an address range to the page table invalidation notifier
    for KVM. This allows notifying about changes that affect a larger
    virtual memory area, e.g. for 1MB pages (see the sketch after this
    entry).

    Reviewed-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Christian Borntraeger

    Martin Schwidefsky
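
    A hedged sketch of the interface change; the structure name follows
    the gmap notifier mentioned above, details are simplified:

    #include <linux/list.h>

    struct gmap;    /* opaque here */

    struct gmap_notifier_sketch {
            struct list_head list;
            /* was: a single guest address argument; now a [start, end]
             * range, so one callback can cover e.g. a whole 1MB segment
             * invalidation.
             */
            void (*notifier_call)(struct gmap *gmap,
                                  unsigned long start, unsigned long end);
    };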
     

14 Jun, 2016

1 commit

  • The usual problem with code that is ifdef'ed out is that it doesn't
    compile after a while. That's also the case for the storage key
    initialisation code, if it were used (PAGE_DEFAULT_KEY set to
    something other than zero):

    ./arch/s390/include/asm/page.h: In function 'storage_key_init_range':
    ./arch/s390/include/asm/page.h:36:2: error: implicit declaration of function '__storage_key_init_range'

    Since the code itself has been useful for debugging purposes several
    times, remove the ifdefs and make sure the code gets compile
    coverage. The cost for this is eight bytes (the resulting inline is
    sketched after this entry).

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
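
    A sketch of the resulting inline, assuming the mainline shape of this
    helper; with PAGE_DEFAULT_KEY == 0 the compiler folds the call away,
    but the body still gets compile coverage:

    static inline void storage_key_init_range(unsigned long start,
                                              unsigned long end)
    {
            if (PAGE_DEFAULT_KEY != 0)
                    __storage_key_init_range(start, end);
    }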