13 Jun, 2016

40 commits

  • The s390 cpu topology gained another hierarchy level. The top level is
    now called drawer and contains several books. A book used to be the
    top level.

    In order to expose the cpu topology to user space, allow the creation
    of new sysfs attributes dependent on CONFIG_SCHED_DRAWER, which an
    architecture may define and select.

    These additional attributes will be available:

    /sys/devices/system/cpu/cpuX/topology/drawer_id
    /sys/devices/system/cpu/cpuX/topology/drawer_siblings
    /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list
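
    As a rough illustration, a user-space program can read the new
    attributes like any other sysfs topology file. A minimal sketch,
    assuming the attribute is present on the running system:

        #include <stdio.h>

        int main(void)
        {
                /* drawer_id only exists if the architecture selects CONFIG_SCHED_DRAWER */
                FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/drawer_id", "r");
                int id;

                if (f && fscanf(f, "%d", &id) == 1)
                        printf("cpu0 drawer_id: %d\n", id);
                if (f)
                        fclose(f);
                return 0;
        }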

    Signed-off-by: Heiko Carstens
    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Rename DIAG308_IPL and DIAG308_DUMP to DIAG308_LOAD_CLEAR and
    DIAG308_LOAD_NORMAL_DUMP to better reflect the associated IPL
    functions.

    Suggested-by: Cornelia Huck
    Suggested-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Acked-by: Michael Holzheu
    Reviewed-by: Peter Oberparleiter
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Avoid clearing memory for CCW-type re-ipl within a logical
    partition. This can save a significant amount of time if a logical
    partition contains a lot of memory.

    On the other hand we still clear memory if running within a second
    level hypervisor, since the hypervisor can simply free all memory that
    was used for the guest.

    Signed-off-by: Heiko Carstens
    Acked-by: Christian Borntraeger
    Acked-by: Michael Holzheu
    Reviewed-by: Peter Oberparleiter
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • We have some inline assemblies where the extable entry points to a
    label at the end of an inline assembly which is not followed by an
    instruction.

    On the other hand we have also inline assemblies where the extable
    entry points to the first instruction of an inline assembly.

    If an inline asm of the first type (extable points to an empty label
    at the end) is directly followed by one of the second type (extable
    points to the first instruction), then we would have two different
    extable entries that point to the same instruction but have different
    target addresses.

    This can lead to quite random behaviour, depending on sorting order.

    I verified that we currently do not have such collisions within the
    kernel. However, to avoid such subtle bugs, add a couple of nop
    instructions to those inline assemblies whose extable entries point to
    an empty label.
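
    A schematic sketch of the pattern (simplified and illustrative, not the
    literal kernel code; on s390 the program check old PSW usually points
    after the faulting instruction, which is why the extable fault address
    is the label at the end of the asm):

        /* assumes the s390 EX_TABLE(fault, target) helper macro */
        asm volatile(
                "       lg      %0,%1\n"        /* may fault */
                "0:     nopr    %%r7\n"         /* added: gives label 0 its own instruction */
                EX_TABLE(0b, 0b)
                : "=d" (val) : "Q" (*ptr));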

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • We always expect that get_user and put_user return with zero. Give the
    compiler a hint so it can slightly optimize the code and avoid
    branches.
    This is the same as what x86 got with commit a76cf66e948a ("x86/uaccess:
    Tell the compiler that uaccess is unlikely to fault").

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Fix some whitespace damage that was introduced by me with a
    query-replace when removing 31 bit support.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    When we use the iommu_area_alloc helper to get dma addresses, we
    specify the boundary_size parameter but not the offset (called
    shift in this context).

    As long as the offset (start_dma) is a multiple of the boundary
    we're ok (on current machines start_dma always seems to be 4GB).

    Don't leave this to chance and specify the offset for iommu_area_alloc.
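
    A hedged sketch of the resulting call (the zdev fields follow the s390
    PCI code, but treat the exact call site as illustrative):

        /* pass the dma offset as the "shift" argument so that boundary
         * checks are done relative to start_dma */
        offset = iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
                                  start, size,
                                  zdev->start_dma >> PAGE_SHIFT,
                                  boundary_size, 0);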

    Signed-off-by: Sebastian Ott
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     
  • We don't have an architectural guarantee on the value of
    the dma offset but rely on it to be at least page aligned.
    Enforce page alignment of start_dma.
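
    The enforcement itself is essentially a one-liner; a sketch, with the
    field name taken from the s390 PCI code but otherwise illustrative:

        zdev->start_dma = PAGE_ALIGN(zdev->start_dma);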

    Signed-off-by: Sebastian Ott
    Reviewed-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     
  • Use dynamically allocated irq descriptors on s390 which allows
    us to get rid of the s390 specific config option PCI_NR_MSI and
    exploit more MSI interrupts. Also the size of the kernel image
    is reduced by 131K (using performance_defconfig).

    Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     
  • When s390 traces with hex_ascii or sprintf view are
    extracted and sorted, use the sort option -s (stable)
    to avoid multiple lines with the same time stamp being
    sorted using the rest of the line as secondary key.

    Signed-off-by: Thomas Richter
    Signed-off-by: Martin Schwidefsky

    Thomas Richter
     
  • Small cleanup patch to use the shorter __section macro everywhere.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • On s390 __ro_after_init is currently mapped to __read_mostly which
    means that data marked as __ro_after_init will not be protected.

    Reason for this is that the common code __ro_after_init implementation
    is x86 centric: the ro_after_init data section was added to rodata,
    since x86 enables write protection to kernel text and rodata very
    late. On s390 we have write protection for these sections enabled with
    the initial page tables. So adding the ro_after_init data section to
    rodata does not work on s390.

    In order to make __ro_after_init work properly on s390, move the
    ro_after_init data section right behind rodata. Unlike the rodata
    section it will be marked read-only later, after all init calls have
    happened.

    This s390 specific implementation adds new __start_ro_after_init and
    __end_ro_after_init labels. Everything in between will be marked
    read-only after the init calls have happened. In addition to the
    __ro_after_init data, also move the exception table there, since from
    a practical point of view it fits the __ro_after_init requirements.
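
    A condensed sketch of how the new labels can be used once the init
    calls have completed (function name and placement are illustrative;
    set_memory_ro() works on page granularity):

        extern char __start_ro_after_init[], __end_ro_after_init[];

        static void mark_ro_after_init(void)    /* illustrative name */
        {
                unsigned long start = (unsigned long)&__start_ro_after_init;
                unsigned long end = (unsigned long)&__end_ro_after_init;

                set_memory_ro(start, (end - start) >> PAGE_SHIFT);
        }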

    Signed-off-by: Heiko Carstens
    Reviewed-by: Kees Cook
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    commit c74ba8b3480d ("arch: Introduce post-init read-only memory")
    introduced the __ro_after_init attribute, which allows variables to be
    added to the ro_after_init data section.

    This new section was added to rodata, even though it contains writable
    data. This in turn causes problems on architectures which mark the
    page table entries read-only that point to rodata very early.

    This patch allows architectures to implement their own handling of the
    .data..ro_after_init section; a short usage sketch of the attribute
    follows the list below.
    Usually that would be:
    - mark the rodata section read-only very early
    - mark the ro_after_init section read-only within mark_rodata_ro
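
    A small usage sketch of the attribute itself (generic, not s390
    specific): the variable is writable while init calls run and is
    expected to be read-only afterwards.

        #include <linux/cache.h>        /* __ro_after_init */
        #include <linux/init.h>

        static unsigned long feature_mask __ro_after_init;      /* illustrative */

        static int __init feature_setup(char *str)
        {
                /* still writable here: init calls have not finished yet */
                feature_mask = 1;
                return 1;
        }
        __setup("feature", feature_setup);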

    Signed-off-by: Heiko Carstens
    Reviewed-by: Kees Cook
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to
    decide between a lazy TLB flush vs an immediate TLB flush. The field
    contains two 16-bit counters, the number of CPUs that have the mm
    attached and can create TLB entries for it and the number of CPUs in
    the middle of a page table update.

    The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions
    use the attach counter and a mask check with mm_cpumask(mm) to decide
    between a local flush on the current CPU and a global flush.

    For all these functions the decision between lazy vs immediate and
    local vs global TLB flush can be based on CPU masks. There are two
    masks: the mm->context.cpu_attach_mask with the CPUs that are actively
    using the mm, and the mm_cpumask(mm) with the CPUs that have used the
    mm since the last full flush. The decision between lazy vs immediate
    flush is based on the mm->context.cpu_attach_mask, to decide between
    local vs global flush the mm_cpumask(mm) is used.

    With this patch all checks will use the CPU masks, the old counter
    mm->context.attach_count with its two 16-bit values is turned into a
    single counter mm->context.flush_count that keeps track of the number
    of CPUs with incomplete page table updates. The sole user of this
    counter is finish_arch_post_lock_switch() which waits for the end of
    all page table updates.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The bitmap_equal function has optimized code for small bitmaps with less
    than BITS_PER_LONG bits. For larger bitmaps the out-of-line function
    __bitmap_equal is called.

    For a constant number of bits divisible by BITS_PER_LONG the memcmp
    function can be used. For s390, gcc knows how to optimize this function:
    memcmp calls with up to 256 bytes / 2048 bits are translated into a
    single instruction.
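
    A hedged sketch of the shape of the optimization (names simplified,
    not the literal include/linux/bitmap.h code):

        static inline bool bitmap_equal_sketch(const unsigned long *src1,
                                               const unsigned long *src2,
                                               unsigned int nbits)
        {
                if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0)
                        /* constant multiple of BITS_PER_LONG: on s390 gcc can
                         * turn this memcmp into a single instruction for up
                         * to 256 bytes */
                        return !memcmp(src1, src2, nbits / 8);
                return __bitmap_equal(src1, src2, nbits);
        }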

    Reviewed-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The vunmap_pte_range() function calls ptep_get_and_clear() without any
    locking. ptep_get_and_clear() uses ptep_xchg_lazy()/ptep_flush_direct()
    for the page table update. ptep_flush_direct requires that preemption
    is disabled, but without any locking this is not the case. If the kernel
    preempts the task while the attach_counter is increased, an endless loop
    in finish_arch_post_lock_switch() will occur the next time the task is
    scheduled.

    Add explicit preempt_disable()/preempt_enable() calls to the relevant
    functions in arch/s390/mm/pgtable.c.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The External-Time-Reference (ETR) clock synchronization interface has
    been superseded by Server-Time-Protocol (STP). Remove the outdated
    ETR interface.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The PTFF instruction can be used to retrieve information about UTC
    including the current number of leap seconds. Use this value to
    convert the coordinated server time value of the TOD clock to a
    proper UTC timestamp to initialize the system time. Without this
    correction the system time will be off by the number of leap seconds
    until it has been corrected via NTP.
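
    The correction itself is simple arithmetic; a sketch with illustrative
    variable names, assuming the usual relation that coordinated server
    time, like TAI, runs ahead of UTC by the accumulated leap seconds
    (the TOD clock counts 4096 units per microsecond, i.e. 4096000000 per
    second):

        u64 leap_correction = (u64)leap_seconds * 4096000000ULL;

        utc_tod = tod_clock - leap_correction;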

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • It is possible to specify a user offset for the TOD clock, e.g. +2 hours.
    The TOD clock will carry this offset even if the clock is synchronized
    with STP. This makes the time stamps acquired with get_sync_clock()
    useless, as another LPAR might use a different TOD offset.

    Use the PTFF instruction to get the TOD epoch difference and subtract
    it from the TOD clock value to get a physical timestamp. As the epoch
    difference contains the sync check delta as well, the LPAR offset value
    to the physical clock needs to be refreshed after each clock
    synchronization.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The PTFF instruction is not a function of ETR, rename and move the
    PTFF definitions from etr.h to timex.h.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The sync clock operation of the channel subsystem call for STP delivers
    the TOD clock difference as a result. Use this TOD clock difference
    instead of the difference between the TOD timestamps before and after
    the sync clock operation.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Reducing the size of reserved memory for the crash kernel will result
    in an immediate crash on s390. Reason for that is that we do not
    create struct pages for memory that is reserved. If that memory is
    freed any access to struct pages which correspond to this memory will
    result in invalid memory accesses and a kernel panic.

    Fix this by properly creating struct pages when the system gets
    initialized. Change the code also to make use of set_memory_ro() and
    set_memory_rw() so page tables will be split if required.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    Implement an s390 version of the weak crash_free_reserved_phys_range
    function. This allows us to update the size of the reserved crash
    kernel memory when it is resized.

    This was previously done with a call to crash_unmap_reserved_pages
    from crash_shrink_memory, which was removed with commit 7a0058ec7860
    ("s390/kexec: consolidate crash_map/unmap_reserved_pages() and
    arch_kexec_protect(unprotect)_crashkres()").

    Fixes: 7a0058ec7860 ("s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()")
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • The segment/region table that is part of the kernel image must be
    properly aligned to 16k in order to make the crdte inline assembly
    work.
    Otherwise it will calculate a wrong segment/region table start address
    and access incorrect memory locations if the swapper_pg_dir is not
    aligned to 16k.

    Therefore define BSS_FIRST_SECTIONS in order to put the swapper_pg_dir
    at the beginning of the bss section and also align the bss section to
    16k just like other architectures did.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    Let's provide the basic machine information for dump_stack on
    s390. This enables the "Hardware name:" line and results in
    output like:

    [...]
    Oops: 0004 ilc:2 [#1] SMP
    Modules linked in:
    CPU: 1 PID: 74 Comm: sh Not tainted 4.5.0+ #205
    Hardware name: IBM 2964 NC9 704 (KVM)
    [...]
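
    The hook used for this is dump_stack_set_arch_desc(); a hedged sketch
    of how an architecture fills it in (the format string and variables
    below are illustrative, not the exact s390 call):

        dump_stack_set_arch_desc("IBM %-4s %-4s %-4s%s",
                                 machine_type, model_capacity, model,
                                 running_under_kvm ? " (KVM)" : "");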

    Signed-off-by: Christian Borntraeger
    Acked-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
     
  • Signed-off-by: Daniel van Gerpen
    Acked-by: Peter Oberparleiter
    Signed-off-by: Martin Schwidefsky

    Daniel van Gerpen
     
  • Show the dynamic and static cpu mhz of each cpu. Since these values
    are per cpu this requires a fundamental extension of the format of
    /proc/cpuinfo.

    Historically we had only a single line per cpu and a summary at the
    top of the file. This format is hardly extensible if we want to add
    more per cpu information.

    Therefore this patch adds per cpu blocks at the end of /proc/cpuinfo:

    cpu : 0
    cpu Mhz dynamic : 5504
    cpu Mhz static : 5504

    cpu : 1
    cpu Mhz dynamic : 5504
    cpu Mhz static : 5504

    cpu : 2
    cpu Mhz dynamic : 5504
    cpu Mhz static : 5504

    cpu : 3
    cpu Mhz dynamic : 5504
    cpu Mhz static : 5504

    Right now each block contains only the dynamic and static cpu mhz,
    but it can be easily extended like on every other architecture.

    This extension is supposed to be compatible with the old format.

    Signed-off-by: Heiko Carstens
    Acked-by: Sascha Silbe
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Change the code to print all the current output during the first
    iteration. This is a preparation patch for the upcoming per cpu block
    extension to /proc/cpuinfo.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Ensure that we always have __stringify().

    Signed-off-by: Jason Baron
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Jason Baron
     
    Add statistics that show how memory is mapped within the kernel
    identity mapping. This is more or less the same as git
    commit ce0c0e50f94e ("x86, generic: CPA add statistics about state
    of direct mapping v4") for x86.

    I also intentionally copied the lower case "k" within DirectMap4k vs
    the upper case "M" and "G" within the two other lines. Let's have
    consistent inconsistencies across architectures.

    The output of /proc/meminfo now contains these additional lines:

    DirectMap4k: 2048 kB
    DirectMap1M: 3991552 kB
    DirectMap2G: 4194304 kB

    The implementation on s390 is lockless unlike the x86 version, since I
    assume changes to the kernel mapping are a very rare event. Therefore
    it really doesn't matter if these statistics could potentially be
    inconsistent if read while kernel page tables are being changed.
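
    A rough sketch of the lockless bookkeeping behind those lines (counter
    and enum names are illustrative; arch_report_meminfo() is the hook
    /proc/meminfo calls into):

        #include <linux/seq_file.h>

        enum { MAP_4K, MAP_1M, MAP_2G, MAP_MAX };       /* illustrative */
        static unsigned long direct_pages_count[MAP_MAX];

        void arch_report_meminfo(struct seq_file *m)
        {
                seq_printf(m, "DirectMap4k:    %8lu kB\n",
                           direct_pages_count[MAP_4K] << 2);
                seq_printf(m, "DirectMap1M:    %8lu kB\n",
                           direct_pages_count[MAP_1M] << 10);
                seq_printf(m, "DirectMap2G:    %8lu kB\n",
                           direct_pages_count[MAP_2G] << 21);
        }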

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    For the kernel identity mapping, map everything read-writable and
    subsequently call set_memory_ro() to make the ro section read-only.
    This simplifies the code a lot.

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • set_memory_ro() and set_memory_rw() currently only work on 4k
    mappings, which is good enough for module code aka the vmalloc area.

    However, we have already stumbled twice over the need to make this also
    work on larger mappings:
    - the ro after init patch set
    - the crash kernel resize code

    Therefore this patch implements automatic kernel page table splitting
    if e.g. set_memory_ro() would be called on parts of a 2G mapping.
    This works quite the same as the x86 code, but is much simpler.

    In order to make this work and to be architecturally compliant we now
    always use the csp, cspg or crdte instructions to replace valid page
    table entries. This means that set_memory_ro() and set_memory_rw()
    will be much more expensive than before. In order to avoid huge
    latencies the code contains a couple of cond_resched() calls.

    The current code only splits page tables, but does not merge them where
    that would be possible. The reason for this is that currently there is
    no real life scenario where this would really happen. All current use
    cases that I know of only change access rights once during their
    lifetime. If that should change we can still implement kernel page table
    merging at a later time.
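
    From a caller's perspective nothing changes; for example, protecting a
    range that happens to be covered by a 2G mapping now just works
    (sketch, using the __ro_after_init labels introduced earlier):

        /* any large mapping covering this range is split transparently */
        set_memory_ro((unsigned long)__start_ro_after_init,
                      (__end_ro_after_init - __start_ro_after_init) >> PAGE_SHIFT);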

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Make pmd_wrprotect() and pmd_mkwrite() available independently from
    CONFIG_TRANSPARENT_HUGEPAGE and CONFIG_HUGETLB_PAGE so these can be
    used on the kernel mapping.

    Also introduce a couple of pud helper functions, namely pud_pfn(),
    pud_wrprotect(), pud_mkwrite(), pud_mkdirty() and pud_mkclean()
    which only work on the kernel mapping.

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Always use PAGE_KERNEL when re-enabling pages within the kernel
    mapping due to debug pagealloc. Without using this pgprot value
    pte_mkwrite() and pte_wrprotect() won't work on such mappings after an
    unmap -> map cycle anymore.

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Use pte_clear() instead of open-coding it.

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • _REGION3_ENTRY_RO is a duplicate of _REGION_ENTRY_PROTECT.

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    Instead of open-coded SEGMENT_KERNEL and REGION3_KERNEL assignments, use
    defines. Also, to make e.g. pmd_wrprotect() work on the kernel mapping,
    a couple more flags must be set. Therefore add the missing flags as well.

    In order to make everything symmetrical this patch also adds software
    dirty, young, read and write bits for region 3 table entries.

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Usually segment and region tables are 16k aligned due to the way the
    buddy allocator works. This is not true for the vmem code which only
    asks for a 4k alignment. In order to be consistent enforce a 16k
    alignment here as well.

    This alignment will be assumed and therefore is required by the
    pageattr code.

    Signed-off-by: Heiko Carstens
    Acked-by: Christian Borntraeger
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    We already have two inline assemblies which make use of the csp
    instruction. Since I need a third instance, let's introduce a generic
    inline assembly which can be used by everyone.

    Signed-off-by: Heiko Carstens
    Acked-by: Martin Schwidefsky
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
    Use memdup_user_nul to duplicate a memory region from user-space
    to kernel-space and terminate it with a NUL byte, instead of open
    coding this with kmalloc + copy_from_user and explicit NUL termination.
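
    The resulting pattern at such call sites typically looks like this
    (sketch; parse_input() is an illustrative consumer):

        char *buf = memdup_user_nul(ubuf, count);       /* NUL-terminated kernel copy */

        if (IS_ERR(buf))
                return PTR_ERR(buf);
        rc = parse_input(buf);
        kfree(buf);
        return rc;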

    Signed-off-by: Muhammad Falak R Wani
    [heiko.carstens@de.ibm.com: remove comment]
    Signed-off-by: Heiko Carstens

    Signed-off-by: Martin Schwidefsky

    Muhammad Falak R Wani