24 Mar, 2019

1 commit

  • commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream.

    kvm_arch_memslots_updated() is at this point in time an x86-specific
    hook for handling MMIO generation wraparound. x86 stashes 19 bits of
    the memslots generation number in its MMIO sptes in order to avoid
    full page fault walks for repeat faults on emulated MMIO addresses.
    Because only 19 bits are used, wrapping the MMIO generation number is
    possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
    the generation has changed so that it can invalidate all MMIO sptes in
    case the effective MMIO generation has wrapped so as to avoid using a
    stale spte, e.g. a (very) old spte that was created with generation==0.

    Given that the purpose of kvm_arch_memslots_updated() is to prevent
    consuming stale entries, it needs to be called before the new generation
    is propagated to memslots. Invalidating the MMIO sptes after updating
    memslots means that there is a window where a vCPU could dereference
    the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
    spte that was created with (pre-wrap) generation==0.

    Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()")
    Cc:
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Sean Christopherson
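
The wraparound described in this commit can be illustrated with a minimal sketch. It assumes, purely for illustration, that the MMIO generation is simply the low 19 bits of the memslots generation; the `sketch_*` names are hypothetical and this is not the KVM implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: the MMIO generation is the low 19 bits. */
#define SKETCH_MMIO_GEN_BITS 19
#define SKETCH_MMIO_GEN_MASK ((UINT64_C(1) << SKETCH_MMIO_GEN_BITS) - 1)

/* Truncate the full memslots generation to what fits in an MMIO spte. */
static inline uint64_t sketch_mmio_gen(uint64_t memslots_gen)
{
    return memslots_gen & SKETCH_MMIO_GEN_MASK;
}

/* A cached MMIO spte is treated as current when the truncated values match. */
static inline bool sketch_mmio_spte_matches(uint64_t spte_gen,
                                            uint64_t memslots_gen)
{
    return spte_gen == sketch_mmio_gen(memslots_gen);
}
```

In this model, an spte created at generation 0 matches again once the memslots generation reaches 1 << 19, which is why stale sptes have to be invalidated before a wrapped generation becomes visible to vCPUs.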
     

13 Feb, 2019

1 commit

  • [ Upstream commit be534791011100d204602e2e0496e9e6ce8edf63 ]

    There exist very few ap messages which need to have the 'special' flag
    enabled. This flag tells the firmware layer to do some pre- and maybe
    postprocessing. However, it may happen that this special flag is
    enabled but the firmware is unable to deal with this kind of message
    and thus returns with reply code 0x41. For example older firmware may
    not know the newest messages triggered by the zcrypt device driver and
    thus react with reject and the named reply code. Unfortunately this
    reply code is not known to the zcrypt error routines and thus default
    behavior is to switch the ap queue offline.

    This patch now makes the ap error routine aware of the reply code and
    so userspace is informed about the bad processing result but the queue
    is not switched to offline state any more.

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Sasha Levin

    Harald Freudenberger
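
The behavior change in this commit can be sketched as a reply-code dispatch. Only the value 0x41 comes from the commit message; the constant name, the struct, and the return values below are hypothetical and do not mirror the zcrypt sources.

```c
#include <stdbool.h>

/* Hypothetical name; only the value 0x41 is taken from the commit message. */
#define SKETCH_REPLY_SPECIAL_REJECTED 0x41

struct sketch_ap_queue {
    bool online;
};

/* Returns a negative value so the caller can report the failure to userspace. */
static int sketch_handle_reply_code(struct sketch_ap_queue *q, int reply_code)
{
    switch (reply_code) {
    case SKETCH_REPLY_SPECIAL_REJECTED:
        /* Known rejection: report the error, leave the queue online. */
        return -1;
    default:
        /* Unknown code: the default behavior takes the queue offline. */
        q->online = false;
        return -2;
    }
}
```

The design point is that a code known to the error routine gets its own case, so only unrecognized codes fall through to the offline path.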
     

31 Jan, 2019

1 commit

  • commit a38662084c8bdb829ff486468c7ea801c13fcc34 upstream.

    The ASCE of an mm_struct can be modified after a task has been created,
    e.g. via crst_table_downgrade for a compat process. The active_mm logic
    to avoid the switch_mm call if the next task is a kernel thread can
    lead to a situation where switch_mm is called where 'prev == next' is
    true but 'prev->context.asce == next->context.asce' is not.

    This can lead to a situation where a CPU uses the outdated ASCE to run
    a task. The result can be a crash, endless loops and really subtle
    problems due to TLB entries being created with an invalid ASCE.

    Cc: stable@kernel.org # v3.15+
    Fixes: 53e857f30867 ("s390/mm,tlb: race of lazy TLB flush vs. recreation")
    Reported-by: Heiko Carstens
    Reviewed-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Greg Kroah-Hartman

    Martin Schwidefsky
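
The hazard described in this commit can be modeled in a few lines. The sketch below is an illustration under assumed, simplified types (`sketch_mm`, `sketch_cpu`); it is not the s390 `switch_mm` code, and the commit message does not spell out the exact fix.

```c
struct sketch_mm {
    unsigned long asce;
};

struct sketch_cpu {
    unsigned long loaded_asce;
};

/* Problematic shortcut: assumes prev == next means nothing has changed. */
static void sketch_switch_mm_shortcut(struct sketch_cpu *cpu,
                                      const struct sketch_mm *prev,
                                      const struct sketch_mm *next)
{
    if (prev == next)
        return;
    cpu->loaded_asce = next->asce;
}

/* Safer variant: always take the ASCE from next, even if prev == next. */
static void sketch_switch_mm_reload(struct sketch_cpu *cpu,
                                    const struct sketch_mm *prev,
                                    const struct sketch_mm *next)
{
    (void)prev;
    cpu->loaded_asce = next->asce;
}
```

If the ASCE of an mm changes after the CPU loaded it, the shortcut leaves the CPU on the outdated value, while the reloading variant picks up the new one.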
     

27 Nov, 2018

1 commit

  • [ Upstream commit e12e4044aede97974f2222eb7f0ed726a5179a32 ]

    In case a fork or clone system call fails in copy_process() and the error
    handling does the mmput() at the bad_fork_cleanup_mm label, the
    following warning messages will appear on the console:

    BUG: non-zero pgtables_bytes on freeing mm: 16384

    The reason for that is the tricks we play with mm_inc_nr_puds() and
    mm_inc_nr_pmds() in init_new_context().

    A normal 64-bit process has 3 levels of page table, the p4d level and
    the pud level are folded. On process termination the free_pud_range()
    function in mm/memory.c will subtract 16KB from pgtable_bytes with a
    mm_dec_nr_puds() call, but there actually is not really a pud table.

    One issue with this is the fact that pgtable_bytes is usually off
    by a few kilobytes, but the more severe problem is that for a failed
    fork or clone the free_pgtables() function is not called. In this case
    there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with
    the mm_inc_nr_puds() and mm_inc_nr_pmds() in init_new_context().
    The pgtable_bytes will be off by 16384 or 32768 bytes and we get the
    BUG message. The message itself is purely cosmetic, but annoying.

    To fix this, override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
    functions to check for the true size of the address space.

    Reported-by: Li Wang
    Tested-by: Li Wang
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Sasha Levin

    Martin Schwidefsky
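
The accounting imbalance described in this commit can be modeled with a single counter. This is a simplified sketch with hypothetical names; the 16384-byte table size is the value quoted in the commit message, and the functions only mimic the increment and decrement pairing, not the kernel code.

```c
#include <stdbool.h>

/* The 16KB table size named in the commit message. */
#define SKETCH_TABLE_BYTES 16384L

struct sketch_mm {
    long pgtables_bytes;
};

/* Models the up-front increment done when the context is created. */
static void sketch_init_new_context(struct sketch_mm *mm)
{
    mm->pgtables_bytes += SKETCH_TABLE_BYTES;
}

/* Models the matching decrement done while freeing page tables. */
static void sketch_free_pgtables(struct sketch_mm *mm)
{
    mm->pgtables_bytes -= SKETCH_TABLE_BYTES;
}

/* Returns the leftover byte count; non-zero would trigger the BUG message. */
static long sketch_teardown(struct sketch_mm *mm, bool page_tables_freed)
{
    if (page_tables_freed)
        sketch_free_pgtables(mm);
    return mm->pgtables_bytes;
}
```

A normal exit runs both halves and ends at zero. A failed fork skips the freeing step, so the counter keeps the 16384 bytes that were added up front.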
     

20 Sep, 2018

1 commit

  • The resume code checks if the resume cpu is the same as the suspend cpu.
    If not, and if it is also not possible to switch to the suspend cpu, an
    error message should be printed and the resume process should be stopped
    by loading a disabled wait psw.

    The current logic is broken in multiple ways. The message is never printed
    and the disabled wait psw is never loaded, because the kernel panics before that:
    - sam31 and SIGP_SET_ARCHITECTURE to ESA mode is wrong, this will break
    on the first 64bit instruction in sclp_early_printk().
    - The init stack should be used, but the stack pointer is not set up correctly
    (missing aghi %r15,-STACK_FRAME_OVERHEAD).
    - __sclp_early_printk() checks the sclp_init_state. If it is not
    sclp_init_state_uninitialized, it simply returns w/o printing anything.
    In the resumed kernel however, sclp_init_state will never be uninitialized.

    This patch fixes those issues by removing the sam31/ESA logic, adding a
    correct init stack pointer, and also introducing sclp_early_printk_force()
    to allow using sclp_early_printk() even when sclp_init_state is not
    uninitialized.

    Reviewed-by: Heiko Carstens
    Signed-off-by: Gerald Schaefer
    Signed-off-by: Martin Schwidefsky

    Gerald Schaefer
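
The role of a "force" variant of the early print routine can be sketched as follows. The state enum, function name, and buffer-based output are assumptions made for illustration; the real sclp code talks to the hardware console instead.

```c
#include <stdbool.h>
#include <stdio.h>

enum sketch_sclp_state {
    SKETCH_UNINITIALIZED,
    SKETCH_INITIALIZED
};

/* Returns true if the message was written to out. */
static bool sketch_early_printk(enum sketch_sclp_state state, bool force,
                                const char *msg, char *out, size_t out_len)
{
    /* Without force, printing only happens before sclp is initialized. */
    if (state != SKETCH_UNINITIALIZED && !force)
        return false;
    snprintf(out, out_len, "%s", msg);
    return true;
}
```

In a resumed kernel the state is never "uninitialized", so only the forced call produces output in this model.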
     

04 Sep, 2018

1 commit


25 Aug, 2018

1 commit

  • Pull s390 updates from Martin Schwidefsky:

    - A couple of patches for the zcrypt driver:
    + Add two masks to determine which AP cards and queues are host
    devices, this will be useful for KVM AP device passthrough
    + Add-on patch to improve the parsing of the new apmask and aqmask
    + Some code beautification

    - Second try to reenable the GCC plugins, the first patch set had a
    patch to do this but the merge somehow missed this

    - Remove the s390 specific GCC version check and use the generic one

    - Three patches for kdump, two bug fixes and one cleanup

    - Three patches for the PCI layer, one bug fix and two cleanups

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390: remove gcc version check (4.3 or newer)
    s390/zcrypt: hex string mask improvements for apmask and aqmask.
    s390/zcrypt: AP bus support for alternate driver(s)
    s390/zcrypt: code beautify
    s390/zcrypt: switch return type to bool for ap_instructions_available()
    s390/kdump: Remove kzalloc_panic
    s390/kdump: Fix memleak in nt_vmcoreinfo
    s390/kdump: Make elfcorehdr size calculation ABI compliant
    s390/pci: remove fmb address from debug output
    s390/pci: remove stale rc
    s390/pci: fix out of bounds access during irq setup
    s390/zcrypt: fix ap_instructions_available() returncodes
    s390: reenable gcc plugins for real

    Linus Torvalds
     

21 Aug, 2018

1 commit

  • Pull tracing updates from Steven Rostedt:

    - Restructure of lockdep and latency tracers

    This is the biggest change. Joel Fernandes restructured the hooks
    from irqs and preemption disabling and enabling. He got rid of a lot
    of the preprocessor #ifdef mess that they caused.

    He turned both lockdep and the latency tracers to use trace events
    inserted in the preempt/irqs disabling paths. But unfortunately,
    these started to cause issues in corner cases. Thus, parts of the
    code were reverted back to where lockdep and the latency tracers just
    get called directly (without using the trace events). But because the
    original change cleaned up the code very nicely we kept that, as well
    as the trace events for preempt and irqs disabling, but they are
    limited to not being called in NMIs.

    - Have trace events use SRCU for "rcu idle" calls. This was required
    for the preempt/irqs off trace events. But it also had to not allow
    them to be called in NMI context. Waiting till Paul makes an NMI safe
    SRCU API.

    - New notrace SRCU API to allow trace events to use SRCU.

    - Addition of mcount-nop option support

    - SPDX headers replacing GPL templates.

    - Various other fixes and clean ups.

    - Some fixes are marked for stable, but were not fully tested before
    the merge window opened.

    * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
    tracing: Fix SPDX format headers to use C++ style comments
    tracing: Add SPDX License format tags to tracing files
    tracing: Add SPDX License format to bpf_trace.c
    blktrace: Add SPDX License format header
    s390/ftrace: Add -mfentry and -mnop-mcount support
    tracing: Add -mcount-nop option support
    tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
    tracing: Handle CC_FLAGS_FTRACE more accurately
    Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
    Uprobes: Simplify uprobe_register() body
    tracepoints: Free early tracepoints after RCU is initialized
    uprobes: Use synchronize_rcu() not synchronize_sched()
    tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()
    ftrace: Remove unused pointer ftrace_swapper_pid
    tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing/irqsoff: Handle preempt_count for different configs
    tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing: irqsoff: Account for additional preempt_disable
    trace: Use rcu_dereference_raw for hooks from trace-event subsystem
    tracing/kprobes: Fix within_notrace_func() to check only notrace functions
    ...

    Linus Torvalds
     

20 Aug, 2018

3 commits

  • Code beautify by following most of the checkpatch suggestions:
    - SPDX license identifier line complaints from checkpatch
    - missing space or newline complaints from checkpatch
    - octal numbers for permissions complaints from checkpatch
    - renaming of static sysfs functions complaints from checkpatch
    - fix of block comment complaints from checkpatch
    - fix printf-like calls where the function name was used instead of
    %s with __func__
    - __packed instead of __attribute__((packed))
    - init to zero for static variables removed
    - use of DEVICE_ATTR_RO and DEVICE_ATTR_RW macros

    No functional code changes or API changes!

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Martin Schwidefsky

    Harald Freudenberger
     
  • Function ap_instructions_available() had return type int but
    in fact returned 1 for true and 0 for false. Changed the return type
    to bool.

    Signed-off-by: Harald Freudenberger
    Signed-off-by: Martin Schwidefsky

    Harald Freudenberger
     
  • Pull first set of KVM updates from Paolo Bonzini:
    "PPC:
    - minor code cleanups

    x86:
    - PCID emulation and CR3 caching for shadow page tables
    - nested VMX live migration
    - nested VMCS shadowing
    - optimized IPI hypercall
    - some optimizations

    ARM will come next week"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits)
    kvm: x86: Set highest physical address bits in non-present/reserved SPTEs
    KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
    KVM: X86: Implement PV IPIs in linux guest
    KVM: X86: Add kvm hypervisor init time platform setup callback
    KVM: X86: Implement "send IPI" hypercall
    KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
    KVM: x86: Skip pae_root shadow allocation if tdp enabled
    KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
    KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
    KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
    KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
    KVM: vmx: move struct host_state usage to struct loaded_vmcs
    KVM: vmx: compute need to reload FS/GS/LDT on demand
    KVM: nVMX: remove a misleading comment regarding vmcs02 fields
    KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
    KVM: vmx: add dedicated utility to access guest's kernel_gs_base
    KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
    KVM: vmx: refactor segmentation code in vmx_save_host_state()
    kvm: nVMX: Fix fault priority for VMX operations
    kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
    ...

    Linus Torvalds
     

16 Aug, 2018

3 commits

  • During review of KVM patches, reviewers complained that the
    ap_instructions_available() function returns 0 if AP
    instructions are available and -ENODEV if not. The function
    acts like a boolean check for whether AP instructions are
    available and thus should return 0 on failure and != 0 on
    success. Changed to the suggested behaviour and adapted
    the one and only caller of this function, which is the ap
    bus core code.

    Signed-off-by: Harald Freudenberger
    Acked-by: Cornelia Huck
    Signed-off-by: Heiko Carstens

    Harald Freudenberger
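
The two return conventions discussed in this commit can be compared side by side. The functions below are hypothetical stand-ins that take the hardware capability as a parameter; they only illustrate the calling convention, not the actual AP instruction probe.

```c
#include <errno.h>
#include <stdbool.h>

/* Old convention from the commit message: 0 if available, -ENODEV if not. */
static int sketch_ap_available_old(bool hw_has_ap)
{
    return hw_has_ap ? 0 : -ENODEV;
}

/* New convention: non-zero if available, 0 if not. */
static int sketch_ap_available_new(bool hw_has_ap)
{
    return hw_has_ap ? 1 : 0;
}

/* A caller written against the new convention reads like a boolean test. */
static int sketch_bus_init(bool hw_has_ap)
{
    if (!sketch_ap_available_new(hw_has_ap))
        return -ENODEV;
    return 0;
}
```

With the old convention, `if (sketch_ap_available_old(...))` would be true exactly when the instructions are missing, which is the kind of confusion the change removes.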
     
  • Utilize -mfentry and -mnop-mcount gcc options together with
    -mrecord-mcount to get compiler generated calls to the profiling functions
    as nops which are compatible with the current -mhotpatch=0,3 approach. At the
    same time -mrecord-mcount enables __mcount_loc section generation by
    the compiler, which makes the scripts/recordmcount.pl script unnecessary.

    Link: http://lkml.kernel.org/r/patch-4.thread-aa7b8d.git-aa7b8dbf236f.your-ad-here.call-01533557518-ext-9465@work.hours

    Reviewed-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik
    Signed-off-by: Steven Rostedt (VMware)

    Vasily Gorbik
     
  • Pull networking updates from David Miller:
    "Highlights:

    - Gustavo A. R. Silva keeps working on the implicit switch fallthru
    changes.

    - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
    Luca Coelho.

    - Re-enable ASPM in r8169, from Kai-Heng Feng.

    - Add virtual XFRM interfaces, which avoids all of the limitations of
    existing IPSEC tunnels. From Steffen Klassert.

    - Convert GRO over to use a hash table, so that when we have many
    flows active we don't traverse a long list during accumulation.

    - Many new self tests for routing, TC, tunnels, etc. Too many
    contributors to mention them all, but I'm really happy to keep
    seeing this stuff.

    - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

    - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

    - Add IPSEC offload support to netdevsim, from Shannon Nelson.

    - Add support for slotting with non-uniform distribution to netem
    packet scheduler, from Yousuk Seung.

    - Add UDP GSO support to mlx5e, from Boris Pismenny.

    - Support offloading of Team LAG in NFP, from John Hurley.

    - Allow to configure TX queue selection based upon RX queue, from
    Amritha Nambiar.

    - Support ethtool ring size configuration in aquantia, from Anton
    Mikaev.

    - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

    - Support list based batching and stack traversal of SKBs, this is
    very exciting work. From Edward Cree.

    - Busyloop optimizations in vhost_net, from Toshiaki Makita.

    - Introduce the ETF qdisc, which allows time based transmissions. IGB
    can offload this in hardware. From Vinicius Costa Gomes.

    - Add parameter support to devlink, from Moshe Shemesh.

    - Several multiplication and division optimizations for BPF JIT in
    nfp driver, from Jiong Wang.

    - Lots of preparatory work to make more of the packet scheduler layer
    lockless, when possible, from Vlad Buslov.

    - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
    Toke Høiland-Jørgensen.

    - Support regions and region snapshots in devlink, from Alex Vesker.

    - Allow to attach XDP programs to both HW and SW at the same time on
    a given device, with initial support in nfp. From Jakub Kicinski.

    - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

    - Use PHYLIB in r8169 driver, from Heiner Kallweit.

    - All sorts of changes to support Spectrum 2 in mlxsw driver, from
    Ido Schimmel.

    - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

    - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
    Maxwell.

    - Support for templates in packet scheduler classifier, from Jiri
    Pirko.

    - IPV6 support in RDS, from Ka-Cheong Poon.

    - Native tproxy support in nf_tables, from Máté Eckl.

    - Maintain IP fragment queue in an rbtree, but optimize properly for
    in-order frags. From Peter Oskolkov.

    - Improve handling of ACKs on hole repairs, from Yuchung Cheng"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
    bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
    hv/netvsc: Fix NULL dereference at single queue mode fallback
    net: filter: mark expected switch fall-through
    xen-netfront: fix warn message as irq device name has '/'
    cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
    net: dsa: mv88e6xxx: missing unlock on error path
    rds: fix building with IPV6=m
    inet/connection_sock: prefer _THIS_IP_ to current_text_addr
    net: dsa: mv88e6xxx: bitwise vs logical bug
    net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
    ieee802154: hwsim: using right kind of iteration
    net: hns3: Add vlan filter setting by ethtool command -K
    net: hns3: Set tx ring' tc info when netdev is up
    net: hns3: Remove tx ring BD len register in hns3_enet
    net: hns3: Fix desc num set to default when setting channel
    net: hns3: Fix for phy link issue when using marvell phy driver
    net: hns3: Fix for information of phydev lost problem when down/up
    net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
    net: hns3: Add support for serdes loopback selftest
    bnxt_en: take coredump_record structure off stack
    ...

    Linus Torvalds
     

14 Aug, 2018

2 commits

  • Pull s390 updates from Heiko Carstens:
    "Since Martin is on vacation you get the s390 pull request from me:

    - Host large page support for KVM guests. As the patches have large
    impact on arch/s390/mm/ this series goes out via both the KVM and
    the s390 tree.

    - Add an option for no compression to the "Kernel compression mode"
    menu, this will come in handy with the rework of the early boot
    code.

    - A large rework of the early boot code that will make life easier
    for KASAN and KASLR. With the rework the bootable uncompressed
    image is not generated anymore, only the bzImage is available. For
    debugging purposes the new "no compression" option is used.

    - Re-enable the gcc plugins as the issue with the latent entropy
    plugin is solved with the early boot code rework.

    - More Spectre related changes:
    + Detect the etoken facility and remove expolines automatically.
    + Add expolines to a few more indirect branches.

    - A rewrite of the common I/O layer trace points to make them
    consumable by 'perf stat'.

    - Add support for format-3 PCI function measurement blocks.

    - Changes for the zcrypt driver:
    + Add attributes to indicate the load of cards and queues.
    + Restructure some code for the upcoming AP device support in KVM.

    - Build flags improvements in various Makefiles.

    - A few fixes for the kdump support.

    - A couple of patches for gcc 8 compile warning cleanup.

    - Cleanup s390 specific proc handlers.

    - Add s390 support to the restartable sequence self tests.

    - Some PTR_RET vs PTR_ERR_OR_ZERO cleanup.

    - Lots of bug fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (107 commits)
    s390/dasd: fix hanging offline processing due to canceled worker
    s390/dasd: fix panic for failed online processing
    s390/mm: fix addressing exception after suspend/resume
    rseq/selftests: add s390 support
    s390: fix br_r1_trampoline for machines without exrl
    s390/lib: use expoline for all bcr instructions
    s390/numa: move initial setup of node_to_cpumask_map
    s390/kdump: Fix elfcorehdr size calculation
    s390/cpum_sf: save TOD clock base in SDBs for time conversion
    KVM: s390: Add huge page enablement control
    s390/mm: Add huge page gmap linking support
    s390/mm: hugetlb pages within a gmap can not be freed
    KVM: s390: Add skey emulation fault handling
    s390/mm: Add huge pmd storage key handling
    s390/mm: Clear skeys for newly mapped huge guest pmds
    s390/mm: Clear huge page storage keys on enable_skey
    s390/mm: Add huge page dirty sync support
    s390/mm: Add gmap pmd invalidation and clearing
    s390/mm: Add gmap pmd notification bit setting
    s390/mm: Add gmap pmd linking
    ...

    Linus Torvalds
     
  • Pull perf update from Thomas Gleixner:
    "The perf crowd presents:

    Kernel updates:

    - Removal of jprobes

    - Cleanup and consolidation of the handling of kprobes

    - Cleanup and consolidation of hardware breakpoints

    - The usual pile of fixes and updates to PMUs and event descriptors

    Tooling updates:

    - Updates and improvements all over the place. Nothing outstanding,
    just the (good) boring incremental grunt work"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
    perf trace: Do not require --no-syscalls to suppress strace like output
    perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h
    perf tools: Allow overriding MAX_NR_CPUS at compile time
    perf bpf: Show better message when failing to load an object
    perf list: Unify metric group description format with PMU event description
    perf vendor events arm64: Update ThunderX2 implementation defined pmu core events
    perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
    perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet
    perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet
    perf cs-etm: Fix start tracing packet handling
    perf build: Fix installation directory for eBPF
    perf c2c report: Fix crash for empty browser
    perf tests: Fix indexing when invoking subtests
    perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args
    perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg
    perf trace beauty: Do not print NULL strarray entries
    perf beauty: Add a generator for IPPROTO_ socket's protocol constants
    tools include uapi: Grab a copy of linux/in.h
    perf tests: Fix complex event name parsing
    perf evlist: Fix error out while applying initial delay and LBR
    ...

    Linus Torvalds
     

31 Jul, 2018

4 commits


30 Jul, 2018

7 commits

  • Let's introduce an explicit check if skeys have already been enabled
    for the vcpu, so we don't have to check the mm context if we don't have
    the storage key facility.

    This lets us check for enablement without having to take the mm
    semaphore and thus speedup skey emulation.

    Signed-off-by: Janosch Frank
    Reviewed-by: David Hildenbrand
    Acked-by: Farhan Ali
    Signed-off-by: Christian Borntraeger

    Janosch Frank
     
  • Similarly to the pte skey handling, where we set the storage key to
    the default key for each newly mapped pte, we have to also do that for
    huge pmds.

    With the PG_arch_1 flag we keep track if the area has already been
    cleared of its skeys.

    Signed-off-by: Janosch Frank
    Reviewed-by: Martin Schwidefsky

    Janosch Frank
     
  • To do dirty logging with huge pages, we protect huge pmds in the
    gmap. When they are written to, we unprotect them and mark them dirty.

    We introduce the function gmap_test_and_clear_dirty_pmd which handles
    dirty sync for huge pages.

    Signed-off-by: Janosch Frank
    Acked-by: David Hildenbrand

    Janosch Frank
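
The protect, fault, and sync cycle described in this commit can be sketched as a small state machine. The struct and function names are hypothetical, and re-protecting the entry during the sync step is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>

struct sketch_pmd {
    bool write_protected;
    bool dirty;
};

/* Start logging: protect the huge pmd so that writes will fault. */
static void sketch_start_logging(struct sketch_pmd *pmd)
{
    pmd->write_protected = true;
    pmd->dirty = false;
}

/* Write fault on a protected pmd: unprotect it and mark it dirty. */
static void sketch_write_fault(struct sketch_pmd *pmd)
{
    if (pmd->write_protected) {
        pmd->write_protected = false;
        pmd->dirty = true;
    }
}

/* Dirty sync: report and clear the dirty state, then protect again. */
static bool sketch_test_and_clear_dirty(struct sketch_pmd *pmd)
{
    if (!pmd->dirty)
        return false;
    pmd->dirty = false;
    pmd->write_protected = true; /* assumption: re-arm for the next round */
    return true;
}
```

Each sync round therefore reports a pmd at most once until the guest writes to it again.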
     
  • If the host invalidates a pmd, we also have to invalidate the
    corresponding gmap pmds, as well as flush them from the TLB. This is
    necessary, as we don't share the pmd tables between host and guest as
    we do with ptes.

    The clearing part of these three new functions sets a guest pmd entry
    to _SEGMENT_ENTRY_EMPTY, so the guest will fault on it and we will
    re-link it.

    Flushing the gmap is not necessary in the host's lazy local and csp
    cases. Both purge the TLB completely.

    Signed-off-by: Janosch Frank
    Reviewed-by: Martin Schwidefsky
    Acked-by: David Hildenbrand

    Janosch Frank
     
  • Like for ptes, we also need invalidation notification for pmds, to
    make sure the guest lowcore pages are always accessible and to allow
    the later addition of shadowed pmds.

    With PMDs we do not have PGSTEs or some other bits we could use in the
    host PMD. Instead we pick one of the free bits in the gmap PMD. Every
    time a host pmd is invalidated, we check whether the respective
    gmap PMD has the bit set and, if so, fire the notifier.

    Signed-off-by: Janosch Frank

    Janosch Frank
     
  • Let's allow pmds to be linked into gmap for the upcoming s390 KVM huge
    page support.

    Before this patch we copied the full userspace pmd entry. This is not
    correct, as it contains SW defined bits that might be interpreted
    differently in the GMAP context. Now we only copy over all hardware
    relevant information leaving out the software bits.

    Signed-off-by: Janosch Frank
    Reviewed-by: David Hildenbrand

    Janosch Frank
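
Copying only the hardware-relevant part of an entry amounts to a mask operation. The masks below are placeholder values chosen for this sketch, not the s390 segment table entry definitions, and the function name is hypothetical.

```c
#include <stdint.h>

/* Placeholder layout for illustration only. */
#define SKETCH_ORIGIN_MASK  (~UINT64_C(0xfffff))  /* page origin bits */
#define SKETCH_HW_FLAG_MASK UINT64_C(0x00620)     /* assumed hardware flags */
#define SKETCH_SW_FLAG_MASK UINT64_C(0x07003)     /* assumed software bits */

/* Build the gmap entry from the host entry, dropping software bits. */
static uint64_t sketch_gmap_entry_from_host(uint64_t host_pmd)
{
    return host_pmd & (SKETCH_ORIGIN_MASK | SKETCH_HW_FLAG_MASK);
}
```

Because the software mask is never part of the copy, bits that mean one thing in the host page tables cannot be misread in the gmap context.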
     
  • Currently we use the software PGSTE bits PGSTE_IN_BIT and
    PGSTE_VSIE_BIT to notify before an invalidation occurs on a prefix
    page or a VSIE page respectively. Both bits are pgste specific, but
    are used when protecting a memory range.

    Let's introduce abstract GMAP_NOTIFY_* bits that will be realized into
    the respective bits when gmap DAT table entries are protected.

    Signed-off-by: Janosch Frank
    Reviewed-by: Christian Borntraeger
    Reviewed-by: David Hildenbrand

    Janosch Frank
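
The translation from abstract notification bits to entry-specific bits can be sketched as a small mapping function. All names and bit values here are assumptions for illustration; the commit message only states that abstract GMAP_NOTIFY_* bits are realized into the respective PGSTE bits.

```c
#include <stdint.h>

/* Hypothetical abstract request bits. */
#define SKETCH_NOTIFY_PREFIX 0x1u  /* notify for a prefix page */
#define SKETCH_NOTIFY_VSIE   0x2u  /* notify for a VSIE page */

/* Hypothetical pgste-specific bits the requests are realized into. */
#define SKETCH_PGSTE_IN_BIT   UINT64_C(0x0004000000000000)
#define SKETCH_PGSTE_VSIE_BIT UINT64_C(0x0000200000000000)

/* Translate abstract notification requests into pgste bits. */
static uint64_t sketch_notify_to_pgste(unsigned int notify)
{
    uint64_t bits = 0;

    if (notify & SKETCH_NOTIFY_PREFIX)
        bits |= SKETCH_PGSTE_IN_BIT;
    if (notify & SKETCH_NOTIFY_VSIE)
        bits |= SKETCH_PGSTE_VSIE_BIT;
    return bits;
}
```

Callers that protect a memory range only deal with the abstract bits, so a second translation function for another table level could be added without touching them.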
     

19 Jul, 2018

1 commit

  • We want to provide facility 156 (etoken facility) to our
    guests. This includes migration support (via sync regs) and
    VSIE changes. The tokens are being reset on clear reset. This
    has to be implemented by userspace (via sync regs).

    Signed-off-by: Christian Borntraeger
    Reviewed-by: David Hildenbrand
    Acked-by: Cornelia Huck

    Christian Borntraeger
     

17 Jul, 2018

1 commit

  • Remove attribute packed where possible; failing this, add proper alignment
    information to fix warnings like the one below:

    drivers/s390/cio/chsc.c: In function 'chsc_siosl':
    drivers/s390/cio/chsc.c:1287:2: warning: alignment 1 of 'struct ' is less than 4 [-Wpacked-not-aligned]
    } __attribute__ ((packed)) *siosl_area;

    Note: this patch should be a nop since none of these structs use auto
    storage but allocated pages. However there are changes to the generated
    code because of additional padding at the end of some of the structs due
    to alignment when memset(foo, 0, sizeof(*foo)) is used.

    Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
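
The effect of adding alignment information to a packed struct, including the extra tail padding mentioned in the note, can be checked with a small GCC/Clang example. The struct names and members are made up for this sketch and are unrelated to the cio structures.

```c
#include <stdint.h>

/* Packed only: alignment drops to 1 and no padding is added. */
struct sketch_packed_only {
    uint32_t word;
    uint16_t half;
} __attribute__((packed));

/* Packed plus explicit alignment: members stay tight, the struct is aligned. */
struct sketch_packed_aligned {
    uint32_t word;
    uint16_t half;
} __attribute__((packed, aligned(4)));

_Static_assert(_Alignof(struct sketch_packed_only) == 1, "packed lowers alignment");
_Static_assert(sizeof(struct sketch_packed_only) == 6, "no tail padding");
_Static_assert(_Alignof(struct sketch_packed_aligned) == 4, "explicit alignment");
_Static_assert(sizeof(struct sketch_packed_aligned) == 8, "tail padded to alignment");
```

The size difference is why `memset(foo, 0, sizeof(*foo))` can touch more bytes after such a change even though the member layout is the same.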
     

13 Jul, 2018

1 commit

  • This is a fix for several issues that were found in the original code
    for storage attributes migration.

    Now no bitmap is allocated to keep track of dirty storage attributes;
    the extra bits of the per-memslot bitmap that are always present anyway
    are now used for this purpose.

    The code has also been refactored a little to improve readability.

    Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
    Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and set guest storage attributes")
    Acked-by: Janosch Frank
    Signed-off-by: Claudio Imbrenda
    Message-Id:
    Signed-off-by: Christian Borntraeger

    Claudio Imbrenda
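
One way to read "the extra bits of the per-memslot bitmap" is that the bitmap is allocated larger than dirty tracking needs and the unused part is reused. The sketch below assumes, without confirmation from the commit message, that the allocation is twice the required size and that the second half is the reused part; all names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of longs needed to hold one bit per page. */
static size_t sketch_bitmap_longs(size_t npages)
{
    return (npages + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
}

/* Assumption: the allocation is twice as large; return the second half. */
static unsigned long *sketch_second_bitmap(unsigned long *bitmap, size_t npages)
{
    return bitmap + sketch_bitmap_longs(npages);
}

static void sketch_set_bit(unsigned long *map, size_t nr)
{
    map[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static bool sketch_test_bit(const unsigned long *map, size_t nr)
{
    return (map[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

Under that assumption no separate allocation is needed, and bits set in the second half never disturb the first half used for regular dirty tracking.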
     

06 Jul, 2018

1 commit

  • Currently there are some variables in the purgatory (e.g. kernel_entry)
    which are defined twice, once in assembler- and once in c-code. The reason
    for this is that these variables are set during purgatory load, where
    sanity checks on the corresponding Elf_Sym's are made, while they are used
    in assembler-code. Thus adding a second definition in c-code is a handy
    workaround to guarantee correct Elf_Sym's are created.

    When the purgatory is compiled with -fcommon (default for gcc on s390) this
    is no problem because both symbols are merged by the linker. However this
    is not required by ISO C and when the purgatory is built with -fno-common
    the linker fails with errors like

    arch/s390/purgatory/purgatory.o:(.bss+0x18): multiple definition of `kernel_entry'
    arch/s390/purgatory/head.o:/.../arch/s390/purgatory/head.S:230: first defined here

    Thus remove the duplicate definitions and add the required size and type
    information to the assembler definition. Also add -fno-common to the
    command line options to prevent similar hacks in the future.

    Signed-off-by: Philipp Rudo
    Acked-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Philipp Rudo
     

04 Jul, 2018

2 commits

  • This patch introduces SO_TXTIME. User space enables this option in
    order to pass a desired future transmit time in a CMSG when calling
    sendmsg(2). The argument to this socket option is an 8-byte struct
    provided by the uapi header net_tstamp.h, defined as:

    struct sock_txtime {
            clockid_t clockid;
            u32 flags;
    };

    Note that new fields were added to struct sock by filling a 2-byte
    hole found in the struct. For that reason, neither the struct size nor
    the number of cachelines was altered.

    Signed-off-by: Richard Cochran
    Signed-off-by: Jesus Sanchez-Palencia
    Signed-off-by: David S. Miller

    Richard Cochran
     
  • Add support for format 3 function measurement blocks.

    Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     

02 Jul, 2018

3 commits

  • Since the plain vmlinux ELF file no longer carries all necessary parts
    for starting up (like the entry point and decompressor), add a check
    which would block boot process and encourage users to use bzImage or
    arch/s390/boot/compressed/vmlinux instead.

    The check relies on s390 linux entry point ABI definition, which is only
    present in bzImage and arch/s390/boot/compressed/vmlinux.

    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Acked-by: Cornelia Huck
    Acked-by: Christian Borntraeger
    Reviewed-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik
    Signed-off-by: Martin Schwidefsky

    Vasily Gorbik
     
  • Aligning struct lowcore to double page size gets rid of this
    gcc warning:

    In file included from ./arch/s390/include/asm/setup.h:56,
    from ./arch/s390/include/asm/page.h:36,
    from ./arch/s390/include/asm/user.h:11,
    from ./include/linux/user.h:1,
    from ./include/linux/elfcore.h:5,
    from ./include/linux/crash_core.h:6,
    from ./include/linux/kexec.h:18,
    from arch/s390/purgatory/purgatory.c:10:
    ./arch/s390/include/asm/lowcore.h:189:1: warning: alignment 1 of 'struct
    lowcore' is less than 8 [-Wpacked-not-aligned]
    } __packed;

    Acked-by: Christian Borntraeger
    Reviewed-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik
    Signed-off-by: Martin Schwidefsky

    Vasily Gorbik
     
  • Since startup code now reserves memory ranges [0, PARMAREA_END] and
    [_stext, ], the _ehead symbol is no longer used and can be
    cleaned up.

    Reviewed-by: Heiko Carstens
    Signed-off-by: Vasily Gorbik
    Signed-off-by: Martin Schwidefsky

    Vasily Gorbik
     

25 Jun, 2018

4 commits