04 Nov, 2015

1 commit

  • Pull x86 sigcontext header cleanups from Ingo Molnar:
    "This series reorganizes and cleans up various aspects of the main
    sigcontext UAPI headers, such as unifying the data structures and
    updating/adding lots of comments to explain all the ABI details and
    quirks. The headers can now also be built in user-space standalone"

    * 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/headers: Clean up too long lines
    x86/headers: Remove <asm/sigcontext.h> references on the kernel side
    x86/headers: Remove direct sigcontext32.h uses
    x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
    x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
    x86/headers: Make sigcontext pointers bit independent
    x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
    x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
    x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
    x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
    x86/headers: Unify register type definitions between 32-bit compat and i386
    x86/headers: Use ABI types consistently in sigcontext*.h
    x86/headers: Separate out legacy user-space structure definitions
    x86/headers: Clean up and better document uapi/asm/sigcontext.h
    x86/headers: Clean up uapi/asm/sigcontext32.h
    x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h

    Linus Torvalds
     

07 Oct, 2015

1 commit

  • 32-bit userspace will now always see the same vDSO, which is
    exactly what used to be the int80 vDSO. Subsequent patches will
    clean it up and make it support SYSENTER and SYSCALL using
    alternatives.

    Signed-off-by: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

08 Sep, 2015

4 commits

  • Now that all type definitions are in the UAPI header, include it
    directly, instead of through <asm/sigcontext.h>.

    [ We still keep asm/sigcontext.h, so that uapi/asm/sigcontext32.h
    can include <asm/sigcontext.h>. ]

    Acked-by: Mikko Rapeli
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1441438363-9999-16-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Now that all sigcontext types are defined in asm/sigcontext.h,
    remove the various sigcontext32.h uses in the kernel.

    We still keep the header itself, which includes sigcontext.h, in
    case user-space relies on it.

    Acked-by: Mikko Rapeli
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1441438363-9999-15-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Use the new name in kernel code, and move the old name to the
    user-space-only legacy section of the UAPI header.

    Acked-by: Mikko Rapeli
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1441438363-9999-14-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Remove uses of _fpstate_ia32 from the kernel, and move the
    legacy _fpstate_ia32 definition to the user-space only portion
    of the header.

    Acked-by: Mikko Rapeli
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

06 Jul, 2015

1 commit

  • copy_siginfo_to_user32() and copy_siginfo_from_user32() are used
    by both the 32-bit compat and x32 ABIs. Move them to
    signal_compat.c.

    Signed-off-by: Brian Gerst
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1434974121-32575-2-git-send-email-brgerst@gmail.com
    Signed-off-by: Ingo Molnar

    Brian Gerst
     

23 Jun, 2015

1 commit

  • Pull x86 core updates from Ingo Molnar:
    "There were so many changes in the x86/asm, x86/apic and x86/mm topics
    in this cycle that the topical separation of -tip broke down somewhat -
    so the result is a more traditional architecture pull request,
    collected into the 'x86/core' topic.

    The topics were still maintained separately as far as possible, so
    bisectability and conceptual separation should still be pretty good -
    but there were a handful of merge points to avoid excessive
    dependencies (and conflicts) that would have been poorly tested in the
    end.

    The next cycle will hopefully be much more quiet (or at least will
    have fewer dependencies).

    The main changes in this cycle were:

    * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
    Gleixner)

    - This is the second and most intrusive part of changes to the x86
    interrupt handling - full conversion to hierarchical interrupt
    domains:

    [IOAPIC domain]   -----
                          |
    [MSI domain]      --------[Remapping domain] ----- [ Vector domain ]
                          |   (optional)           |
    [HPET MSI domain] -----                        |
                                                   |
    [DMAR domain]     -----------------------------
                                                   |
    [Legacy domain]   -----------------------------
    This now reflects the actual hardware and allowed us to disentangle
    the domain specific code from the underlying parent domain, which
    can be optional in the case of interrupt remapping. It's a clear
    separation of functionality and removes quite some duct tape
    constructs which plugged the remap code between ioapic/msi/hpet
    and the vector management.

    - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
    injection into guests (Feng Wu)

    * x86/asm changes:

    - Tons of cleanups and small speedups, micro-optimizations. This
    is in preparation to move a good chunk of the low level entry
    code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
    Brian Gerst)

    - Moved all system entry related code to a new home under
    arch/x86/entry/ (Ingo Molnar)

    - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
    Conversion to C will reintroduce many of them - but meanwhile
    they are only getting in the way, and the upstream kernel does
    not rely on them (Ingo Molnar)

    - NOP handling refinements. (Borislav Petkov)

    * x86/mm changes:

    - Big PAT and MTRR rework: making the code more robust and
    preparing to phase out exposing direct MTRR interfaces to drivers -
    in favor of using PAT driven interfaces (Toshi Kani, Luis R
    Rodriguez, Borislav Petkov)

    - New ioremap_wt()/set_memory_wt() interfaces to support
    Write-Through cached memory mappings. This is especially
    important for good performance on NVDIMM hardware (Toshi Kani)

    * x86/ras changes:

    - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

    This is an important RAS feature which adds hardware support for
    poisoned data. That means roughly that the hardware marks data
    which it has detected as corrupted but wasn't able to correct, as
    poisoned data and raises an APIC interrupt to signal that in the
    form of a deferred error. It is the OS's responsibility then to
    take proper recovery action and thus prolong system lifetime as
    far as possible.

    - Add support for Intel "Local MCE"s: upcoming CPUs will support
    CPU-local MCE interrupts, as opposed to the traditional system-
    wide broadcasted MCE interrupts (Ashok Raj)

    - Misc cleanups (Borislav Petkov)

    * x86/platform changes:

    - Intel Atom SoC updates

    ... and lots of other cleanups, fixlets and other changes - see the
    shortlog and the Git log for details"

    * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
    x86/hpet: Use proper hpet device number for MSI allocation
    x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
    x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
    x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
    x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
    genirq: Prevent crash in irq_move_irq()
    genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
    iommu, x86: Properly handle posted interrupts for IOMMU hotplug
    iommu, x86: Provide irq_remapping_cap() interface
    iommu, x86: Setup Posted-Interrupts capability for Intel iommu
    iommu, x86: Add cap_pi_support() to detect VT-d PI capability
    iommu, x86: Avoid migrating VT-d posted interrupts
    iommu, x86: Save the mode (posted or remapped) of an IRTE
    iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
    iommu: dmar: Provide helper to copy shared irte fields
    iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
    iommu: Add new member capability to struct irq_remap_ops
    x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
    x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
    x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
    ...

    Linus Torvalds
     

04 Jun, 2015

1 commit


02 Jun, 2015

1 commit

  • So the dwarf2 annotations in low level assembly code have
    become an increasing hindrance: unreadable, messy macros
    mixed into some of the most security sensitive code paths
    of the Linux kernel.

    These debug info annotations don't even buy the upstream
    kernel anything: dwarf driven stack unwinding has caused
    problems in the past so it's out of tree, and the upstream
    kernel only uses the much more robust framepointers based
    stack unwinding method.

    In addition to that there's a steady, slow bitrot going
    on with these annotations, requiring frequent fixups.
    There's no tooling and no functionality upstream that
    keeps it correct.

    So burn down the sick forest, allowing new, healthier growth:

    27 files changed, 350 insertions(+), 1101 deletions(-)

    Someone who has the willingness and time to do this
    properly can attempt to reintroduce dwarf debuginfo in x86
    assembly code plus dwarf unwinding from first principles,
    with the following conditions:

    - it should be maximally readable, and maximally low-key to
    'ordinary' code reading and maintenance.

    - find a build time method to insert dwarf annotations
    automatically in the most common cases, for pop/push
    instructions that manipulate the stack pointer. This could
    be done for example via a preprocessing step that just
    looks for common patterns - plus special annotations for
    the few cases where we want to depart from the default.
    We have hundreds of CFI annotations, so automating most of
    that makes sense.

    - it should come with build tooling checks that ensure that
    CFI annotations are sensible. We've seen such efforts from
    the framepointer side, and there's no reason it couldn't be
    done on the dwarf side.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: Frédéric Weisbecker
    Cc: Jan Beulich
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

19 May, 2015

8 commits

  • Most of the FPU does not use them, so split it out and include
    them in signal.c and ia32_signal.c

    Also fix header file dependency assumption in fpu/core.c.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Consolidate more signal frame related functions:

        text     data      bss       dec  filename
    14108070  2575280  1634304  18317654  vmlinux.before
    14107944  2575344  1634304  18317592  vmlinux.after

    Also, while moving it, rename alloc_mathframe() to fpu__alloc_mathframe().

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • restore_xstate_sig() is a misnomer: it's not limited to 'xstate' at all,
    it is the high level 'restore FPU state from a signal frame' function
    that works with all legacy FPU formats as well.

    Rename it (and its helper) accordingly, and also move it to the
    fpu__*() namespace.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Standardize the naming of save_xstate_sig() by renaming it to
    copy_fpstate_to_sigframe(): this tells us at a glance that
    the function copies an FPU fpstate to a signal frame.

    This naming also follows the naming of copy_fpregs_to_fpstate().

    Don't put 'xstate' into the name: since this is a generic name,
    it's expected that the function is able to handle xstate frames
    as well, beyond legacy frames.

    xstate used to be the odd case in the x86 FPU code - now it's the
    common case.

    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • This unifies all the FPU related header files under a unified, hierarchical
    naming scheme:

    - asm/fpu/types.h: FPU related data types, needed for 'struct task_struct',
    widely included in almost all kernel code, and hence kept
    as small as possible.

    - asm/fpu/api.h: FPU related 'public' methods exported to other subsystems.

    - asm/fpu/internal.h: FPU subsystem internal methods

    - asm/fpu/xsave.h: XSAVE support internal methods

    (Also standardize the header guard in asm/fpu/internal.h.)

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Introduce a simple fpu->fpstate_active flag in the fpu context data structure
    and use that instead of PF_USED_MATH in task->flags.

    Testing for this flag byte should be slightly more efficient than
    testing a bit in a bitmask, but the main advantage is that most
    FPU functions can now be performed on a 'struct fpu' alone, they
    don't need access to 'struct task_struct' anymore.

    There's a slight linecount increase, mostly due to the 'fpu' local
    variables and due to extra comments. The local variables will go away
    once we move most of the FPU methods to pure 'struct fpu' parameters.

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • PF_USED_MATH is used directly, but also in a handful of helper inlines.

    To ease the elimination of PF_USED_MATH, convert all inline helpers
    to open-coded PF_USED_MATH usage.

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Fix a minor header file dependency bug in asm/fpu-internal.h: it
    relies on i387.h but does not include it. All users of fpu-internal.h
    included it explicitly.

    Also remove unnecessary includes, to reduce compilation time.

    This also makes it easier to use it as a standalone header file
    for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/.

    Reviewed-by: Borislav Petkov
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

08 May, 2015

3 commits

  • 32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
    64-bit code uses the somewhat more obscure PER_CPU_VAR(cpu_tss + TSS_sp0).

    Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
    as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
    expression can be used in both 32-bit and 64-bit code.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • PER_CPU_VAR(kernel_stack) is redundant:

    - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
    - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).

    PER_CPU_VAR(kernel_stack) will be deleted by a separate change.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

27 Apr, 2015

1 commit

  • AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
    SS == 0 results in an invalid usermode state in which SS is apparently
    equal to __USER_DS but causes #SS if used.

    Work around the issue by setting SS to __KERNEL_DS in __switch_to, thus
    ensuring that SYSRET never happens with SS set to NULL.

    This was exposed by a recent vDSO cleanup.

    Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
    Signed-off-by: Andy Lutomirski
    Cc: Peter Anvin
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: Brian Gerst
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

22 Apr, 2015

2 commits

  • Recently Andy changed the 64-bit syscall logic so that
    pt_regs->ax is initially set to -ENOSYS, and on syscall exit,
    it is updated with the actual return value. This simplified
    the logic there.

    This patch does the same for 32-bit syscall entry points.

    The check for %rax being too big is moved to be just before
    the call instruction which dispatches execution through the
    syscall table.

    There is no way to accidentally skip this check now by jumping
    to a label after it. This allows us to remove redundant checks
    after ptrace et al.

    If %rax is too big, we just skip over the (call, write %rax to
    pt_regs->ax) instruction pair. pt_regs->ax remains set to -ENOSYS,
    and it gets returned to userspace.

    Similar to 64-bit code, this eliminates the "ia32_badsys" code path.

    Run-tested.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1429632194-13445-2-git-send-email-dvlasenk@redhat.com
    [ Changelog massage. ]
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • We don't use irq_enable_sysexit on 64-bit kernels any more.
    Remove all the paravirt and Xen machinery to support it on
    64-bit kernels.

    Tested-by: Boris Ostrovsky
    Signed-off-by: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

09 Apr, 2015

1 commit

  • The change which affected how execve clears EXTRA_REGS missed
    32-bit execve syscalls.

    Fix this by using 64-bit execve stub epilogue for them too.

    Run-tested.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1428439424-7258-3-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

06 Apr, 2015

1 commit

  • The 'pax' argument is unnecessary. Instead, store the RAX value
    directly in regs.

    This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext()
    was changed to return an error code instead of EAX directly:

    https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174

    In 2007 sigaltstack syscall support was added, where the return
    value of restore_sigcontext() was changed to carry the memory-copying
    failure code.

    But instead of putting 'ax' into regs->ax directly, it was carried
    in via a pointer and then returned, where the generic syscall return
    code copied it to regs->ax.

    So there was never any deeper reason for this suboptimal pattern, it
    was simply never noticed after being introduced.

    Signed-off-by: Brian Gerst
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com
    Signed-off-by: Ingo Molnar

    Brian Gerst
     

03 Apr, 2015

1 commit

  • SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
    with usergs and IRQs on. That means that we rely on STI to
    correctly mask interrupts for one instruction. This is okay by
    itself, but the semantics with respect to NMIs are unclear.

    Avoid the whole issue by using SYSRETL instead. For background,
    Intel CPUs don't allow SYSCALL from compat mode, but they do
    allow SYSRETL back to compat mode. Go figure.

    To avoid doing too much at once, this doesn't revamp the calling
    convention. We still return with EBP, EDX, and ECX on the user
    stack.

    Oddly this seems to be 30 cycles or so faster. Avoiding POPFQ
    and STI will account for under half of that, I think, so my best
    guess is that Intel just optimizes SYSRET much better than
    SYSEXIT.

    Signed-off-by: Andy Lutomirski
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

01 Apr, 2015

1 commit

  • This mimics the recent similar 64-bit change.
    Saves ~110 bytes of code.

    Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
    I also looked at the diff of entry_64.o disassembly, to have
    a different view of the changes.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1427821211-25099-2-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

27 Mar, 2015

2 commits

  • There are a couple of syscall argument zero-extension instructions in
    the 32-bit compat entry code, and it was mentioned that people keep
    trying to optimize them out, introducing bugs.

    Make them more visible, and add a "do not remove" comment.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1427452582-21624-3-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • The existing comment has proven to be not very clear.

    Replace it with a comment similar to the one we now have in the 64-bit
    syscall entry point. (Three instances, one per 32-bit syscall entry).

    In the INT80 entry point's CFI annotations, replace mysterious
    expressions with numeric constants. In this case, raw numbers
    look more understandable.

    Signed-off-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1427452582-21624-2-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

25 Mar, 2015

4 commits

  • The THREAD_INFO() macro has a somewhat confusingly generic name,
    defined in a generic .h C header file. It also does not make it
    clear that it constructs a memory operand for use in assembly
    code.

    Rename it to ASM_THREAD_INFO() to make it all glaringly
    obvious on first glance.

    Acked-by: Borislav Petkov
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/20150324184442.GC14760@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Before:

    movl TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d

    After:

    movl THREAD_INFO(TI_sysenter_return, %rsp, 3*8), %r10d

    to turn it into a clear thread_info accessor.

    No code changed:

    md5:
    fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.before.asm
    fb4cb2b3ce05d89940ca304efc8ff183 ia32entry.o.after.asm

    e39f2958a5d1300158e276e4f7663263 entry_64.o.before.asm
    e39f2958a5d1300158e276e4f7663263 entry_64.o.after.asm

    Acked-by: Andy Lutomirski
    Acked-by: Denys Vlasenko
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/20150324184411.GB14760@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • PER_CPU_VAR(kernel_stack) was set up in a way where it points
    five stack slots below the top of stack.

    Presumably, it was done to avoid one "sub $5*8,%rsp"
    in syscall/sysenter code paths, where iret frame needs to be
    created by hand.

    Ironically, none of them benefits from this optimization,
    since all of them need to allocate additional data on stack
    (struct pt_regs), so they still have to perform subtraction.

    This patch eliminates KERNEL_STACK_OFFSET.

    PER_CPU_VAR(kernel_stack) now points directly to top of stack.
    pt_regs allocations are adjusted to allocate iret frame as well.
    Hopefully we can merge it later with 32-bit specific
    PER_CPU_VAR(cpu_current_top_of_stack) variable...

    Net result in generated code is that constants in several insns
    are changed.

    This change is necessary for changing struct pt_regs creation
    in SYSCALL64 code path from MOV to PUSH instructions.

    Signed-off-by: Denys Vlasenko
    Acked-by: Borislav Petkov
    Acked-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     
  • This changes the THREAD_INFO() definition and all its callsites
    so that they do not count stack position from
    (top of stack - KERNEL_STACK_OFFSET), but from top of stack.

    Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
    are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS)
    - "calculate thread_info's address using information that
    rsp is SIZEOF_PTREGS bytes below top of stack".

    While at it, replace "(off)-THREAD_SIZE(reg)" with equivalent
    "((off)-THREAD_SIZE)(reg)". The form without parentheses
    falsely looks like we invoke THREAD_SIZE() macro.

    Improve comment atop THREAD_INFO macro definition.

    This patch does not change generated code (verified by objdump).

    Signed-off-by: Denys Vlasenko
    Acked-by: Borislav Petkov
    Acked-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/1426785469-15125-1-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
     

23 Mar, 2015

1 commit

  • Both the execve() and sigreturn() family of syscalls have the
    ability to change registers in ways that may not be compatible
    with the syscall path they were called from.

    In particular, SYSRET and SYSEXIT can't handle non-default %cs and %ss,
    and some bits in eflags.

    These syscalls have stubs that are hardcoded to jump to the IRET path,
    and not return to the original syscall path.

    The following commit:

    76f5df43cab5e76 ("Always allocate a complete "struct pt_regs" on the kernel stack")

    recently changed this for some 32-bit compat syscalls, but introduced a bug where
    execve from a 32-bit program to a 64-bit program would fail because it still returned
    via SYSRETL. This caused Wine to fail when built for both 32-bit and 64-bit.

    This patch sets TIF_NOTIFY_RESUME for execve() and sigreturn() so
    that the IRET path is always taken on exit to userspace.

    Signed-off-by: Brian Gerst
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1426978461-32089-1-git-send-email-brgerst@gmail.com
    [ Improved the changelog and comments. ]
    Signed-off-by: Ingo Molnar

    Brian Gerst
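
The idea of forcing the IRET path through a pending-work flag can be modelled roughly as follows. This is a simplified sketch, not the kernel's code: the struct, the function names, and the flag's bit position are all hypothetical, and the real exit path checks more conditions than this.

```c
#include <assert.h>
#include <stddef.h>

enum exit_path { EXIT_FAST_SYSRET, EXIT_SLOW_IRET };

/* Hypothetical stand-in for the TIF_NOTIFY_RESUME thread flag. */
#define MODEL_TIF_NOTIFY_RESUME (1u << 1)

struct task_model {
    unsigned int flags;  /* pending exit-to-userspace work */
};

/* execve() and sigreturn() may rewrite %cs, %ss and eflags, so flag the task. */
static void model_regs_replaced(struct task_model *t)
{
    assert(t != NULL);
    t->flags |= MODEL_TIF_NOTIFY_RESUME;
}

/* Any pending work diverts the exit to the slow path, which returns via IRET. */
static enum exit_path model_choose_exit(const struct task_model *t)
{
    assert(t != NULL);
    return t->flags ? EXIT_SLOW_IRET : EXIT_FAST_SYSRET;
}
```

The point of the design is that the fast return instructions are never reached with register state they cannot restore; the flag simply reuses the existing slow-path machinery.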
     

06 Mar, 2015

2 commits

  • It has nothing to do with init -- there's only one TSS per cpu.

    Other names considered include:

    - current_tss: Confusing because we never switch the tss.
    - singleton_tss: Too long.

    This patch was generated with 's/init_tss/cpu_tss/g'. Followup
    patches will fix INIT_TSS and INIT_TSS_IST by hand.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     
  • The ia32 sysenter code loaded the top of the kernel stack into
    rsp by loading kernel_stack and then adjusting it. It can be
    simplified to just read sp0 directly.

    This requires the addition of a new asm-offsets entry for sp0.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/88ff9006163d296a0665338585c36d9bfb85235d.1425611534.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
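
An asm-offsets entry is, in essence, a structure offset exported as a constant so that assembly code can address a field directly. A rough user-space sketch, assuming a simplified TSS layout (only the leading fields are modelled, and the names tss_model, MODEL_TSS_SP0 and model_read_sp0() are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the start of the 64-bit hardware TSS (packed layout). */
struct tss_model {
    uint32_t reserved1;
    uint64_t sp0;       /* top of the kernel stack for ring 0 */
    uint64_t sp1;
    uint64_t sp2;
} __attribute__((packed));

/* What an asm-offsets entry boils down to: a field offset as a constant. */
#define MODEL_TSS_SP0 offsetof(struct tss_model, sp0)

/* Read sp0 directly by offset instead of deriving it from another variable. */
static uint64_t model_read_sp0(const void *tss_base)
{
    uint64_t sp0;

    assert(tss_base != NULL);
    memcpy(&sp0, (const unsigned char *)tss_base + MODEL_TSS_SP0, sizeof(sp0));
    return sp0;
}
```

Note that __attribute__((packed)) is a GCC/Clang extension; it is used here only so the modelled offset matches the packed hardware layout.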
     

05 Mar, 2015

3 commits

  • The last instance of the "mysterious" SS+8 constant is replaced by
    SIZEOF_PTREGS.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Steven Rostedt
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/d35aeba3059407ac54f472ddcfbea767ff8916ac.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
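
Why SS+8 and SIZEOF_PTREGS are the same number can be checked with a small model. The struct below is an illustrative copy of a 64-bit register frame in which ss is the last eight-byte slot; the name pt_regs_model is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of a 64-bit pt_regs: 21 eight-byte slots, ss last. */
struct pt_regs_model {
    uint64_t r15, r14, r13, r12, bp, bx;
    uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
    uint64_t orig_ax;
    uint64_t ip, cs, flags, sp, ss;
};

/* "SS+8" is the offset of the last slot plus its size: the whole frame. */
static_assert(offsetof(struct pt_regs_model, ss) + 8 ==
              sizeof(struct pt_regs_model),
              "SS+8 must equal the size of the register frame");
```

Spelling the constant as the size of the frame states the intent directly, instead of relying on the reader knowing that ss happens to be the final field.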
     
  • Use of a small macro - one with conditional expansion - does
    more harm than good. It obfuscates code, with minimal code
    reuse.

    For example, because of obfuscation it's not obvious that
    in 'ia32_sysenter_target', we can optimize loading of r9 -
    currently it is loaded with a detour through ebp.

    This patch folds the IA32_ARG_FIXUP macro into its callers.

    No code changes.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/4da092094cd78734384ac31e0d4ec1d8f69145a2.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
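
For readers unfamiliar with what such an argument fixup does, here is a rough C model of the register shuffle, assuming the usual conventions: 32-bit syscall arguments arrive in ebx, ecx, edx, esi, edi, ebp, while 64-bit C functions expect them in rdi, rsi, rdx, rcx, r8, r9. The struct and function names are hypothetical, and the model shows only the end result, not the instruction sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arguments as a 32-bit syscall passes them. */
struct ia32_args_model { uint32_t ebx, ecx, edx, esi, edi, ebp; };

/* Arguments as the 64-bit C calling convention expects them. */
struct c_args_model { uint64_t rdi, rsi, rdx, rcx, r8, r9; };

/* Move each 32-bit argument into its 64-bit slot, zero-extending it. */
static struct c_args_model model_arg_fixup(const struct ia32_args_model *in)
{
    struct c_args_model out;

    assert(in != NULL);
    out.rdi = in->ebx;  /* arg 1 */
    out.rsi = in->ecx;  /* arg 2 */
    out.rdx = in->edx;  /* arg 3 */
    out.rcx = in->esi;  /* arg 4 */
    out.r8  = in->edi;  /* arg 5 */
    out.r9  = in->ebp;  /* arg 6 */
    return out;
}
```

In assembly the order of the moves matters, because edi and esi are both sources and destinations; the C model sidesteps that by writing into a separate output struct.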
     
  • SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
    Moreover, they differ in 32- and 64-bit mode.

    What is saved? What is not? Is rsp set? Are interrupts disabled?
    People tend to not remember these details well enough.

    This patch adds comments which explain in detail
    what registers are modified by each of these instructions.

    The comments are placed immediately before corresponding
    entry and exit points.

    Signed-off-by: Denys Vlasenko
    Signed-off-by: Andy Lutomirski
    Cc: Alexei Starovoitov
    Cc: Borislav Petkov
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Link: http://lkml.kernel.org/r/a94b98b63527797c871a81402ff5060b18fa880a.1424989793.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Denys Vlasenko
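
As an illustration of the kind of detail such comments capture, here is a small C model of what the SYSCALL instruction does to register state in 64-bit mode. It is a sketch under simplifying assumptions: segment register loading is left out, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpu_model {
    uint64_t rip, rsp, rcx, r11, rflags;
    uint64_t msr_lstar;   /* kernel entry point for 64-bit SYSCALL */
    uint64_t msr_sfmask;  /* rflags bits cleared on entry */
};

/* Model of SYSCALL in 64-bit mode; next_rip is the instruction after it. */
static void model_syscall64(struct cpu_model *cpu, uint64_t next_rip)
{
    assert(cpu != NULL);
    cpu->rcx = next_rip;              /* return address is saved in rcx */
    cpu->r11 = cpu->rflags;           /* old flags are saved in r11 */
    cpu->rflags &= ~cpu->msr_sfmask;  /* masked bits (e.g. IF) are cleared */
    cpu->rip = cpu->msr_lstar;        /* jump to the kernel entry point */
    /* rsp is NOT changed: the kernel starts out on the user stack pointer. */
}
```

The model makes two of the easily forgotten points explicit: rcx and r11 are clobbered, and rsp still holds the user value when the kernel entry code starts running.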