06 Dec, 2011

1 commit

  • do_notify_resume() gets called with interrupts disabled on x86_32. This
    is different from the x86_64 behavior, where interrupts are enabled at
    the time.

    Queries on lkml about this issue haven't yielded any clear answer. Let's
    make x86_32 behave the same as x86_64, unless there is a real reason to
    maintain the status quo.

    Please refer to https://lkml.org/lkml/2011/9/27/130 for more details.

    A similar change was suggested in ARM:

    https://lkml.org/lkml/2011/8/25/231

    My 32-bit machine works fine (tm) with this patch.
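
    A minimal sketch of the idea in entry_32.S terms (using the kernel's
    TRACE_IRQS_ON/ENABLE_INTERRUPTS macros; the actual patch may differ in
    detail):

    work_notifysig:                       # deal with pending signals and
                                          # notify-resume requests
            TRACE_IRQS_ON
            ENABLE_INTERRUPTS(CLBR_NONE)  # run this path with irqs on,
                                          # as x86_64 already does
            movl %esp, %eax               # pt_regs argument
            call do_notify_resume
            jmp resume_userspace_sig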

    Signed-off-by: Srikar Dronamraju
    Acked-by: Masami Hiramatsu
    Signed-off-by: Peter Zijlstra
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20111025141812.GA21225@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Srikar Dronamraju
     

26 Aug, 2011

1 commit

  • entry_32.S contained a hardcoded alternative instruction entry, and the
    format changed in commit 59e97e4d6fbc ("x86: Make alternative
    instruction pointers relative").

    Replace the hardcoded entry with the altinstruction_entry macro. This
    fixes the 32-bit boot with CONFIG_X86_INVD_BUG=y.
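
    For reference, the macro (from <asm/alternative-asm.h> after commit
    59e97e4d6fbc) emits the entry with relative pointers, roughly:

    .macro altinstruction_entry orig alt feature orig_len alt_len
            .long \orig - .         /* relative pointer to original insn */
            .long \alt - .          /* relative pointer to replacement */
            .word \feature          /* cpufeature bit to key off */
            .byte \orig_len
            .byte \alt_len
    .endm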

    Reported-and-tested-by: Arnaud Lacombe
    Signed-off-by: Andy Lutomirski
    Cc: Peter Anvin
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

17 Mar, 2011

1 commit

  • Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Fix binutils-2.21 symbol related build failures
    x86-64, trampoline: Remove unused variable
    x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot
    x86, reboot: Move the real-mode reboot code to an assembly file
    x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly
    x86, trampoline: Use the unified trampoline setup for ACPI wakeup
    x86, trampoline: Common infrastructure for low memory trampolines

    Fix up trivial conflicts in arch/x86/kernel/Makefile

    Linus Torvalds
     

16 Mar, 2011

1 commit

  • Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, binutils, xen: Fix another wrong size directive
    x86: Remove dead config option X86_CPU
    x86: Really print supported CPUs if PROCESSOR_SELECT=y
    x86: Fix a bogus unwind annotation in lib/semaphore_32.S
    um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S
    x86: Remove unused bits from lib/thunk_*.S
    x86: Use {push,pop}_cfi in more places
    x86-64: Add CFI annotations to lib/rwsem_64.S
    x86, asm: Cleanup unnecessary macros in asm-offsets.c
    x86, system.h: Drop unused __SAVE/__RESTORE macros
    x86: Use bitmap library functions
    x86: Partly unify asm-offsets_{32,64}.c
    x86: Reduce back the alignment of the per-CPU data section

    Linus Torvalds
     

09 Mar, 2011

2 commits

  • New binutils version 2.21.0.20110302-1 started checking that the symbol
    parameter to the .size directive matches the entry name's
    symbol parameter, unearthing two mismatches:

    AS arch/x86/kernel/acpi/wakeup_rm.o
    arch/x86/kernel/acpi/wakeup_rm.S: Assembler messages:
    arch/x86/kernel/acpi/wakeup_rm.S:12: Error: .size expression with symbol `wakeup_code_start' does not evaluate to a constant

    arch/x86/kernel/entry_32.S: Assembler messages:
    arch/x86/kernel/entry_32.S:1421: Error: .size expression with symbol `apf_page_fault' does not evaluate to a constant

    The problem was discovered while using Debian's binutils
    (2.21.0.20110302-1) and experimenting with binutils from
    upstream.

    Thanks Alexander and H.J. for the vital help.
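
    The shape of the entry_32.S fix, schematically: END() emits a .size
    directive for its argument, so the ENTRY()/END() pair has to name the
    same symbol:

    ENTRY(async_page_fault)
            jmp error_code
    END(apf_page_fault)             /* mismatch: .size apf_page_fault fails */

    ENTRY(async_page_fault)
            jmp error_code
    END(async_page_fault)           /* matched pair assembles cleanly */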

    Signed-off-by: Sedat Dilek
    Cc: Alexander van Heukelum
    Cc: H.J. Lu
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: Rafael J. Wysocki
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Sedat Dilek
     
  • Put x86 entry code into a separate link section: .entry.text.

    Separating the entry text section seems to have performance
    benefits - caused by more efficient instruction cache usage.

    Running hackbench with perf stat --repeat showed that the change
    compresses the icache footprint. The icache load miss rate went
    down by about 15%:

    before patch:
    19417627 L1-icache-load-misses ( +- 0.147% )

    after patch:
    16490788 L1-icache-load-misses ( +- 0.180% )

    The motivation of the patch was to fix a particular kprobes bug that
    relates to the entry text section; the performance advantage was
    discovered accidentally.

    Whole perf output follows:

    - results for current tip tree:

    Performance counter stats for './hackbench/hackbench 10' (500 runs):

    19417627 L1-icache-load-misses ( +- 0.147% )
    2676914223 instructions # 0.497 IPC ( +- 0.079% )
    5389516026 cycles ( +- 0.144% )

    0.206267711 seconds time elapsed ( +- 0.138% )

    - results for current tip tree with the patch applied:

    Performance counter stats for './hackbench/hackbench 10' (500 runs):

    16490788 L1-icache-load-misses ( +- 0.180% )
    2717734941 instructions # 0.502 IPC ( +- 0.079% )
    5414756975 cycles ( +- 0.148% )

    0.206747566 seconds time elapsed ( +- 0.137% )
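
    The mechanics are small: the entry files switch into the new section,
    and the linker script gets a matching output range (the kernel brackets
    it with __entry_text_start/__entry_text_end); schematically:

            /* entry_32.S / entry_64.S: group all entry code together */
            .section .entry.text, "ax"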

    Signed-off-by: Jiri Olsa
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Nick Piggin
    Cc: Eric Dumazet
    Cc: masami.hiramatsu.pt@hitachi.com
    Cc: ananth@in.ibm.com
    Cc: davem@davemloft.net
    Cc: 2nddept-manager@sdl.hitachi.co.jp
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

26 Feb, 2011

1 commit

    PAGE_SIZE_asm, PAGE_SHIFT_asm and THREAD_SIZE_asm can be safely removed from
    asm-offsets.c and replaced by their non-'_asm' counterparts in the code that
    uses them, since the _AC macro defined in include/linux/const.h makes
    PAGE_SIZE/PAGE_SHIFT/THREAD_SIZE work with as (the GNU assembler) too.
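
    The _AC() trick, paraphrased from include/linux/const.h: the type
    suffix is only pasted on in C, so the same definition assembles
    cleanly in .S files:

    #ifdef __ASSEMBLY__
    #define _AC(X,Y)        X               /* gas: bare number, no suffix */
    #else
    #define __AC(X,Y)       (X##Y)
    #define _AC(X,Y)        __AC(X,Y)       /* C: e.g. 4096UL */
    #endif

    #define PAGE_SHIFT      12
    #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)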

    Signed-off-by: Stratos Psomadakis
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Stratos Psomadakis
     

12 Jan, 2011

1 commit

    When the async PF capability is detected, hook up a special page fault
    handler that will handle async page fault events and pass other page faults
    through to the regular page fault handler. Also add async PF handling to
    nested SVM emulation. Async PF always generates an exit to L1, where the
    vcpu thread will be scheduled out until the page is available.
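
    On the entry_32.S side this boils down to a new trap stub that routes
    the vector to the async handler; a sketch, assuming the C handler is
    named do_async_page_fault as in the KVM guest code:

    #ifdef CONFIG_KVM_GUEST
    ENTRY(async_page_fault)
            RING0_EC_FRAME
            pushl $do_async_page_fault
            CFI_ADJUST_CFA_OFFSET 4
            jmp error_code
            CFI_ENDPROC
    END(async_page_fault)
    #endif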

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     

18 Nov, 2010

1 commit

  • Add parentheses around one pushl_cfi argument.

    Commit df5d1874 "x86: Use {push,pop}{l,q}_cfi in more places"
    caused GNU assembler 2.15 (Debian Sarge) to fail. It is still
    failing as of commit 07bd8516 "x86, asm: Restore parentheses
    around one pushl_cfi argument". This patch fixes the build failure
    with GNU assembler 2.15.
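
    The shape of the fix, shown with an illustrative operand (TI_flags-4
    here is hypothetical; the exact operand in the kernel differs):

            pushl_cfi TI_flags-4    /* gas 2.15 can split this into two arguments */
            pushl_cfi (TI_flags-4)  /* parenthesized: always a single argument */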

    Signed-off-by: Tetsuo Handa
    Acked-by: Jan Beulich
    Cc: heukelum@fastmail.fm
    Cc: hpa@linux.intel.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Tetsuo Handa
     

22 Oct, 2010

1 commit

    The parentheses were (intentionally) stripped by "fix CFI macro
    invocations to deal with shortcomings in gas" to expose, on newer
    versions too, the problems older gas has with unexpected splitting
    of arguments. As it turns out, however, there is at least one distro
    (Ubuntu 6.06) where even not having *any* spaces in a macro
    argument doesn't reliably prevent splitting into multiple
    arguments.

    Signed-off-by: Jan Beulich
    Acked-by: Alexander van Heukelum
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

20 Oct, 2010

1 commit

  • gas prior to (perhaps) 2.16.90 has problems with passing non-
    parenthesized expressions containing spaces to macros. Spaces, however,
    get inserted by cpp between any macro expanding to a number and a
    subsequent + or -. For the +, current x86 gas then removes the space
    again (future gas may not do so), but for the - the space gets retained
    and is then considered a separator between macro arguments.

    Fix the respective definitions for both the - and + cases, so that they
    neither contain spaces nor make cpp insert any (the latter by adding
    seemingly redundant parentheses).
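
    An illustration of the definition-side fix (KERNEL_CS/GDT_ENTRY here
    are illustrative, not the kernel's exact macros):

    /* before: "KERNEL_CS-1" can expand to "GDT_ENTRY*8 -1", which old gas
     * splits into two macro arguments */
    #define KERNEL_CS GDT_ENTRY*8
    /* after: seemingly redundant parentheses keep the expression whole */
    #define KERNEL_CS (GDT_ENTRY*8)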

    Signed-off-by: Jan Beulich
    LKML-Reference:
    Cc: Alexander van Heukelum
    Signed-off-by: H. Peter Anvin

    Jan Beulich
     

03 Sep, 2010

2 commits

  • ... plus additionally introduce {push,pop}f{l,q}_cfi. All in the
    hope that the code becomes better readable this way (it gets
    quite a bit smaller in any case).
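
    The helpers are thin wrappers pairing each stack operation with its
    CFI adjustment; the 32-bit push/pop pair, roughly as defined in
    <asm/dwarf2.h>:

            .macro pushl_cfi reg
            pushl \reg
            CFI_ADJUST_CFA_OFFSET 4
            .endm

            .macro popl_cfi reg
            popl \reg
            CFI_ADJUST_CFA_OFFSET -4
            .endm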

    Signed-off-by: Jan Beulich
    Acked-by: Alexander van Heukelum
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     
  • When these stubs are actual functions (i.e. having a return
    instruction) and have stack manipulation instructions in them,
    they should also be annotated to allow unwinding through them.
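
    Schematically, for a hypothetical stub (names illustrative):

    ENTRY(example_stub)
            CFI_STARTPROC
            pushl_cfi %eax          /* stack changes: annotate them */
            call helper_function
            popl_cfi %eax
            ret                     /* a real function, so it returns */
            CFI_ENDPROC
    ENDPROC(example_stub)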

    Signed-off-by: Jan Beulich
    Acked-by: Alexander van Heukelum
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

23 Jul, 2010

1 commit

  • Set the callback to receive evtchns from Xen, using the
    callback vector delivery mechanism.

    The traditional way for receiving event channel notifications from Xen
    is via the interrupts from the platform PCI device.
    The callback vector is a newer alternative that allows us to receive
    notifications on any vcpu and doesn't need any PCI support: we allocate
    a vector exclusively to receive events, and in the vector handler we
    don't need to interact with the vlapic, so we avoid a VMEXIT.
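
    In entry_32.S this is a dedicated interrupt stub for the new vector;
    a sketch along the lines of the existing BUILD_INTERRUPT machinery
    (names follow the Xen code; details may differ):

    #ifdef CONFIG_XEN
    BUILD_INTERRUPT3(xen_hvm_callback_vector, XEN_HVM_EVTCHN_CALLBACK,
                     xen_hvm_evtchn_do_upcall)
    #endif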

    Signed-off-by: Stefano Stabellini
    Signed-off-by: Sheng Yang
    Signed-off-by: Jeremy Fitzhardinge

    Sheng Yang
     

08 Jul, 2010

1 commit

    We already have cpufeature indices above 255, so use a 16-bit number
    for the alternatives index. This consumes a padding field and so
    doesn't add any size, but it means that abusing the padding field to
    create assembly errors on overflow no longer works. We can retain the
    test simply by redirecting it to the .discard section, however.
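
    The retained check, patterned after the kernel's alternatives macros
    (labels illustrative): emit a byte whose value overflows if the
    replacement is longer than the original, turning an oversized
    replacement into an assembly error, while the .discard section is
    thrown away at link time:

            .section .discard, "aw", @progbits
            .byte 0xff + (664f-663f) - (662b-661b)  /* error if replacement is longer */
            .previous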

    [ v3: updated to include open-coded locations ]

    Signed-off-by: H. Peter Anvin
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     

04 May, 2010

1 commit

    The cache flush denied error is an erratum on some AMD 486 clones. If an
    invd instruction is executed in userspace, the processor raises exception
    19 (0x13) instead of #GP (vector 13 decimal). On CPUs where XMM is not
    supported, redirect exception 19 to do_general_protection(). Also, remove
    die_if_kernel(), since this was the last user.
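
    In entry_32.S terms the redirect is an alternative on the exception 19
    stub: the default pushes do_general_protection, and CPUs with XMM patch
    in the real SIMD handler. A sketch (shown in the later
    altinstruction_entry form; the original used hand-coded entries):

    ENTRY(simd_coprocessor_error)
            RING0_INT_FRAME
            pushl $0
    #ifdef CONFIG_X86_INVD_BUG
            /* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
    661:    pushl $do_general_protection
    662:
    .section .altinstructions, "a"
            altinstruction_entry 661b, 663f, X86_FEATURE_XMM, 662b-661b, 664f-663f
    .previous
    .section .altinstr_replacement, "ax"
    663:    pushl $do_simd_coprocessor_error
    664:
    .previous
    #else
            pushl $do_simd_coprocessor_error
    #endif
            jmp error_code
            CFI_ENDPROC
    END(simd_coprocessor_error)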

    Signed-off-by: Brian Gerst
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Brian Gerst
     

14 Oct, 2009

1 commit

  • The function graph tracer replaces the return address with a hook
    to trace the exit of the function call. This hook will finish by
    returning to the real location the function should return to.

    But the current implementation uses a ret to jump to the real
    return location. This causes an imbalance between calls and rets:
    the original function does a call, the ret goes to the handler,
    and then the handler does a ret without a matching call.

    Although the function graph tracer itself already breaks the branch
    predictor by replacing the original ret, using a second ret and causing
    an imbalance breaks the predictor even more.

    This patch replaces the ret with a jmp to keep the calls and ret
    balanced. I tested this on one box and it showed a 1.7% increase in
    performance. Another box only showed a small 0.3% increase. But no
    box that I tested this on showed a decrease in performance by
    making this change.
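
    On x86_32 the change amounts to replacing the final push/ret pair in
    return_to_handler with an indirect jmp; schematically:

    return_to_handler:
            pushl %eax
            pushl %edx
            call ftrace_return_to_handler   /* real return address in %eax */
            movl %eax, %ecx
            popl %edx
            popl %eax
            jmp *%ecx                       /* was: pushl %ecx; ret */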

    Signed-off-by: Steven Rostedt
    Acked-by: Mathieu Desnoyers
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

11 Sep, 2009

1 commit

  • Move irq-exit functions to .kprobes.text section to protect against
    kprobes recursion.

    When I ran a kprobe stress test on x86-32, I found that the symbols
    below cause unrecoverable recursive probing:

    ret_from_exception
    ret_from_intr
    check_userspace
    restore_all
    restore_all_notrace
    restore_nocheck
    irq_return

    I also found some interrupt/exception entry points that cause
    similar problems.

    This patch moves those symbols (including their container functions)
    to .kprobes.text section to prevent any kprobes probing.
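
    In assembly this is a section switch around the affected code;
    schematically:

            .pushsection .kprobes.text, "ax"
    ret_from_exception:
            /* ... irq-exit code, now off-limits to kprobes ... */
            .popsection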

    Signed-off-by: Masami Hiramatsu
    Cc: Frederic Weisbecker
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Masami Hiramatsu
     

21 Jun, 2009

1 commit

  • Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
    tracing/urgent: warn in case of ftrace_start_up imbalance
    tracing/urgent: fix unbalanced ftrace_start_up
    function-graph: add stack frame test
    function-graph: disable when both x86_32 and optimize for size are configured
    ring-buffer: have benchmark test print to trace buffer
    ring-buffer: do not grab locks in nmi
    ring-buffer: add locks around rb_per_cpu_empty
    ring-buffer: check for less than two in size allocation
    ring-buffer: remove useless compile check for buffer_page size
    ring-buffer: remove useless warn on check
    ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
    tracing: update sample event documentation
    tracing/filters: fix race between filter setting and module unload
    tracing/filters: free filter_string in destroy_preds()
    ring-buffer: use commit counters for commit pointer accounting
    ring-buffer: remove unused variable
    ring-buffer: have benchmark test handle discarded events
    ring-buffer: prevent adding write in discarded area
    tracing/filters: strloc should be unsigned short
    tracing/filters: operand can be negative
    ...

    Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually

    Linus Torvalds
     

19 Jun, 2009

1 commit

    In case gcc does something funny with the stack frames or the
    return-from-function code, we would like to detect it.

    An arch may implement passing of a variable that is unique to the
    function and can be saved on entering a function and tested
    when exiting the function. Usually the frame pointer can be used for
    this purpose.

    This patch also implements this for x86, where it passes in the stack
    frame of the parent function and tests that frame on exit.

    There was a case in x86_32 with optimize for size (-Os) where, for a
    few functions, gcc would align the stack frame and place a copy of the
    return address into it. The function graph tracer modified the copy and
    not the actual return address. On return from the function, it did not go
    to the tracer hook, but returned to the parent. This broke the function
    graph tracer, because the return of the parent (where gcc did not do
    this funky manipulation) returned to the location that the child function
    was supposed to. This caused strange kernel crashes.

    This test detected the problem and pointed out where the issue was.

    This modifies the parameters of one of the functions that the arch
    specific code calls, so it includes changes to arch code to accommodate
    the new prototype.

    Note, I notice that the parisc arch implements its own push_return_trace.
    This is now a generic function and ftrace_push_return_trace should be
    used instead. This patch does not touch that code.
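
    On x86_32 the hook gains the parent's frame pointer as the extra
    argument, so the core tracer can compare it on function exit; a sketch
    of the caller side (simplified from entry_32.S's ftrace_graph_caller):

    ENTRY(ftrace_graph_caller)
            pushl %eax
            pushl %ecx
            pushl %edx
            movl 0xc(%esp), %edx            /* callsite return address */
            lea 0x4(%ebp), %eax             /* &parent return address */
            movl (%ebp), %ecx               /* parent frame pointer (new arg) */
            subl $MCOUNT_INSN_SIZE, %edx
            call prepare_ftrace_return
            popl %edx
            popl %ecx
            popl %eax
            ret
    END(ftrace_graph_caller)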

    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Frederic Weisbecker
    Cc: Helge Deller
    Cc: Kyle McMartin
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

18 Jun, 2009

3 commits

  • asm/desc.h is included in three assembly files, but the only macro
    it defines, GET_DESC_BASE, is never used. This patch removes the
    includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
    from asm/desc.h.

    Signed-off-by: Alexander van Heukelum
    Signed-off-by: H. Peter Anvin

    Alexander van Heukelum
     
  • The espfix code triggers if we have a protected mode userspace
    application with a 16-bit stack. On returning to userspace, with iret,
    the CPU doesn't restore the high word of the stack pointer. This is an
    "official" bug, and the work-around used in the kernel is to temporarily
    switch to a 32-bit stack segment/pointer pair where the high word of the
    pointer is equal to the high word of the userspace stack pointer.

    The current implementation uses THREAD_SIZE to determine the cut-off,
    but there is no good reason not to use the more natural 64kb... However,
    implementing this by simply substituting THREAD_SIZE with 65536 in
    patch_espfix_desc crashed the test application. patch_espfix_desc tries
    to do what is described above, but gets it subtly wrong if the userspace
    stack pointer is just below a multiple of THREAD_SIZE: an overflow
    occurs to bit 13... With a bit of luck, when the kernelspace
    stack pointer is just below a 64kb boundary, the overflow then ripples
    through to bit 16 and userspace will see its stack pointer changed by
    65536.

    This patch moves all espfix code into entry_32.S. Selecting a 16-bit
    cut-off simplifies the code. The game with changing the limit dynamically
    is removed too: it complicates matters and I see no value in it. Changing
    only the top 16-bit word of ESP is one instruction, and it also implies
    that only two bytes of the ESPFIX GDT entry need to be changed; this
    can be implemented in just a handful of simple-to-understand instructions.
    As a side effect, the operation to compute the original ESP from the
    ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
    three instructions have been expanded inline in entry_32.S.
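
    A sketch of the resulting switch (simplified; GDT_ESPFIX_BASE stands in
    for the kernel's per-cpu GDT addressing, and the real ldt_ss code
    differs in detail):

    ldt_ss:
            mov %esp, %edx                  /* load kernel esp */
            mov PT_OLDESP(%esp), %eax       /* load userspace esp */
            mov %dx, %ax                    /* eax: new kernel esp */
            sub %eax, %edx                  /* offset; low word is 0 */
            shr $16, %edx
            mov %dl, GDT_ESPFIX_BASE+4      /* bits 16..23 of the base */
            mov %dh, GDT_ESPFIX_BASE+7      /* bits 24..31 of the base */
            pushl $__ESPFIX_SS
            pushl %eax                      /* new kernel esp */
            lss (%esp), %esp                /* switch to espfix segment */
            jmp restore_nocheck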

    Impact: can now reliably run userspace with ESP=xxxxfffc on a 16-bit
    stack segment

    Signed-off-by: Alexander van Heukelum
    Acked-by: Stas Sergeev
    Signed-off-by: H. Peter Anvin

    Alexander van Heukelum
     
  • Returning to a task with a 16-bit stack requires special care: the iret
    instruction does not restore the high word of esp in that case. The
    espfix code fixes this, but currently is not invoked on NMIs. This means
    that a running task gets the upper word of esp clobbered due to intervening
    NMIs. To reproduce, compile and run the following program with the nmi
    watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
    see that the high bits of esp contain garbage, while the low bits are
    still correct.

    This patch puts the espfix code back into the NMI code path.

    The patch is slightly complicated due to the irqtrace infrastructure not
    being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
    Otherwise, the tail of the normal iret-code is correct for the nmi code
    path too. To be able to share this code path, the TRACE_IRQS_IRET was
    moved up a bit. The espfix code sits after the TRACE_IRQS_IRET, but
    this code explicitly disables interrupts. This short interrupts-off
    section is now not traced anymore. The return-to-kernel path now always
    includes the preliminary test to decide if the espfix code should be
    called. This is never the case, but doing it this way keeps the patch as
    simple as possible and the few extra instructions should not affect
    timing in any significant way.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <asm/ldt.h>

    int modify_ldt(int func, void *ptr, unsigned long bytecount)
    {
            return syscall(SYS_modify_ldt, func, ptr, bytecount);
    }

    /* this is assumed to be usable */
    #define SEGBASEADDR 0x10000
    #define SEGLIMIT 0x20000

    /* 16-bit segment */
    struct user_desc desc = {
            .entry_number = 0,
            .base_addr = SEGBASEADDR,
            .limit = SEGLIMIT,
            .seg_32bit = 0,
            .contents = 0, /* ??? */
            .read_exec_only = 0,
            .limit_in_pages = 0,
            .seg_not_present = 0,
            .useable = 1
    };

    int main(void)
    {
            setvbuf(stdout, NULL, _IONBF, 0);

            /* map a 64 kb segment */
            char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
                            PROT_EXEC|PROT_READ|PROT_WRITE,
                            MAP_SHARED|MAP_ANONYMOUS, -1, 0);
            if (pointer == MAP_FAILED) { /* mmap signals failure with MAP_FAILED, not NULL */
                    printf("could not map space\n");
                    return 0;
            }

            /* write ldt, new mode */
            int err = modify_ldt(0x11, &desc, sizeof(desc));
            if (err) {
                    printf("error modifying ldt: %i\n", err);
                    return 0;
            }

            for (int i=0; i
    Acked-by: Stas Sergeev
    Signed-off-by: H. Peter Anvin

    Alexander van Heukelum
     

14 Mar, 2009

1 commit

  • Fix:

    arch/x86/kernel/entry_32.S:446: Warning: 00000000080001d1 shortened to 00000000000001d1
    arch/x86/kernel/entry_32.S:457: Warning: 000000000800feff shortened to 000000000000feff
    arch/x86/kernel/entry_32.S:527: Warning: 00000000080001d1 shortened to 00000000000001d1
    arch/x86/kernel/entry_32.S:541: Warning: 000000000800feff shortened to 000000000000feff
    arch/x86/kernel/entry_32.S:676: Warning: 0000000008000091 shortened to 0000000000000091

    TIF_SYSCALL_FTRACE is 0x08000000 and until now we checked only the
    low 16 bits of the work mask - bit 27 falls outside of that range.

    Update the entry_32.S code to check the full 32-bit mask.

    [ %cx => %ecx fix from Cyrill Gorcunov ]
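
    The pattern of the fix (shown with _TIF_ALLWORK_MASK; the patch touches
    several such tests):

            # before: tests only the low 16 bits of the work mask
            testw $_TIF_ALLWORK_MASK, %cx
            # after: tests the full register, so bit 27 (TIF_SYSCALL_FTRACE) is seen
            testl $_TIF_ALLWORK_MASK, %ecx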

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: "H. Peter Anvin"
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jaswinder Singh Rajput
     

24 Feb, 2009

1 commit

  • Impact: Cleanup

    Checkin be44d2aabce2d62f72d5751d1871b6212bf7a1c7 eliminates the use of
    a 16-bit stack for espfix. However, at least one instruction remained
    that only operated on the low 16 bits of %esp.

    This is not a bug per se because the kernel stack is always an aligned
    4K or 8K block. Therefore it cannot cross 64K boundaries; this code,
    in fact, relies strictly on that fact.

    However, it's a lot cleaner (and, for that matter, smaller) to operate
    on the entire 32-bit register.
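
    The kind of change involved, schematically:

            /* before: operates on the low word of the stack pointer only */
            movw %sp, %ax
            /* after: same effect here, and both smaller and cleaner */
            movl %esp, %eax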

    Signed-off-by: Stas Sergeev
    CC: Zachary Amsden
    CC: Chuck Ebbert
    Signed-off-by: H. Peter Anvin

    Stas Sergeev
     
