18 May, 2010

1 commit

  • * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, fpu: Use static_cpu_has() to implement use_xsave()
    x86: Add new static_cpu_has() function using alternatives
    x86, fpu: Use the proper asm constraint in use_xsave()
    x86, fpu: Unbreak FPU emulation
    x86: Introduce 'struct fpu' and related API
    x86: Eliminate TS_XSAVE
    x86-32: Don't set ignore_fpu_irq in simd exception
    x86: Merge kernel_math_error() into math_error()
    x86: Merge simd_math_error() into math_error()
    x86-32: Rework cache flush denied handler

    Fix trivial conflict in arch/x86/kernel/process.c

    Linus Torvalds
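
    The static_cpu_has() entries above replace a per-call CPU feature test with
    a branch that is patched in once at boot via the alternatives mechanism. As
    a hedged, user-space sketch of roughly what use_xsave() boils down to at
    the hardware level (have_xsave() is a made-up name; only the
    CPUID.1:ECX.XSAVE bit, bit 26, is architectural):

        /* Illustrative sketch only, not the kernel code: query the XSAVE
         * feature bit that use_xsave()/static_cpu_has() decide on once at
         * boot instead of re-testing on every call. */
        #include <stdio.h>
        #include <cpuid.h>

        static int have_xsave(void)
        {
                unsigned int eax, ebx, ecx, edx;

                if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                        return 0;
                return !!(ecx & (1u << 26));    /* CPUID.1:ECX.XSAVE[bit 26] */
        }

        int main(void)
        {
                printf("XSAVE %s\n", have_xsave() ? "supported" : "not supported");
                return 0;
        }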
     

04 May, 2010

1 commit

  • The cache flush denied error is an erratum on some AMD 486 clones. If an invd
    instruction is executed in userspace, the processor calls exception 19 (13 hex)
    instead of #GP (13 decimal). On cpus where XMM is not supported, redirect
    exception 19 to do_general_protection(). Also, remove die_if_kernel(), since
    this was the last user.

    Signed-off-by: Brian Gerst
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Brian Gerst
     

26 Mar, 2010

1 commit

  • Support for the PMU's BTS features has been upstreamed in
    v2.6.32, but we still have the old and disabled ptrace-BTS,
    as Linus noticed it not so long ago.

    It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
    regard for other uses (perf) and doesn't provide the flexibility
    needed for perf either.

    Its only users are ptrace-block-step and ptrace-bts; ptrace-bts was
    never actually used, and ptrace-block-step can be implemented using a
    much simpler approach.

    So axe all 3000 lines of it. That includes the *locked_memory*()
    APIs in mm/mlock.c as well.

    Reported-by: Linus Torvalds
    Signed-off-by: Peter Zijlstra
    Cc: Roland McGrath
    Cc: Oleg Nesterov
    Cc: Markus Metzger
    Cc: Steven Rostedt
    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

01 Mar, 2010

1 commit


14 Jan, 2010

1 commit

  • This one is much faster than the spinlock-based fallback rwsem code,
    with certain artificial benchmarks having shown 300%+ improvement on
    threaded page faults etc.

    Again, note the 32767-thread limit here. So this really does need that
    whole "make rwsem_count_t be 64-bit and fix the BIAS values to match"
    extension on top of it, but that is conceptually a totally independent
    issue.

    NOT TESTED! The original patch that this all was based on was tested by
    KAMEZAWA Hiroyuki, but maybe I screwed up something when I created the
    cleaned-up series, so caveat emptor..

    Also note that it _may_ be a good idea to mark some more registers
    clobbered on x86-64 in the inline asms instead of saving/restoring them.
    They are inline functions, but they are only used in places where there
    are not a lot of live registers _anyway_, so doing for example the
    clobbers of %r8-%r11 in the asm wouldn't make the fast-path code any
    worse, and would make the slow-path code smaller.

    (Not that the slow-path really matters to that degree. Saving a few
    unnecessary registers is the _least_ of our problems when we hit the slow
    path. The instruction/cycle counting really only matters in the fast
    path).

    Signed-off-by: Linus Torvalds
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Linus Torvalds
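
    To make the 32767-thread limit above concrete, here is a hedged sketch of
    the 32-bit counter layout (the constants shown are the classic 32-bit x86
    rwsem values and are meant purely as illustration): the active
    reader/writer count lives in the low 16 bits and effectively acts as a
    signed 16-bit field, so it wraps once enough threads pile onto the lock.
    Widening rwsem_count_t to 64 bits and scaling the BIAS values removes that
    ceiling.

        /* Hedged sketch, not the kernel code: the 32-bit rwsem counter packs
         * the active count into its low 16 bits, which is where the ~32767
         * thread ceiling comes from. */
        #include <stdio.h>
        #include <stdint.h>

        #define RWSEM_ACTIVE_BIAS   0x00000001
        #define RWSEM_ACTIVE_MASK   0x0000ffff

        int main(void)
        {
                int32_t count = 0;                  /* unlocked */

                for (int readers = 0; readers < 40000; readers++)
                        count += RWSEM_ACTIVE_BIAS; /* each down_read() fast path */

                /* 40000 active readers no longer fit in the 16-bit field. */
                printf("active field reads back as %d\n",
                       (int16_t)(count & RWSEM_ACTIVE_MASK));
                return 0;
        }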
     

06 Jan, 2010

1 commit

  • This reverts commit ae1b22f6e46c03cede7cea234d0bf2253b4261cf.

    As Linus said in 982d007a6ee: "There was something really messy about
    cmpxchg8b and clone CPU's, so if you enable it on other CPUs later, do it
    carefully."

    This breaks lguest for those configs, but we can fix that by emulating
    if we have to.

    Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14884
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

09 Dec, 2009

1 commit

  • * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
    x86, mm: Correct the implementation of is_untracked_pat_range()
    x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
    x86, mtrr: Fix sorting of mtrr after subtracting
    x86: Move find_smp_config() earlier and avoid bootmem usage
    x86, platform: Change is_untracked_pat_range() to bool; cleanup init
    x86: Change is_ISA_range() into an inline function
    x86, mm: is_untracked_pat_range() takes a normal semiclosed range
    x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
    x86: UV SGI: Don't track GRU space in PAT
    x86: SGI UV: Fix BAU initialization
    x86, numa: Use near(er) online node instead of roundrobin for NUMA
    x86, numa, bootmem: Only free bootmem on NUMA failure path
    x86: Change crash kernel to reserve via reserve_early()
    x86: Eliminate redundant/contradicting cache line size config options
    x86: When cleaning MTRRs, do not fold WP into UC
    x86: remove "extern" from function prototypes in
    x86, mm: Report state of NX protections during boot
    x86, mm: Clean up and simplify NX enablement
    x86, pageattr: Make set_memory_(x|nx) aware of NX support
    x86, sleep: Always save the value of EFER
    ...

    Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range()
    to 'struct x86_platform_ops') in
    arch/x86/include/asm/x86_init.h
    arch/x86/kernel/x86_init.c

    Linus Torvalds
     

06 Dec, 2009

1 commit


19 Nov, 2009

1 commit

  • Rather than having both X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
    (with inconsistent defaults), keeping just the latter suffices, as
    the former can easily be calculated from it.

    To be consistent, also change X86_INTERNODE_CACHE_BYTES to
    X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
    to account for last level cache line size (which here matters
    more than L1 cache line size).

    Finally, make sure the default value for X86_L1_CACHE_SHIFT, when
    X86_GENERIC is selected, is seen before the defaults for the
    individual CPU model options (unlike on x86-64, where GENERIC_CPU is
    part of the choice construct, X86_GENERIC is a separate option on
    ix86).

    Signed-off-by: Jan Beulich
    Acked-by: Ravikiran Thirumalai
    Acked-by: Nick Piggin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jan Beulich
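
    The relationship relied upon here is simply bytes = 1 << shift, which is
    why only the SHIFT symbols need to stay in Kconfig. A trivial sketch (the
    shift values below are just the common defaults, not taken from the patch):

        /* Minimal sketch: *_CACHE_BYTES is fully determined by *_CACHE_SHIFT. */
        #include <stdio.h>

        int main(void)
        {
                int l1_cache_shift        = 6;  /* 64-byte cache lines */
                int internode_cache_shift = 7;  /* 128 bytes for the NUMA case */

                printf("L1_CACHE_BYTES        = %d\n", 1 << l1_cache_shift);
                printf("INTERNODE_CACHE_BYTES = %d\n", 1 << internode_cache_shift);
                return 0;
        }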
     

26 Oct, 2009

1 commit

  • Commit 79e1dd05d1a22 "x86: Provide an alternative() based
    cmpxchg64()" broke lguest, even on systems which have cmpxchg8b
    support. The emulation code gets used until alternatives get
    run, but it contains native instructions, not their paravirt
    alternatives.

    The simplest fix is to turn this code off except for 386 and 486
    builds.

    Reported-by: Johannes Stezenbach
    Signed-off-by: Rusty Russell
    Acked-by: H. Peter Anvin
    Cc: lguest@ozlabs.org
    Cc: Arjan van de Ven
    Cc: Jeremy Fitzhardinge
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Rusty Russell
     

03 Oct, 2009

1 commit


01 Oct, 2009

1 commit

  • Try to avoid the 'alternates()' code when we can statically
    determine that cmpxchg8b is fine. We already have that
    CONFIG_X86_CMPXCHG64 (enabled by PAE support), and we could easily
    also enable it for some of the CPU cases.

    Note, this patch only adds CMPXCHG8B for the obvious Intel CPU's,
    not for others. (There was something really messy about cmpxchg8b
    and clone CPU's, so if you enable it on other CPUs later, do it
    carefully.)

    If we avoid that asm-alternative thing when we can assume the
    instruction exists, we'll generate less support crud, and we'll
    avoid the whole issue with that extra 'nop' for padding instruction
    sizes etc.

    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Linus Torvalds
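
    Whether cmpxchg8b exists at all is reported by an architectural CPUID bit
    (leaf 1, EDX bit 8, "CX8"). A hedged user-space sketch of that check, which
    is roughly the property CONFIG_X86_CMPXCHG64 asserts at build time:

        /* Illustrative only: report the CX8 (cmpxchg8b) feature bit. The
         * kernel option lets the build assume this bit and skip the
         * alternative()/emulation path entirely. */
        #include <stdio.h>
        #include <cpuid.h>

        int main(void)
        {
                unsigned int eax, ebx, ecx, edx;

                if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
                        puts("CPUID leaf 1 not available");
                        return 1;
                }
                /* CPUID.1:EDX.CX8 is bit 8 */
                printf("cmpxchg8b %s\n", (edx & (1u << 8)) ? "present" : "absent");
                return 0;
        }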
     

23 Aug, 2009

1 commit


16 Apr, 2009

1 commit

  • Oleg Nesterov found a couple of races in the ptrace-bts code, and
    fixes are queued up for it, but they did not get ready in time
    for the merge window. We'll merge them in v2.6.31 - until then,
    mark the feature as CONFIG_BROKEN. There's no user-space making
    use of this yet, so it's not a big issue.

    Cc:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

14 Mar, 2009

1 commit

  • There should be no difference, except:

    * the 64-bit variant now also initializes the PadLock unit.
    * ->c_early_init() is executed again from ->c_init().
    * the 64-bit fixups made it into the 32-bit path.

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: herbert@gondor.apana.org.au
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Sebastian Andrzej Siewior
     

06 Feb, 2009

2 commits


05 Feb, 2009

1 commit


21 Jan, 2009

1 commit

  • Fix:

    arch/x86/mm/tlb.c:47: error: ‘CONFIG_X86_INTERNODE_CACHE_BYTES’ undeclared here (not in a function)

    The CONFIG_X86_INTERNODE_CACHE_BYTES symbol is only defined on 64-bit,
    because vsmp support is 64-bit only. Define it on 32-bit too - where it
    will always be equal to X86_L1_CACHE_BYTES.

    Also change the default of X86_L1_CACHE_BYTES (which is separate from
    the more commonly used L1_CACHE_SHIFT kconfig symbol) from 128 bytes
    to 64 bytes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
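
    The shape of the fix can be pictured as a fallback: on 32-bit, where there
    is no vsmp, the internode value simply mirrors the L1 value. A hedged
    preprocessor sketch of that idea (the actual change lives in Kconfig, and
    the 64-byte value below is just a stand-in):

        /* Illustrative analogue only, not the real Kconfig change. */
        #include <stdio.h>

        #define CONFIG_X86_L1_CACHE_BYTES 64
        #ifndef CONFIG_X86_INTERNODE_CACHE_BYTES
        #define CONFIG_X86_INTERNODE_CACHE_BYTES CONFIG_X86_L1_CACHE_BYTES
        #endif

        int main(void)
        {
                printf("internode cache bytes = %d\n",
                       CONFIG_X86_INTERNODE_CACHE_BYTES);
                return 0;
        }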
     

14 Jan, 2009

1 commit

  • Right now the generic cacheline size is 128 bytes - that is wasteful
    when structures are aligned, as all modern x86 CPUs have an (effective)
    cacheline size of 64 bytes.

    It was set to 128 bytes due to some cacheline aliasing problems on
    older P4 systems, but those are many years old and we don't optimize
    for them anymore. (They'll still get the 128-byte cacheline size if
    the kernel is specifically built for Pentium 4.)

    Signed-off-by: Ingo Molnar
    Acked-by: Arjan van de Ven

    Ingo Molnar
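
    The waste shows up in anything padded to the "generic" cacheline size. A
    small user-space sketch of the effect (a rough analogue of
    ____cacheline_aligned, not kernel code; the struct names are made up):

        /* A one-long structure padded to the configured cacheline size
         * doubles its footprint at 128 bytes versus 64 bytes. */
        #include <stdio.h>

        struct counter64  { long value; } __attribute__((aligned(64)));
        struct counter128 { long value; } __attribute__((aligned(128)));

        int main(void)
        {
                printf("aligned to  64 bytes: %zu bytes each\n",
                       sizeof(struct counter64));
                printf("aligned to 128 bytes: %zu bytes each\n",
                       sizeof(struct counter128));
                return 0;
        }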
     

06 Jan, 2009

1 commit


26 Nov, 2008

1 commit

  • Impact: add new ftrace plugin

    A prototype for a BTS ftrace plug-in.

    The tracer collects the branch trace in a cyclic buffer for each cpu.

    The tracer is not configurable and the trace for each snapshot is
    appended when doing cat /debug/tracing/trace.

    This is a proof of concept that will be extended with future patches
    to become a (hopefully) useful tool.

    Signed-off-by: Markus Metzger
    Signed-off-by: Ingo Molnar

    Markus Metzger
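
    The per-cpu cyclic buffer mentioned above is just a fixed-size ring that
    overwrites its oldest records once full. A minimal, hedged sketch of that
    data structure (names and capacity are made up; this is not the ftrace or
    DS implementation):

        /* Fixed-capacity branch-trace ring: the newest record overwrites the
         * oldest once the buffer wraps around. */
        #include <stdio.h>

        #define CAPACITY 4

        struct bts_record { unsigned long from, to; };

        static struct bts_record buf[CAPACITY];
        static unsigned long head;              /* total records ever written */

        static void record_branch(unsigned long from, unsigned long to)
        {
                buf[head % CAPACITY] = (struct bts_record){ from, to };
                head++;
        }

        int main(void)
        {
                for (unsigned long i = 0; i < 6; i++)
                        record_branch(0x1000 + i, 0x2000 + i);

                /* Dump the surviving records, oldest first. */
                unsigned long start = head > CAPACITY ? head - CAPACITY : 0;
                for (unsigned long i = start; i < head; i++)
                        printf("branch %#lx -> %#lx\n",
                               buf[i % CAPACITY].from, buf[i % CAPACITY].to);
                return 0;
        }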
     

28 Oct, 2008

1 commit


13 Oct, 2008

2 commits


12 Oct, 2008

2 commits


10 Sep, 2008

3 commits


09 Sep, 2008

1 commit

  • On 32-bit, at least the generic nops are fairly reasonable, but the
    default nops for 64-bit really look pretty sad, and the P6 nops really do
    look better.

    So I would suggest perhaps moving the static P6 nop selection into the
    CONFIG_X86_64 thing.

    The alternative is to just get rid of that static nop selection, and just
    have two cases: 32-bit and 64-bit, and just pick obviously safe cases for
    them.

    Signed-off-by: H. Peter Anvin

    Linus Torvalds
     

08 Sep, 2008

1 commit

  • arch/x86/kernel/cpu/amd.c is now 100% identical to
    arch/x86/kernel/cpu/amd_64.c, so use amd.c on 64-bit too
    and fix up the namespace impact.

    Simplify the Kconfig glue as well.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

18 Aug, 2008

1 commit

  • This patch adds some configuration options that make it possible to
    compile out CPU vendor-specific code in x86 kernels (in
    arch/x86/kernel/cpu). The new configuration options are only visible
    when CONFIG_EMBEDDED is selected, as they are mostly interesting for
    space-saving reasons.

    An example of size saving, on x86 with only Intel CPU support:

       text    data     bss      dec     hex  filename
    1125479  118760  212992  1457231  163c4f  vmlinux.old
    1121355  116536  212992  1450883  162383  vmlinux
      -4124   -2224       0    -6348   -18CC  +/-

    However, I'm not exactly sure that the Kconfig wording is correct with
    regard to !64BIT / 64BIT.

    [ mingo@elte.hu: convert macro to inline ]

    Signed-off-by: Thomas Petazzoni
    Signed-off-by: Ingo Molnar

    Thomas Petazzoni
     

25 Jul, 2008

1 commit


22 Jul, 2008

1 commit

  • Currently, if you use PTRACE_SINGLEBLOCK on an AMD K6-3 (i586), it will
    crash: the kernel wrongly assumes that the DEBUGCTLMSR MSR exists there.

    The assumption was also removed for some other non-K6 CPUs. I am not sure
    about those, but at worst this brings a small inefficiency if my assumption
    is wrong there.

    Based on info from Roland McGrath, Chuck Ebbert and Mikulas Patocka.
    More info at:
    https://bugzilla.redhat.com/show_bug.cgi?id=456175

    Signed-off-by: Jan Kratochvil
    Cc:
    Signed-off-by: Ingo Molnar

    Jan Kratochvil
     

18 Jul, 2008

1 commit

  • Use alternatives to select the workaround for the 11AP Pentium erratum
    for the affected steppings on the fly rather than build time. Remove the
    X86_GOOD_APIC configuration option and replace all the calls to
    apic_write_around() with plain apic_write(), protecting accesses to the
    ESR as appropriate due to the 3AP Pentium erratum. Remove
    apic_read_around() and all its invocations altogether as not needed.
    Remove apic_write_atomic() and all its implementing backends. The use of
    ASM_OUTPUT2() is not strictly needed for input constraints, but I have
    used it for readability's sake.

    I had the feeling no one else was brave enough to do it, so I went ahead
    and here it is. Verified by checking the generated assembly and tested
    with both a 32-bit and a 64-bit configuration, also with the 11AP
    "feature" forced on and verified with gdb on /proc/kcore to work as
    expected (as an 11AP machines are quite hard to get hands on these days).
    Some script complained about the use of "volatile", but apic_write() needs
    it for the same reason and is effectively a replacement for writel(), so I
    have disregarded it.

    I am not sure what the policy wrt defconfig files is; they are generated,
    and there is a risk of a conflict resulting from an unrelated change, so I
    have left changes to them out. The option will get removed from them at
    the next run.

    Some testing with machines other than mine will be needed to avoid some
    stupid mistake, but despite its volume, the change is not really that
    intrusive, so I am fairly confident that because it works for me, it
    will work everywhere.

    Signed-off-by: Maciej W. Rozycki
    Signed-off-by: Ingo Molnar

    Maciej W. Rozycki
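
    The core idea above - choosing the erratum workaround once on the running
    CPU instead of at build time - can be pictured in user space with a
    function pointer selected at start-up. Everything below is an illustrative
    stand-in (the names and the detection stub are hypothetical; the kernel
    patches the code in place with alternatives rather than using pointers):

        #include <stdbool.h>
        #include <stdio.h>

        static void apic_write_plain(unsigned int reg, unsigned int val)
        {
                printf("write %#x -> reg %#x\n", val, reg);
        }

        static void apic_write_11ap(unsigned int reg, unsigned int val)
        {
                printf("write %#x -> reg %#x (11AP workaround path)\n", val, reg);
        }

        static void (*apic_write_op)(unsigned int, unsigned int) = apic_write_plain;

        static bool cpu_has_11ap_erratum(void)
        {
                return false;   /* stand-in for the real model/stepping check */
        }

        int main(void)
        {
                if (cpu_has_11ap_erratum())     /* decided once, at "boot" */
                        apic_write_op = apic_write_11ap;

                apic_write_op(0x80, 0x1ff);     /* illustrative register/value */
                return 0;
        }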
     

09 Jul, 2008

1 commit


13 May, 2008

2 commits

  • .. allowing the former to be used in non-PAE kernels, too.

    Signed-off-by: Jan Beulich
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Jan Beulich
     
  • Polish the ds.h interface and add support for PEBS.

    Ds.c is meant to be the resource allocator for per-thread and per-cpu
    BTS and PEBS recording.
    It is used by ptrace/utrace to provide execution tracing of debugged tasks.
    It will be used by profilers (e.g. perfmon2).
    It may be used by kernel debuggers to provide a kernel execution trace.

    Changes in detail:
    - guard DS and ptrace by CONFIG macros
    - separate DS and BTS more clearly
    - simplify field accesses
    - add functions to manage PEBS buffers
    - add simple protection/allocation mechanism
    - add support for Atom

    Open issues:
    - buffer overflow handling
    Currently, only circular buffers are supported. This is all we need
    for debugging. Profilers would want an overflow notification.
    This is planned to be added when perfmon2 is made to use the ds.h
    interface.
    - utrace intermediate layer

    Signed-off-by: Markus Metzger
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Markus Metzger
     

01 May, 2008

1 commit