19 Oct, 2010

1 commit

  • Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system, such as waking up a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible, or on architectures like powerpc the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this have to make do with
    a callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work.
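
    The enqueue-from-NMI, run-from-IRQ split described above can be modelled
    in userspace roughly as follows. This is only a sketch under stated
    assumptions: struct work_item, work_queue(), work_run() and the
    self_irq_raised flag are hypothetical names for illustration, not the
    kernel's implementation, and the "self-IPI" is just a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's own data structure. */
struct work_item {
    struct work_item *next;
    atomic_bool pending;                 /* true while the item is queued */
    void (*func)(struct work_item *);
};

static _Atomic(struct work_item *) work_list = NULL;
static int self_irq_raised;              /* stands in for the self-IPI */

/* "NMI" side: lock-free push, takes no locks and never sleeps. */
static bool work_queue(struct work_item *w)
{
    bool expected = false;

    /* Claim the item; an already pending item is not queued twice. */
    if (!atomic_compare_exchange_strong(&w->pending, &expected, true))
        return false;

    struct work_item *head = atomic_load(&work_list);
    do {
        w->next = head;
    } while (!atomic_compare_exchange_weak(&work_list, &head, w));

    self_irq_raised = 1;                 /* real code would raise an interrupt */
    return true;
}

/* "IRQ" side: detach the whole list at once, then run every callback. */
static int work_run(void)
{
    int ran = 0;
    struct work_item *w = atomic_exchange(&work_list, NULL);

    self_irq_raised = 0;
    while (w) {
        struct work_item *next = w->next;
        atomic_store(&w->pending, false);  /* item may be queued again now */
        w->func(w);
        ran++;
        w = next;
    }
    return ran;
}

/* Example callback: pretend to wake a task that drains buffers. */
static int wakeups;
static void wake_task(struct work_item *w) { (void)w; wakeups++; }
```

    A caller would initialise an item with { .func = wake_task }, call
    work_queue() from the restricted context, and call work_run() from the
    interrupt handler (or from the timer tick where no self-interrupt is
    available).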

    Signed-off-by: Peter Zijlstra
    Acked-by: Kyle McMartin
    Acked-by: Martin Schwidefsky
    [ various fixes ]
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Oct, 2010

1 commit


28 Sep, 2010

2 commits


27 Sep, 2010

1 commit

  • While debugging a bit_spin_lock() hang, the cause was tracked down to a
    gcc-4.4 misoptimization of the non-inlined constant_test_bit(): the
    'const volatile unsigned long *addr' argument is cast to
    'unsigned long *', so addr is no longer volatile, and the compiler emits
    an unconditional jump back to the pause (and not to the test), which
    leads to the hang.

    Compiling with gcc-4.3 or disabling CONFIG_OPTIMIZE_INLINING yields inlined
    constant_test_bit() and correct jump, thus working around the kernel bug.

    Other arches than asm-x86 may implement this slightly differently;
    2.6.29 mitigates the misoptimization by changing the function prototype
    (commit c4295fbb6048d85f0b41c5ced5cbf63f6811c46c) but probably fixing the issue
    itself is better.
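
    A minimal userspace sketch of a bit test that keeps the volatile
    qualifier all the way to the load, so a spin loop around it has to
    re-read memory on every iteration. The names test_bit_volatile() and
    spin_until_clear() are hypothetical and this is not the kernel's
    constant_test_bit() itself.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* The pointer stays 'const volatile': every call performs a real load. */
static inline int test_bit_volatile(unsigned int nr,
                                    const volatile unsigned long *addr)
{
    return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/*
 * Spin while bit nr is set, giving up after max_spins iterations.
 * Returns the number of spins performed. The loop branches back to the
 * test, which is the behaviour the miscompiled code lost.
 */
static unsigned int spin_until_clear(unsigned int nr,
                                     const volatile unsigned long *addr,
                                     unsigned int max_spins)
{
    unsigned int spins = 0;

    while (test_bit_volatile(nr, addr) && spins < max_spins)
        spins++;                 /* a real lock would pause/relax here */
    return spins;
}
```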

    Signed-off-by: Alexander Chumachenko
    Signed-off-by: Michael Shigorin
    Acked-by: Linus Torvalds
    Signed-off-by: H. Peter Anvin

    Alexander Chumachenko
     

25 Sep, 2010

1 commit

  • Using cpuid_eax() to determine feature availability on a CPU other
    than the current one is invalid. Feature availability should also be
    checked in the hotplug code path.

    Signed-off-by: Jan Beulich
    Cc: Rudolf Marek
    Cc: Fenghua Yu
    Signed-off-by: Guenter Roeck

    Jan Beulich
     

24 Sep, 2010

2 commits


23 Sep, 2010

6 commits

  • This patch adds a workaround for an IOMMU BIOS problem to
    the AMD IOMMU driver. The result of the bug is that the
    IOMMU does not execute commands anymore when the system
    comes out of the S3 state resulting in system failure. The
    bug in the BIOS is that it does not restore certain hardware
    specific registers correctly. This workaround reads out the
    contents of these registers at boot time and restores them
    on resume from S3. The workaround is limited to the specific
    IOMMU chipset where this problem occurs.

    Cc: stable@kernel.org
    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • This patch moves the setting of the configuration and
    feature flags out of the ACPI table parsing path and moves
    it into the iommu-enable path. This is needed to reliably
    fix resume-from-s3.

    Cc: stable@kernel.org
    Signed-off-by: Joerg Roedel

    Joerg Roedel
     
  • The structure in the x86 jump label code uses the typedef jump_label_t,
    which is defined by the #ifdef arch type. The structure does not need
    to be duplicated there.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • add x86 support for jump label. I'm keeping this patch separate so it's clear
    to arch maintainers what was required for x86 to support this new feature.
    Hopefully, it won't be too painful for other archs.

    Signed-off-by: Jason Baron
    LKML-Reference:

    [ cleaned up some formatting ]

    Signed-off-by: Steven Rostedt

    Jason Baron
     
  • base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
    assembly gcc mechanism, we can now branch to labels from an 'asm goto'
    statement. This allows us to create a 'no-op' fastpath, which can subsequently
    be patched with a jump to the slowpath code. This is useful for code which
    might be rarely used, but which we'd like to be able to call, if needed.
    Tracepoints are the current usecase that these are being implemented for.

    Acked-by: David S. Miller
    Signed-off-by: Jason Baron
    LKML-Reference:

    [ cleaned up some formatting ]

    Signed-off-by: Steven Rostedt

    Jason Baron
     
  • Conflicts:
    kernel/hw_breakpoint.c

    Merge reason: resolve the conflict.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

22 Sep, 2010

1 commit


21 Sep, 2010

3 commits


17 Sep, 2010

2 commits

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: hpet: Work around hardware stupidity
    x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y
    x86, cpufeature: Suppress compiler warning with gcc 3.x
    x86, UV: Fix initialization of max_pnode

    Linus Torvalds
     
  • Lengths and types of breakpoints are encoded in a half byte
    into CPU registers. However when we extract these values
    and store them, we add a high half byte part to them: 0x40 to the
    length and 0x80 to the type.
    When that gets reloaded to the CPU registers, the high part
    is masked.

    While making the instruction breakpoints available for perf,
    I zapped that high part on instruction breakpoint encoding
    and that broke the arch -> generic translation used by ptrace
    instruction breakpoints. Writing dr7 to set an inst breakpoint
    was then failing.

    There is no apparent reason for these high parts so we could get
    rid of them altogether. That's an invasive change though so let's
    do that later and for now fix the problem by restoring that inst
    breakpoint high part encoding in this sole patch.
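
    The high-part scheme described above can be sketched as follows. The
    0x40 and 0x80 constants come from the description; the helper names and
    the sample nibble values used with them are hypothetical, chosen only to
    illustrate why a translator that expects the high part stops matching
    once that part is dropped.

```c
#include <stdint.h>

#define LEN_HIGH    0x40u    /* high part added to stored lengths */
#define TYPE_HIGH   0x80u    /* high part added to stored types */
#define LOW_NIBBLE  0x0fu    /* the half byte the CPU register holds */

/* Stored form: hardware nibble plus the marker in the high half byte. */
static inline uint8_t store_len(uint8_t hw_len)
{
    return (uint8_t)(LEN_HIGH | (hw_len & LOW_NIBBLE));
}

static inline uint8_t store_type(uint8_t hw_type)
{
    return (uint8_t)(TYPE_HIGH | (hw_type & LOW_NIBBLE));
}

/* Reloading into the register masks the high part away again. */
static inline uint8_t load_nibble(uint8_t stored)
{
    return (uint8_t)(stored & LOW_NIBBLE);
}

/* A translator keyed on the stored form only accepts marked values. */
static inline int is_stored_type(uint8_t value)
{
    return (value & 0xf0u) == TYPE_HIGH;
}
```

    With these helpers, a type nibble of 0 stored as 0x80 is recognised by
    is_stored_type(), while the same nibble stored without its high part
    (plain 0) is rejected.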

    Reported-by: Kelvie Wong
    Signed-off-by: Frederic Weisbecker
    Cc: Prasad
    Cc: Mahesh Salgaonkar
    Cc: Will Deacon

    Frederic Weisbecker
     

15 Sep, 2010

3 commits

  • …stedt/linux-2.6-trace into perf/core

    Ingo Molnar
     
  • compat_alloc_user_space() expects the caller to independently call
    access_ok() to verify the returned area. A missing call could
    introduce problems on some architectures.

    This patch incorporates the access_ok() check into
    compat_alloc_user_space() and also adds a sanity check on the length.
    The existing compat_alloc_user_space() implementations are renamed
    arch_compat_alloc_user_space() and are used as part of the
    implementation of the new global function.

    This patch assumes NULL will cause __get_user()/__put_user() to either
    fail or access userspace on all architectures. This should be
    followed by checking the return value of compat_alloc_user_space()
    for NULL in the callers, at which time the access_ok() in the callers
    can also be removed.
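
    The shape of such a checked wrapper can be sketched in userspace as
    below. Everything here is a hypothetical stand-in: the address limits,
    the stack pointer, arch_alloc_sketch() and range_ok() merely model the
    per-architecture allocator and the access_ok() check, and 0 plays the
    role of NULL.

```c
#include <stddef.h>
#include <stdint.h>

#define USER_LIMIT  0x8000u   /* hypothetical end of the "user" range */
#define MAX_ALLOC   0x1000u   /* hypothetical sanity cap on the length */

static uintptr_t user_sp = 0x7000u;   /* hypothetical user stack pointer */

/* Stand-in for the per-architecture allocator: carve below the stack. */
static uintptr_t arch_alloc_sketch(size_t len)
{
    return user_sp - len;
}

/* Stand-in for access_ok(): the whole range must fit below USER_LIMIT. */
static int range_ok(uintptr_t addr, size_t len)
{
    return addr != 0 && len <= USER_LIMIT && addr <= USER_LIMIT - len;
}

/* Generic wrapper: length check first, then the range check. */
static uintptr_t alloc_user_space_checked(size_t len)
{
    if (len > MAX_ALLOC)
        return 0;                     /* absurd length, refuse */

    uintptr_t p = arch_alloc_sketch(len);
    return range_ok(p, len) ? p : 0;  /* never hand back an unchecked area */
}
```

    Callers then only need to test the result against 0 instead of
    repeating the range check themselves.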

    Reported-by: Ben Hawkes
    Signed-off-by: H. Peter Anvin
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Chris Metcalf
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Acked-by: Tony Luck
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Fenghua Yu
    Cc: H. Peter Anvin
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: James Bottomley
    Cc: Kyle McMartin
    Cc: Martin Schwidefsky
    Cc: Paul Mackerras
    Cc: Ralf Baechle
    Cc:

    H. Peter Anvin
     
  • This more or less reverts commits 08be979 (x86: Force HPET
    readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
    read back to affected ATI chipsets) to the status of commit 8da854c
    (x86, hpet: Erratum workaround for read after write of HPET
    comparator).

    The delta to commit 8da854c is mostly comments and the change from
    WARN_ONCE to printk_once as we know the call path of this function
    already.

    This needs really in depth explanation:

    First of all the HPET design is a complete failure. Having a counter
    compare register which generates an interrupt on matching values
    forces the software to do at least one superfluous readback of the
    counter register.

    While it is nice in theory to program "absolute" time events it is
    practically useless because the timer runs at some absurd frequency
    which can never be matched to real world units. So we are forced to
    calculate a relative delta and this forces a readout of the actual
    counter value, adding the delta and programming the compare
    register. When the delta is small enough we run into the danger that
    we program a compare value which is already in the past. Due to the
    compare for equal nature of HPET we need to read back the counter
    value after writing the compare register (btw. this is necessary for
    absolute timeouts as well) to make sure that we did not miss the timer
    event. We try to work around that by setting the minimum delta to a
    value which is larger than the theoretical time which elapses between
    the counter readout and the compare register write, but that's only
    true in theory. An NMI or SMI which hits between the readout and the
    write can easily push us beyond that limit. This would result in
    waiting for the next HPET timer interrupt until the 32bit wraparound
    of the counter happens which takes about 306 seconds.

    So we designed the next event function to look like:

    match = read_cnt() + delta;
    write_compare_reg(match);
    return read_cnt() < match ? 0 : -ETIME;
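
    The same procedure can be tried out against a simulated counter. This
    is a sketch only: the sim_* variables are hypothetical stand-ins for the
    hardware registers, and the signed-difference test is a variation on the
    plain comparison above, used here so that the check also holds across
    the 32-bit wraparound.

```c
#include <errno.h>
#include <stdint.h>

/* Simulated hardware (hypothetical stand-ins, not real HPET accessors). */
static uint32_t sim_counter;     /* free-running 32-bit counter */
static uint32_t sim_compare;     /* compare register */
static uint32_t sim_read_cost;   /* ticks that elapse on every counter read */

static uint32_t read_cnt(void)
{
    sim_counter += sim_read_cost;
    return sim_counter;
}

static void write_compare_reg(uint32_t value)
{
    sim_compare = value;
}

/* Program an event 'delta' ticks ahead; -ETIME if it is already missed. */
static int next_event_sketch(uint32_t delta)
{
    uint32_t match = read_cnt() + delta;   /* wraps modulo 2^32 */

    write_compare_reg(match);

    /*
     * Read the counter back after the write. The unsigned subtraction
     * wraps, and interpreting it as signed (two's complement, as on GCC
     * and Clang) tells us whether match is still ahead of the counter.
     */
    return (int32_t)(match - read_cnt()) > 0 ? 0 : -ETIME;
}
```

    Setting sim_read_cost larger than delta models an NMI or SMI landing
    between the readout and the write, in which case the function reports
    -ETIME instead of silently waiting for a wraparound.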

    At some point we got into trouble with certain ATI chipsets. Even the
    above "safe" procedure failed. The reason was that the write to the
    compare register was delayed probably for performance reasons. The
    theory was that they wanted to avoid the synchronization of the write
    with the HPET clock, which is understandable. So the write does not
    hit the compare register directly instead it goes to some intermediate
    register which is copied to the real compare register in sync with the
    HPET clock. That opens another window for hitting the dreaded "wait
    for a wraparound" problem.

    To work around that "optimization" we added a read back of the compare
    register which either enforced the update of the just written value or
    just delayed the readout of the counter enough to avoid the issue. We
    unfortunately never got any affirmative info from ATI/AMD about this.

    One thing is sure, that we nuked the performance "optimization" that
    way completely and I'm pretty sure that the result is worse than
    before some HW folks came up with those.

    Just for paranoia reasons I added a check whether the read back
    compare register value was the same as the value we wrote right
    before. That paranoia check triggered a couple of years after it was
    added on an Intel ICH9 chipset. Venki added a workaround (commit
    8da854c) which was reading the compare register twice when the first
    check failed. We considered this to be a penalty in general and
    restricted the readback (thus the wasted CPU cycles) to the known to
    be affected ATI chipsets.

    This turned out to be an utterly wrong decision. 2.6.35 testers
    experienced massive problems and finally one of them bisected it down
    to commit 30a564be which spurred some further investigation.

    Finally we got confirmation that the write to the compare register can
    be delayed by up to two HPET clock cycles which explains the problems
    nicely. All we can do about this is to go back to Venki's initial
    workaround in a slightly modified version.

    Just for the record I need to say, that all of this could have been
    avoided if hardware designers and of course the HPET committee would
    have thought about the consequences for a split second. It's out of my
    comprehension why designing a working timer is so hard. There are two
    ways to achieve it:

    1) Use a counter wrap around aware compare_reg
    Reported-by: Artur Skawina
    Reported-by: Damien Wyart
    Reported-by: John Drescher
    Cc: Venkatesh Pallipadi
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Arjan van de Ven
    Cc: Andreas Herrmann
    Cc: Borislav Petkov
    Cc: stable@kernel.org
    Acked-by: Suresh Siddha
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

14 Sep, 2010

1 commit

  • Gcc 3.x generates a warning

    arch/x86/include/asm/cpufeature.h: In function `__static_cpu_has':
    arch/x86/include/asm/cpufeature.h:326: warning: asm operand 1 probably doesn't match constraints

    on each file.
    But static_cpu_has() for gcc 3.x does not need __static_cpu_has().

    Signed-off-by: Tetsuo Handa
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Tetsuo Handa
     

10 Sep, 2010

1 commit


09 Sep, 2010

2 commits


08 Sep, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
    PCI: bus speed strings should be const
    PCI hotplug: Fix build with CONFIG_ACPI unset
    PCI: PCIe: Remove the port driver module exit routine
    PCI: PCIe: Move PCIe PME code to the pcie directory
    PCI: PCIe: Disable PCIe port services during port initialization
    PCI: PCIe: Ask BIOS for control of all native services at once
    ACPI/PCI: Negotiate _OSC control bits before requesting them
    ACPI/PCI: Do not preserve _OSC control bits returned by a query
    ACPI/PCI: Make acpi_pci_query_osc() return control bits
    ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
    PCI: PCIe: Introduce commad line switch for disabling port services
    PCI: PCIe AER: Introduce pci_aer_available()
    x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
    PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs

    Linus Torvalds
     

05 Sep, 2010

1 commit


01 Sep, 2010

1 commit

  • Implements verification of

    - Bits of ESCR EventMask field (meaningful bits in field are hardware
    predefined and other bits should be set to zero)

    - INSTR_COMPLETED event (it is available on predefined cpu model only)

    - Thread shared events (they should be guarded by "perf_event_paranoid"
    sysctl for security reasons). The side effect of this action is
    that PERF_COUNT_HW_BUS_CYCLES becomes a "paranoid" general event.

    Signed-off-by: Cyrill Gorcunov
    Tested-by: Lin Ming
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Cyrill Gorcunov
     

25 Aug, 2010

1 commit


21 Aug, 2010

1 commit


20 Aug, 2010

1 commit

  • TSCs get reset after suspend/resume (even on CPUs with an invariant TSC
    which runs at a constant rate across ACPI P-, C- and T-states). And on
    some systems the BIOS seems to reinit the TSC to an arbitrary large
    value (still sync'd across CPUs) during resume.

    This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less
    than rq->age_stamp (introduced in 2.6.32). This leads to a big value
    returned by scale_rt_power() and the resulting big group power set by the
    update_group_power() is causing improper load balancing between busy and
    idle cpu's after suspend/resume.

    This resulted in multi-threaded workloads (like kernel compilation)
    running slower after a suspend/resume cycle on Core i5 laptops.

    Fix this by recomputing cyc2ns_offset's during resume, so that
    sched_clock() continues from the point where it was left off during
    suspend.
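
    The idea of recomputing an offset on resume can be illustrated with a
    small userspace model. This is a sketch under stated assumptions: the
    fixed NS_PER_CYCLE scale, the simulated TSC and all function names are
    hypothetical, and ns_offset merely plays the role that cyc2ns_offset
    plays in the description.

```c
#include <stdint.h>

#define NS_PER_CYCLE 2u          /* hypothetical fixed scale factor */

static uint64_t sim_tsc;         /* simulated TSC value */
static uint64_t ns_offset;       /* added to the scaled TSC (wraps mod 2^64) */
static uint64_t ns_at_suspend;   /* clock value saved when suspending */

static uint64_t cycles_to_ns(uint64_t cycles)
{
    return cycles * NS_PER_CYCLE;
}

/* The monotonic clock the scheduler would look at. */
static uint64_t clock_ns(void)
{
    return cycles_to_ns(sim_tsc) + ns_offset;
}

static void on_suspend(void)
{
    ns_at_suspend = clock_ns();
}

/*
 * On resume the TSC may have been reset to zero or to some large value.
 * Pick the offset so the clock continues exactly where it stopped; the
 * unsigned arithmetic wraps, so both directions work.
 */
static void on_resume(void)
{
    ns_offset = ns_at_suspend - cycles_to_ns(sim_tsc);
}
```

    After on_resume() the first clock_ns() reading equals the value saved by
    on_suspend(), so timestamps taken before the suspend never lie in the
    future.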

    Reported-by: Florian Pritz
    Signed-off-by: Suresh Siddha
    Cc: # [v2.6.32+]
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     

19 Aug, 2010

2 commits

  • Fix dummy inline stubs for trampoline-related functions when no
    trampolines exist (until we get rid of the no-trampoline case
    entirely.)

    Signed-off-by: H. Peter Anvin
    Cc: Joerg Roedel
    Cc: Borislav Petkov
    LKML-Reference:

    H. Peter Anvin
     
  • This patch fixes machine crashes which occur when heavily exercising the
    CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
    AMD Erratum 383 and result in a fatal machine check exception. Here's
    the scenario:

    1. On 32-bit, the swapper_pg_dir page table is used as the initial page
    table for booting a secondary CPU.

    2. To make this work, swapper_pg_dir needs a direct mapping of physical
    memory in it (the low mappings). By adding those low, large page (2M)
    mappings (PAE kernel), we create the necessary conditions for Erratum
    383 to occur.

    3. Other CPUs which do not participate in the off- and onlining game may
    use swapper_pg_dir while the low mappings are present (when leave_mm is
    called). For all steps below, the CPU referred to is a CPU that is using
    swapper_pg_dir, and not the CPU which is being onlined.

    4. The presence of the low mappings in swapper_pg_dir can result
    in TLB entries for addresses below __PAGE_OFFSET to be established
    speculatively. These TLB entries are marked global and large.

    5. When the CPU with such TLB entry switches to another page table, this
    TLB entry remains because it is global.

    6. The process then generates an access to an address covered by the
    above TLB entry but there is a permission mismatch - the TLB entry
    covers a large global page not accessible to userspace.

    7. Due to this permission mismatch a new 4kb, user TLB entry gets
    established. Further, Erratum 383 provides for a small window of time
    where both TLB entries are present. This results in an uncorrectable
    machine check exception signalling a TLB multimatch which panics the
    machine.

    There are two ways to fix this issue:

    1. Always do a global TLB flush when a new cr3 is loaded and the
    old page table was swapper_pg_dir. I consider this a hack hard
    to understand and with performance implications

    2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
    does.

    This patch implements solution 2. It introduces a trampoline_pg_dir
    which has the same layout as swapper_pg_dir with low_mappings. This page
    table is used as the initial page table of the booting CPU. Later in the
    bringup process, it switches to swapper_pg_dir and does a global TLB
    flush. This fixes the crashes in our test cases.

    -v2: switch to swapper_pg_dir right after entering start_secondary() so
    that we are able to access percpu data which might not be mapped in the
    trampoline page table.

    Signed-off-by: Joerg Roedel
    LKML-Reference:
    Signed-off-by: Borislav Petkov
    Signed-off-by: H. Peter Anvin

    Joerg Roedel
     

18 Aug, 2010

2 commits

  • Make do_execve() take a const filename pointer so that kernel_execve() compiles
    correctly on ARM:

    arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

    This also requires the argv and envp arguments to be consted twice, once for
    the pointer array and once for the strings the array points to. This is
    because do_execve() passes a pointer to the filename (now const) to
    copy_strings_kernel(). A simpler alternative would be to cast the filename
    pointer in do_execve() when it's passed to copy_strings_kernel().

    do_execve() may not change any of the strings it is passed as part of the argv
    or envp lists, as some of them are in .rodata, so marking these strings as
    const should be fine.
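
    What "consted twice" looks like in a prototype can be shown with a small
    sketch. The functions below are hypothetical illustrations, not the
    kernel's do_execve(): both the pointer array and the strings it points
    to are const, so string tables placed in read-only data can be passed
    without casts.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated, fully const string vector. */
static size_t count_strings(const char *const *vec)
{
    size_t n = 0;

    while (vec[n] != NULL)
        n++;
    return n;
}

/* Tables like these can live entirely in read-only data. */
static const char *const demo_argv[] = { "/sbin/init", "--verbose", NULL };
static const char *const demo_envp[] = { "HOME=/", "TERM=linux", "PATH=/bin", NULL };

/*
 * Hypothetical stand-in for an exec-style entry point: it only inspects
 * its arguments and returns how many strings it was given, or -1 when
 * no filename is supplied.
 */
static int execve_sketch(const char *filename,
                         const char *const *argv,
                         const char *const *envp)
{
    if (filename == NULL)
        return -1;
    return (int)(count_strings(argv) + count_strings(envp));
}
```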

    Further kernel_execve() and sys_execve() need to be changed to match.

    This has been test built on x86_64, frv, arm and mips.

    Signed-off-by: David Howells
    Tested-by: Ralf Baechle
    Acked-by: Russell King
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Otherwise we'll duplicate definitions with the pci.h stubs.

    Reported-by: Randy Dunlap
    Acked-by: Randy Dunlap
    Signed-off-by: Jesse Barnes

    Jesse Barnes
     

15 Aug, 2010

1 commit


14 Aug, 2010

2 commits