19 Oct, 2010
1 commit
-
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.Signed-off-by: Peter Zijlstra
Acked-by: Kyle McMartin
Acked-by: Martin Schwidefsky
[ various fixes ]
Signed-off-by: Huang Ying
LKML-Reference:
Signed-off-by: Ingo Molnar
08 Oct, 2010
1 commit
-
Conflicts:
arch/x86/kernel/module.cMerge reason: Resolve the conflict, pick up fixes.
Signed-off-by: Ingo Molnar
28 Sep, 2010
2 commits
-
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile -
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Fix rounding-bug in __unmap_single
x86/amd-iommu: Work around S3 BIOS bug
x86/amd-iommu: Set iommu configuration flags in enable-loop
x86, setup: Fix earlyprintk=serial,0x3f8,115200
x86, setup: Fix earlyprintk=serial,ttyS0,115200
27 Sep, 2010
1 commit
-
While debugging bit_spin_lock() hang, it was tracked down to gcc-4.4
misoptimization of non-inlined constant_test_bit() due to non-volatile
addr when 'const volatile unsigned long *addr' cast to 'unsigned long *'
with subsequent unconditional jump to pause (and not to the test) leading
to hang.Compiling with gcc-4.3 or disabling CONFIG_OPTIMIZE_INLINING yields inlined
constant_test_bit() and correct jump, thus working around the kernel bug.Other arches than asm-x86 may implement this slightly differently;
2.6.29 mitigates the misoptimization by changing the function prototype
(commit c4295fbb6048d85f0b41c5ced5cbf63f6811c46c) but probably fixing the issue
itself is better.Signed-off-by: Alexander Chumachenko
Signed-off-by: Michael Shigorin
Acked-by: Linus Torvalds
Signed-off-by: H. Peter Anvin
25 Sep, 2010
1 commit
-
Using cpuid_eax() to determine feature availability on other than
the current CPU is invalid. And feature availability should also be
checked in the hotplug code path.Signed-off-by: Jan Beulich
Cc: Rudolf Marek
Cc: Fenghua Yu
Signed-off-by: Guenter Roeck
24 Sep, 2010
2 commits
-
…/joro/linux-2.6-iommu into x86/urgent
-
…stedt/linux-2.6-trace into perf/core
23 Sep, 2010
6 commits
-
This patch adds a workaround for an IOMMU BIOS problem to
the AMD IOMMU driver. The result of the bug is that the
IOMMU does not execute commands anymore when the system
comes out of the S3 state resulting in system failure. The
bug in the BIOS is that is does not restore certain hardware
specific registers correctly. This workaround reads out the
contents of these registers at boot time and restores them
on resume from S3. The workaround is limited to the specific
IOMMU chipset where this problem occurs.Cc: stable@kernel.org
Signed-off-by: Joerg Roedel -
This patch moves the setting of the configuration and
feature flags out out the acpi table parsing path and moves
it into the iommu-enable path. This is needed to reliably
fix resume-from-s3.Cc: stable@kernel.org
Signed-off-by: Joerg Roedel -
The structure in the x86 jump label code uses the typedef jump_label_t,
which is defined by the #ifdef arch type. The structure does not need
to be duplicated there.Signed-off-by: Steven Rostedt
-
add x86 support for jump label. I'm keeping this patch separate so its clear
to arch maintainers what was required for x86 support this new feature.
Hopefully, it wouldn't be too painful for other archs.Signed-off-by: Jason Baron
LKML-Reference:[ cleaned up some formatting ]
Signed-off-by: Steven Rostedt
-
base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
assembly gcc mechanism, we can now branch to labels from an 'asm goto'
statment. This allows us to create a 'no-op' fastpath, which can subsequently
be patched with a jump to the slowpath code. This is useful for code which
might be rarely used, but which we'd like to be able to call, if needed.
Tracepoints are the current usecase that these are being implemented for.Acked-by: David S. Miller
Signed-off-by: Jason Baron
LKML-Reference:[ cleaned up some formating ]
Signed-off-by: Steven Rostedt
-
Conflicts:
kernel/hw_breakpoint.cMerge reason: resolve the conflict.
Signed-off-by: Ingo Molnar
22 Sep, 2010
1 commit
-
…/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hw breakpoints: Fix pid namespace bug
x86: Fix instruction breakpoint encoding
oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540)
kprobes: Fix Kconfig dependency
21 Sep, 2010
3 commits
-
Merge reason: Pick up the latest fixes in -rc5.
Signed-off-by: Ingo Molnar
-
Make text_poke_early available outside of alternative.c. The jump label
patchset wants to make use of it in order to set up the optimal no-op
sequences at run-time.Signed-off-by: Jason Baron
LKML-Reference:
Signed-off-by: Steven Rostedt -
Move Steve's code for finding the best 5-byte no-op from ftrace.c to
alternative.c. The idea is that other consumers (in this case jump label)
want to make use of that code.Signed-off-by: Jason Baron
LKML-Reference:
Signed-off-by: Steven Rostedt
17 Sep, 2010
2 commits
-
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: hpet: Work around hardware stupidity
x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y
x86, cpufeature: Suppress compiler warning with gcc 3.x
x86, UV: Fix initialization of max_pnode -
Lengths and types of breakpoints are encoded in a half byte
into CPU registers. However when we extract these values
and store them, we add a high half byte part to them: 0x40 to the
length and 0x80 to the type.
When that gets reloaded to the CPU registers, the high part
is masked.While making the instruction breakpoints available for perf,
I zapped that high part on instruction breakpoint encoding
and that broke the arch -> generic translation used by ptrace
instruction breakpoints. Writing dr7 to set an inst breakpoint
was then failing.There is no apparent reason for these high parts so we could get
rid of them altogether. That's an invasive change though so let's
do that later and for now fix the problem by restoring that inst
breakpoint high part encoding in this sole patch.Reported-by: Kelvie Wong
Signed-off-by: Frederic Weisbecker
Cc: Prasad
Cc: Mahesh Salgaonkar
Cc: Will Deacon
15 Sep, 2010
3 commits
-
…stedt/linux-2.6-trace into perf/core
-
compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area. A missing call could
introduce problems on some architectures.This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures. This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.Reported-by: Ben Hawkes
Signed-off-by: H. Peter Anvin
Acked-by: Benjamin Herrenschmidt
Acked-by: Chris Metcalf
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Acked-by: Thomas Gleixner
Acked-by: Tony Luck
Cc: Andrew Morton
Cc: Arnd Bergmann
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Heiko Carstens
Cc: Helge Deller
Cc: James Bottomley
Cc: Kyle McMartin
Cc: Martin Schwidefsky
Cc: Paul Mackerras
Cc: Ralf Baechle
Cc: -
This more or less reverts commits 08be979 (x86: Force HPET
readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
read back to affected ATI chipsets) to the status of commit 8da854c
(x86, hpet: Erratum workaround for read after write of HPET
comparator).The delta to commit 8da854c is mostly comments and the change from
WARN_ONCE to printk_once as we know the call path of this function
already.This needs really in depth explanation:
First of all the HPET design is a complete failure. Having a counter
compare register which generates an interrupt on matching values
forces the software to do at least one superfluous readback of the
counter register.While it is nice in theory to program "absolute" time events it is
practically useless because the timer runs at some absurd frequency
which can never be matched to real world units. So we are forced to
calculate a relative delta and this forces a readout of the actual
counter value, adding the delta and programming the compare
register. When the delta is small enough we run into the danger that
we program a compare value which is already in the past. Due to the
compare for equal nature of HPET we need to read back the counter
value after writing the compare rehgister (btw. this is necessary for
absolute timeouts as well) to make sure that we did not miss the timer
event. We try to work around that by setting the minimum delta to a
value which is larger than the theoretical time which elapses between
the counter readout and the compare register write, but that's only
true in theory. A NMI or SMI which hits between the readout and the
write can easily push us beyond that limit. This would result in
waiting for the next HPET timer interrupt until the 32bit wraparound
of the counter happens which takes about 306 seconds.So we designed the next event function to look like:
match = read_cnt() + delta;
write_compare_ref(match);
return read_cnt() < match ? 0 : -ETIME;At some point we got into trouble with certain ATI chipsets. Even the
above "safe" procedure failed. The reason was that the write to the
compare register was delayed probably for performance reasons. The
theory was that they wanted to avoid the synchronization of the write
with the HPET clock, which is understandable. So the write does not
hit the compare register directly instead it goes to some intermediate
register which is copied to the real compare register in sync with the
HPET clock. That opens another window for hitting the dreaded "wait
for a wraparound" problem.To work around that "optimization" we added a read back of the compare
register which either enforced the update of the just written value or
just delayed the readout of the counter enough to avoid the issue. We
unfortunately never got any affirmative info from ATI/AMD about this.One thing is sure, that we nuked the performance "optimization" that
way completely and I'm pretty sure that the result is worse than
before some HW folks came up with those.Just for paranoia reasons I added a check whether the read back
compare register value was the same as the value we wrote right
before. That paranoia check triggered a couple of years after it was
added on an Intel ICH9 chipset. Venki added a workaround (commit
8da854c) which was reading the compare register twice when the first
check failed. We considered this to be a penalty in general and
restricted the readback (thus the wasted CPU cycles) to the known to
be affected ATI chipsets.This turned out to be a utterly wrong decision. 2.6.35 testers
experienced massive problems and finally one of them bisected it down
to commit 30a564be which spured some further investigation.Finally we got confirmation that the write to the compare register can
be delayed by up to two HPET clock cycles which explains the problems
nicely. All we can do about this is to go back to Venki's initial
workaround in a slightly modified version.Just for the record I need to say, that all of this could have been
avoided if hardware designers and of course the HPET committee would
have thought about the consequences for a split second. It's out of my
comprehension why designing a working timer is so hard. There are two
ways to achieve it:1) Use a counter wrap around aware compare_reg
Reported-by: Artur Skawina
Reported-by: Damien Wyart
Reported-by: John Drescher
Cc: Venkatesh Pallipadi
Cc: Ingo Molnar
Cc: H. Peter Anvin
Cc: Arjan van de Ven
Cc: Andreas Herrmann
Cc: Borislav Petkov
Cc: stable@kernel.org
Acked-by: Suresh Siddha
Signed-off-by: Thomas Gleixner
14 Sep, 2010
1 commit
-
Gcc 3.x generates a warning
arch/x86/include/asm/cpufeature.h: In function `__static_cpu_has':
arch/x86/include/asm/cpufeature.h:326: warning: asm operand 1 probably doesn't match constraintson each file.
But static_cpu_has() for gcc 3.x does not need __static_cpu_has().Signed-off-by: Tetsuo Handa
LKML-Reference:
Signed-off-by: H. Peter Anvin
10 Sep, 2010
1 commit
-
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Perform hardware_enable in CPU_STARTING callback
KVM: i8259: fix migration
KVM: fix i8259 oops when no vcpus are online
KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
09 Sep, 2010
2 commits
-
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
io-mapping: Fix the address space annotations
x86: Fix the address space annotations of iomap_atomic_prot_pfn()
x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev -
operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b
operands are 64-bit.Fix by adding val64 and orig_val64 union members to struct operand, and
using them where needed.Signed-off-by: Avi Kivity
Signed-off-by: Marcelo Tosatti
08 Sep, 2010
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: bus speed strings should be const
PCI hotplug: Fix build with CONFIG_ACPI unset
PCI: PCIe: Remove the port driver module exit routine
PCI: PCIe: Move PCIe PME code to the pcie directory
PCI: PCIe: Disable PCIe port services during port initialization
PCI: PCIe: Ask BIOS for control of all native services at once
ACPI/PCI: Negotiate _OSC control bits before requesting them
ACPI/PCI: Do not preserve _OSC control bits returned by a query
ACPI/PCI: Make acpi_pci_query_osc() return control bits
ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
PCI: PCIe: Introduce commad line switch for disabling port services
PCI: PCIe AER: Introduce pci_aer_available()
x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
05 Sep, 2010
1 commit
-
This patch fixes the sparse warnings when the return pointer of
iomap_atomic_prot_pfn() is used as an argument of iowrite32()
and friends.Signed-off-by: Francisco Jerez
LKML-Reference:
Cc: Andrew Morton
Signed-off-by: Ingo Molnar
01 Sep, 2010
1 commit
-
Implements verification of
- Bits of ESCR EventMask field (meaningful bits in field are hardware
predefined and others bits should be set to zero)- INSTR_COMPLETED event (it is available on predefined cpu model only)
- Thread shared events (they should be guarded by "perf_event_paranoid"
sysctl due to security reason). The side effect of this action is
that PERF_COUNT_HW_BUS_CYCLES become a "paranoid" general event.Signed-off-by: Cyrill Gorcunov
Tested-by: Lin Ming
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
25 Aug, 2010
1 commit
-
…l/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
sched: Fix rq->clock synchronization when migrating tasks
21 Aug, 2010
1 commit
-
…git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, apic: Fix apic=debug boot crash
x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
x86-32: Fix dummy trampoline-related inline stubs
x86-32: Separate 1:1 pagetables from swapper_pg_dir
x86, cpu: Fix regression in AMD errata checking code
20 Aug, 2010
1 commit
-
TSC's get reset after suspend/resume (even on cpu's with invariant TSC
which runs at a constant rate across ACPI P-, C- and T-states). And in
some systems BIOS seem to reinit TSC to arbitrary large value (still
sync'd across cpu's) during resume.This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less
than rq->age_stamp (introduced in 2.6.32). This leads to a big value
returned by scale_rt_power() and the resulting big group power set by the
update_group_power() is causing improper load balancing between busy and
idle cpu's after suspend/resume.This resulted in multi-threaded workloads (like kernel-compilation) go
slower after suspend/resume cycle on core i5 laptops.Fix this by recomputing cyc2ns_offset's during resume, so that
sched_clock() continues from the point where it was left off during
suspend.Reported-by: Florian Pritz
Signed-off-by: Suresh Siddha
Cc: # [v2.6.32+]
Signed-off-by: Peter Zijlstra
LKML-Reference:
Signed-off-by: Ingo Molnar
19 Aug, 2010
2 commits
-
Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)Signed-off-by: H. Peter Anvin
Cc: Joerg Roedel
Cc: Borislav Petkov
LKML-Reference: -
This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.There are two ways to fix this issue:
1. Always do a global TLB flush when a new cr3 is loaded and the
old page table was swapper_pg_dir. I consider this a hack hard
to understand and with performance implications2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
does.This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.Signed-off-by: Joerg Roedel
LKML-Reference:
Signed-off-by: Borislav Petkov
Signed-off-by: H. Peter Anvin
18 Aug, 2010
2 commits
-
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.Further kernel_execve() and sys_execve() need to be changed to match.
This has been test built on x86_64, frv, arm and mips.
Signed-off-by: David Howells
Tested-by: Ralf Baechle
Acked-by: Russell King
Signed-off-by: Linus Torvalds -
Otherwise we'll duplicate definitions with the pci.h stubs.
Reported-by: Randy Dunlap
Acked-by: Randy Dunlap
Signed-off-by: Jesse Barnes
15 Aug, 2010
1 commit
-
unifdef-y and header-y have same semantic, so drop unifdef-y
Signed-off-by: Sam Ravnborg
14 Aug, 2010
2 commits
-
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, UV: Make kdump avoid stack dumps - fix !CONFIG_KEXEC breakage
x86, UV: Initialize BAU hub map
x86, UV: Make kdump avoid stack dumps -
Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.(*) The filename arguments of some syscall helpers relating to the above.
(*) The buffer argument of various write syscalls.
Signed-off-by: David Howells
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds