05 Jan, 2012
2 commits
-
The commit 883c2cfc8bcc0fd00c5d9f596fb8870f481b5bda:
"fix of_flat_dt_is_compatible() to match the full compatible string"
causes silent boot death on the sbc8349 board because the board
detection was only looking for 8349 and not 8349E -- originally there
were non-E (no SEC/encryption) chips available. Just add the
E to the board detection string, since all boards I've seen
were manufactured with the E versions.

Signed-off-by: Paul Gortmaker
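A minimal sketch of why the old board check matched and the new one does not. The compatible value "board8349E" is a hypothetical stand-in for the real device-tree string. The prefix comparison uses `strncasecmp()` over the length of the requested string, and the full comparison uses `strcasecmp()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Prefix match: "board8349" also matches a device-tree entry "board8349E". */
static bool compat_prefix_match(const char *dt_compat, const char *wanted)
{
    return strncasecmp(dt_compat, wanted, strlen(wanted)) == 0;
}

/* Full-string match: the whole compatible string must be equal. */
static bool compat_full_match(const char *dt_compat, const char *wanted)
{
    return strcasecmp(dt_compat, wanted) == 0;
}
```

With full-string matching, the board code has to ask for the exact "...E" string, which is what adding the E does.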
Signed-off-by: Kumar Gala -
There is an issue on FSL-BookE 64-bit devices (P5020) in which PCIe
devices that are capable of doing 64-bit DMAs (like an Intel e1000) do
not function and crash the kernel if we have >4G of memory in the system.

The reason is that the existing code only sets up one inbound window for
access to system memory across PCIe. That window is limited to a 32-bit
address space, so on systems with >4G of memory we'll end up utilizing
SWIOTLB for dma mappings. However, the SWIOTLB dma ops implement
dma_alloc_coherent() as dma_direct_alloc_coherent(). Thus we can end up
with dma addresses that are not accessible because of the inbound window
limitation.

We could possibly set the SWIOTLB alloc_coherent op to
swiotlb_alloc_coherent(); however, that does not address the issue, since
swiotlb_alloc_coherent() will behave almost identically to
dma_direct_alloc_coherent(): the device's coherent_dma_mask will be
greater than any address allocated by swiotlb_alloc_coherent(), and thus
we'll never bounce buffer it into a range that would be dma-able.

The easiest and best solution is to just make it so that a 64-bit
capable device is able to DMA to any internal system address.

We accomplish this by opening up a second inbound window that maps all
of memory above the internal SoC address width so we can set it up to
access all of the internal SoC address space if needed.

We then fix up the dma_ops and dma_offset for PCIe devices with a dma
mask greater than the maximum internal SoC address.

Signed-off-by: Kumar Gala
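A hedged user-space sketch of the per-device decision described above. It assumes the second inbound window places all `soc_bits` of SoC physical address space at bus address `1 << soc_bits`. It also assumes a device qualifies when its DMA mask covers the top of that window. `struct dma_setup`, `choose_dma_setup()` and `phys_to_bus()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dma_setup {
    bool     use_high_window; /* direct DMA through the second inbound window */
    uint64_t dma_offset;      /* added to a physical address to get a bus address */
};

/* All-ones mask of the given width (1..64 bits). */
static uint64_t bit_mask(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

static struct dma_setup choose_dma_setup(uint64_t dev_dma_mask, unsigned soc_bits)
{
    struct dma_setup s = { false, 0 };
    assert(soc_bits < 64);
    uint64_t offset = UINT64_C(1) << soc_bits;
    /* The window's highest bus address is offset + (offset - 1). */
    if (dev_dma_mask >= bit_mask(soc_bits + 1)) {
        s.use_high_window = true;
        s.dma_offset = offset;
    }
    return s;
}

static uint64_t phys_to_bus(struct dma_setup s, uint64_t phys)
{
    return phys + s.dma_offset;
}
```

A device with a 32-bit mask keeps the existing path (first window plus SWIOTLB). Only devices whose mask spans the whole second window get the offset.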
03 Jan, 2012
4 commits
-
Unpaired calls to probe_hcall_entry and probe_hcall_exit might happen
as follows, which could cause an incorrect preempt count.

__trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry =>
        get_cpu_var => preempt_disable

__trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit =>
        put_cpu_var => preempt_enable

where:
  A => B and A -> B mean A calls B, but
  => means A will call B through the function name, and B will definitely
     be called;
  -> means A will call B through a function pointer, so B might not be
     called if the function pointer is not set.

So an error happens when only one of probe_hcall_entry and probe_hcall_exit
gets called during an hcall.

This patch moves the preempt count operations from probe_hcall_entry
and probe_hcall_exit to their callers.

Reported-by: Paul E. McKenney
Signed-off-by: Li Zhong
Tested-by: Paul E. McKenney
CC: stable@kernel.org [v2.6.32+]
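A user-space model of the fix, assuming hypothetical names. `preempt_count` is a plain int, `probe_entry`/`probe_exit` are optional function pointers, and `model_trace_hcall_entry`/`model_trace_hcall_exit` play the role of the always-called wrappers. Because the wrappers own the disable/enable pair, the count stays balanced even when only one probe is registered. This is not the kernel's tracepoint code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the per-CPU preempt count. */
static int preempt_count;

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

/* Probes reached through function pointers; either may be NULL. */
static void (*probe_entry)(void);
static void (*probe_exit)(void);

/* Always called: owns the disable half of the pair. */
static void model_trace_hcall_entry(void)
{
    preempt_disable();
    if (probe_entry)
        probe_entry();
}

/* Always called: owns the enable half of the pair. */
static void model_trace_hcall_exit(void)
{
    if (probe_exit)
        probe_exit();
    preempt_enable();
}

static void dummy_probe(void) { }
```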
Signed-off-by: Benjamin Herrenschmidt -
When using a >8bpp framebuffer, offb advertises truecolor, not directcolor,
and doesn't touch the color map even if it has a corresponding access method
for the real hardware.

Thus it needs to set the pseudo-palette with all 3 components of the color,
like other truecolor framebuffers, not with copies of the color index like
a directcolor framebuffer would do.

This went unnoticed for a long time because it's pretty hard to get offb
to kick in with anything but 8bpp (old BootX under MacOS will do that and
qemu does it).

Signed-off-by: Benjamin Herrenschmidt
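A sketch of the two pseudo-palette conventions, assuming an xRGB8888 layout for the example. `struct channel` is a hypothetical stand-in for the offset/length pair of fbdev's `struct fb_bitfield`, redefined so the sketch stands alone. The incoming colour components are 16 bits wide, as fbdev passes them.

```c
#include <assert.h>
#include <stdint.h>

struct channel { unsigned offset, length; };

/* Truecolor: scale each 16-bit component to its channel width and pack
 * all three into one pixel value. */
static uint32_t truecolor_entry(uint16_t r, uint16_t g, uint16_t b,
                                struct channel rc, struct channel gc,
                                struct channel bc)
{
    uint32_t rv = (uint32_t)r >> (16 - rc.length);
    uint32_t gv = (uint32_t)g >> (16 - gc.length);
    uint32_t bv = (uint32_t)b >> (16 - bc.length);
    return (rv << rc.offset) | (gv << gc.offset) | (bv << bc.offset);
}

/* Directcolor style: the colour index copied into every channel, which
 * only gives the right colour if the hardware colour map is loaded. */
static uint32_t directcolor_entry(unsigned regno, struct channel rc,
                                  struct channel gc, struct channel bc)
{
    return (regno << rc.offset) | (regno << gc.offset) | (regno << bc.offset);
}
```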
CC: stable@kernel.org -
We rename the mach64 hack to "simple" since that's also applicable
to anything using VGA-style DAC IO ports (set to 8-bit DAC), and we
use it for qemu vga.

Note that this is keyed on a device-tree "compatible" property that
is currently only set by an upcoming version of SLOF when using the
qemu "pseries" platform. This is on purpose, as other qemu ppc platforms
using OpenBIOS aren't properly setting the DAC to 8-bit at the time of
the writing of this patch.

We can fix OpenBIOS later to do that and add the required property, in
which case it will be matched by this change.

Signed-off-by: Benjamin Herrenschmidt
-
We used to try to request 8 times more vram than needed, which would
fail if the card's BAR is too small (observed with qemu & kvm).

Signed-off-by: Benjamin Herrenschmidt
CC: stable@kernel.org
22 Dec, 2011
1 commit
-
commit c55aef0e5bc6 ("powerpc/boot: Change the load address
for the wrapper to fit the kernel") introduced a WARNING to
inform the user that the uncompressed kernel would overlap
the boot uncompressing wrapper code. Change it to an INFO.

I initially thought this would be a 'WARNING' for those
boards where the link_address should be fixed, so that the
user could take action accordingly.

Changing it to INFO.
Signed-off-by: Suzuki K. Poulose
Signed-off-by: Josh Boyer
20 Dec, 2011
8 commits
-
The MPIC_PRIMARY define was recently made the default and its meaning was
inverted to MPIC_SECONDARY. This causes compile errors in currituck now, so
fix it to use the new manner of allocating MPICs.

Signed-off-by: Josh Boyer
-
The wrapper code which uncompresses the kernel in case of a 'ppc' boot
is by default loaded at 0x00400000, and the kernel will be uncompressed
to fit the location 0-0x00400000. But with dynamic relocations, the size
of the kernel may exceed 0x00400000 (4M). This would cause an overlap
of the uncompressed kernel and the boot wrapper, causing a failure at
boot.

The message looks like:

 zImage starting: loaded at 0x00400000 (sp: 0x0065ffb0)
 Allocating 0x5ce650 bytes for kernel ...
 Insufficient memory for kernel at address 0! (_start=00400000, uncompressed size=00591a20)

This patch shifts the load address of the boot wrapper code to the next
higher MB, according to the size of the uncompressed vmlinux.

With the patch, we get the following message while building the image:

 WARN: Uncompressed kernel (size 0x5b0344) overlaps the address of the wrapper(0x400000)
 WARN: Fixing the link_address of wrapper to (0x600000)

Signed-off-by: Suzuki K. Poulose
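The patch does this when the image is built, and its script is not shown here. This C sketch only reproduces the arithmetic, using the numbers from the build message: an overlap exists when the uncompressed size exceeds the wrapper's default address, and the new address is the size rounded up to the next 1MB boundary. `wrapper_link_address()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define MB 0x100000u

/* Keep the default link address if the uncompressed kernel (occupying
 * 0..kernel_size) fits below it; otherwise round the size up to 1MB. */
static uint32_t wrapper_link_address(uint32_t default_addr, uint32_t kernel_size)
{
    if (kernel_size <= default_addr)
        return default_addr;
    return (kernel_size + MB - 1) & ~(MB - 1);
}
```

For example, `wrapper_link_address(0x400000, 0x5b0344)` yields 0x600000, matching the second WARN line.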
Signed-off-by: Josh Boyer -
Now that we have a relocatable kernel, supporting CRASH_DUMP only requires
turning the switches on for UP machines.

We don't have kexec support on 47x yet. Enabling SMP support would be done
as part of enabling the PPC_47x support.

Signed-off-by: Suzuki K. Poulose
Cc: Josh Boyer
Cc: Benjamin Herrenschmidt
Cc: linuxppc-dev
Signed-off-by: Josh Boyer -
The following patch adds relocatable kernel support - based on processing
of dynamic relocations - for the PPC44x kernel.

We find the runtime address of _stext and relocate ourselves based
on the following calculation.

virtual_base = ALIGN(KERNELBASE,256M) +
               MODULO(_stext.run,256M)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page (256M) |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|
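The virtual_base calculation above, written out as a sketch. It assumes a 32-bit address space, `ALIGN()` rounding up to a power-of-two boundary as the kernel macro does, and an example KERNELBASE of 0xc0000000. `virtual_base()` and `align_up()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* virtual_base = ALIGN(KERNELBASE, pin) + MODULO(_stext.run, pin),
 * where pin is the pinned TLB size (256M here). */
static uint32_t virtual_base(uint32_t kernelbase, uint32_t stext_run, uint32_t pin)
{
    return align_up(kernelbase, pin) + (stext_run % pin);
}
```

A kernel running at physical 0x12345000 with KERNELBASE 0xc0000000 gets virtual_base 0xc2345000, so the offset within the 256M page is preserved. The same function covers the 47x variant below with `pin` set to KERNEL_TLB_PIN_SIZE.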
Cc: Benjamin Herrenschmidt
Cc: Kumar Gala
Cc: Tony Breeds
Cc: Josh Boyer
Cc: linuxppc-dev
Signed-off-by: Josh Boyer -
We find the runtime address of _stext and relocate ourselves based
on the following calculation.

virtual_base = ALIGN(KERNELBASE,KERNEL_TLB_PIN_SIZE) +
               MODULO(_stext.run,KERNEL_TLB_PIN_SIZE)

relocate() is called with the Effective Virtual Base Address (as
shown below)

            | Phys. Addr| Virt. Addr |
Page        |------------------------|
Boundary    |           |            |
            |           |            |
            |           |            |
Kernel Load |___________|_ __ _ _ _ _|
Cc: Benjamin Herrenschmidt
Cc: Kumar Gala
Cc: linuxppc-dev
Signed-off-by: Josh Boyer -
The following patch implements the dynamic relocation processing for
the PPC32 kernel. relocate() accepts the target virtual address and relocates
the kernel image to that address.

Currently the following relocation types are handled:

R_PPC_RELATIVE
R_PPC_ADDR16_LO
R_PPC_ADDR16_HI
R_PPC_ADDR16_HA

The last 3 relocations in the above list depend on the value of the symbol
whose index is encoded in the relocation entry. Hence we need the symbol
table for processing such relocations.

Note: The GNU ld for ppc32 produces buggy relocations for relocation types
that depend on symbols. The value of the symbols with STB_LOCAL scope
should be assumed to be zero. - Alan Modra

Signed-off-by: Suzuki K. Poulose
Signed-off-by: Josh Poimboeuf
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Alan Modra
Cc: Kumar Gala
Cc: linuxppc-dev
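The three symbol-based relocations each write one 16-bit half of a computed value (symbol value plus addend) into an instruction. This sketch shows only the half-word helpers, following the PowerPC ELF ABI definitions of #lo, #hi and #ha. The function names are hypothetical, and walking the relocation table itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* R_PPC_ADDR16_LO: low 16 bits. */
static uint16_t reloc_lo(uint32_t v) { return (uint16_t)(v & 0xffff); }

/* R_PPC_ADDR16_HI: high 16 bits. */
static uint16_t reloc_hi(uint32_t v) { return (uint16_t)(v >> 16); }

/* R_PPC_ADDR16_HA: high 16 bits adjusted by one when bit 15 is set, so
 * that (ha << 16) plus the sign-extended low half gives back v
 * (the lis/addi idiom). */
static uint16_t reloc_ha(uint32_t v) { return (uint16_t)((v + 0x8000) >> 16); }
```

For v = 0x12348000, the low half 0x8000 is negative when sign-extended, which is why HA is 0x1235 while HI is 0x1234.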
Signed-off-by: Josh Boyer -
DYNAMIC_MEMSTART (old RELOCATABLE) was restricted only to PPC_47x variants
of 44x. This patch enables DYNAMIC_MEMSTART for 440x based chipsets.

Signed-off-by: Suzuki K. Poulose
Cc: Josh Boyer
Cc: Kumar Gala
Cc: Benjamin Herrenschmidt
Cc: linux ppc dev
Signed-off-by: Josh Boyer -
The current implementation of CONFIG_RELOCATABLE in BookE is based
on mapping the page aligned kernel load address to KERNELBASE. This
approach, however, is not enough for platforms where the TLB page size
is large (e.g., 256M on 44x). So we are renaming the RELOCATABLE used
currently in BookE to DYNAMIC_MEMSTART to reflect the actual method.

The CONFIG_RELOCATABLE for PPC32 (BookE) based on processing of the
dynamic relocations will be introduced later in the patch series.

This change would allow the use of the old method of RELOCATABLE for
platforms which can afford to enforce the page alignment (platforms with
smaller TLB size).

Changes since v3:
* Introduced a new config, NONSTATIC_KERNEL, to denote a kernel which is
  either a RELOCATABLE or DYNAMIC_MEMSTART (Suggested by: Josh Boyer)

Suggested-by: Scott Wood
Tested-by: Scott Wood
Signed-off-by: Suzuki K. Poulose
Cc: Scott Wood
Cc: Kumar Gala
Cc: Josh Boyer
Cc: Benjamin Herrenschmidt
Cc: linux ppc dev
Signed-off-by: Josh Boyer
19 Dec, 2011
8 commits
-
We have an array of 16 entries and a loop of 32 iterations... oops.
Signed-off-by: Benjamin Herrenschmidt
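The commit does not name the array or the loop, so this is only a generic sketch of the usual guard against this class of bug: derive the loop bound from the array itself, as the kernel's `ARRAY_SIZE()` does. `entries` and `fill_entries()` are hypothetical, and the actual fix may simply have corrected the constant.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static int entries[16];

/* The bound follows the array, so the two can't drift apart. */
static size_t fill_entries(int value)
{
    size_t i;
    for (i = 0; i < ARRAY_SIZE(entries); i++)
        entries[i] = value;
    return i;
}
```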
-
As kernels and initrds get bigger, boot-loaders and possibly
kexec-tools will need to place the initrd outside the RMO. When this
happens we end up with no lowmem and the boot doesn't get very far.

Only use initrd_end as the limit for alloc_bottom if it's inside the
RMO.

Signed-off-by: Paul Mackerras
Signed-off-by: Tony Breeds
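A hedged model of the rule "only use initrd_end as the limit for alloc_bottom if it's inside the RMO". The names `compute_alloc_bottom` and `rmo_top` are hypothetical, and treating `initrd_end == 0` as "no initrd" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Raise alloc_bottom past the initrd only when the initrd ends inside
 * the RMO; an initrd placed above the RMO must not eat all of lowmem. */
static uint64_t compute_alloc_bottom(uint64_t alloc_bottom, uint64_t initrd_end,
                                     uint64_t rmo_top)
{
    if (initrd_end != 0 && initrd_end <= rmo_top && initrd_end > alloc_bottom)
        alloc_bottom = initrd_end;
    return alloc_bottom;
}
```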
Signed-off-by: Benjamin Herrenschmidt -
We support 16TB of user address space and half a million contexts,
so update the comment to reflect this.

Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt -
Commit d57af9b (taskstats: use real microsecond granularity for CPU times)
renamed msecs_to_cputime to usecs_to_cputime, but failed to update all
numbers on the way. This causes nonsensical cpu idle/iowait values to be
displayed in /proc/stat (the only user of usecs_to_cputime so far).

This also renames __cputime_msec_factor to __cputime_usec_factor, adapting
its value and using it directly in cputime_to_usecs instead of doing two
multiplications.

Signed-off-by: Andreas Schwab
Acked-by: Anton Blanchard
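A sketch of a single-multiply conversion with a per-microsecond factor. It assumes a 0.64 fixed-point factor equal to 1e6 / tb_ticks_per_sec and a `mulhdu()` that returns the high 64 bits of a 64x64 product (named after the powerpc helper). It uses the GCC/Clang `unsigned __int128` extension. `usec_factor()` and `ticks_to_usecs()` are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* factor / 2^64 == 1e6 / tb_ticks_per_sec; fits in 64 bits only when the
 * timebase runs faster than 1 MHz. */
static uint64_t usec_factor(uint64_t tb_ticks_per_sec)
{
    assert(tb_ticks_per_sec > 1000000);
    return (uint64_t)(((unsigned __int128)1000000 << 64) / tb_ticks_per_sec);
}

/* High 64 bits of a 64x64-bit product. */
static uint64_t mulhdu(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* One multiplication instead of "to msecs, then times 1000". */
static uint64_t ticks_to_usecs(uint64_t ticks, uint64_t factor)
{
    return mulhdu(ticks, factor);
}
```

With an example 512 MHz timebase, the factor is exactly 2^55, so 512000000 ticks convert to 1000000 microseconds.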
Signed-off-by: Benjamin Herrenschmidt -
read_n_cells() cannot be marked as .devinit.text since it is referenced
from two functions that are not in that section: of_get_lmb_size() and
hot_add_drconf_scn_to_nid().

Signed-off-by: David Rientjes
Signed-off-by: Benjamin Herrenschmidt -
mark_reserved_regions_for_nid() is only called from do_init_bootmem(),
which is in .init.text, so it must be in the same section to avoid a
section mismatch warning.

Reported-by: Subrata Modak
Signed-off-by: David Rientjes
Signed-off-by: Benjamin Herrenschmidt -
PPC64 uses long long for u64 in the kernel, but powerpc's asm/types.h
prevents 64-bit userland from seeing this definition, instead defaulting
to u64 == long in userspace. Some user programs (e.g. kvmtool) may actually
want LL64, so this patch adds a check for __SANE_USERSPACE_TYPES__ so that,
if defined, int-ll64.h is included instead.

Signed-off-by: Matt Evans
Acked-by: Ingo Molnar
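A minimal user-space sketch of opting in. The define has to come before any header that pulls in the kernel's asm/types.h. With it set, `__u64` is `unsigned long long` on ppc64 as well as on other architectures, so `%llu` works without casts. `print_u64()` is a hypothetical helper.

```c
/* Must precede any header that includes <asm/types.h>. */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>
#include <assert.h>
#include <stdio.h>

/* "%llu" matches __u64 once it is unsigned long long (int-ll64.h). */
static int print_u64(char *buf, size_t len, __u64 v)
{
    return snprintf(buf, len, "%llu", v);
}
```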
Signed-off-by: Benjamin Herrenschmidt -
Implement a POWER7 optimised copy_to_user/copy_from_user using VMX.
For large aligned copies this new loop is over 10% faster, and for
large unaligned copies it is over 200% faster.

If we take a fault we fall back to the old version; this keeps
things relatively simple and easy to verify.

On POWER7 unaligned stores rarely slow down - they only flush when
a store crosses a 4KB page boundary. Furthermore this flush is
handled completely in hardware and should be 20-30 cycles.

Unaligned loads, on the other hand, flush much more often - whenever
crossing a 128 byte cache line, or a 32 byte sector if either sector
is an L1 miss.

Considering this information, we really want to get the loads aligned
and not worry about the alignment of the stores. Microbenchmarks
confirm that this approach is much faster than the current unaligned
copy loop that uses shifts and rotates to ensure both loads and
stores are aligned.

We also want to try to do the stores in cacheline aligned, cacheline
sized chunks. If the store queue is unable to merge an entire
cacheline of stores then the L2 cache will have to do a
read/modify/write. Even worse, we will serialise this with the stores
in the next iteration of the copy loop since both iterations hit
the same cacheline.

Based on this, the new loop does the following things:
1 - 127 bytes
Get the source 8 byte aligned and use 8 byte loads and stores. Pretty
boring and similar to how the current loop works.

128 - 4095 bytes
Get the source 8 byte aligned and use 8 byte loads and stores,
1 cacheline at a time. We aren't doing the stores in cacheline
aligned chunks so we will potentially serialise once per cacheline.
Even so it is much better than the loop we have today.

4096 bytes and above
If both source and destination have the same alignment, get them both
16 byte aligned, then get the destination cacheline aligned. Do
cacheline sized loads and stores using VMX.

If source and destination do not have the same alignment, we get the
destination cacheline aligned, and use permute to do aligned loads.

In both cases the VMX loop should be optimal - we always do aligned
loads and stores and are always doing stores in cacheline aligned,
cacheline sized chunks.

To be able to use VMX we must be careful about interrupts and
sleeping. We don't use the VMX loop when in an interrupt (which should
be rare anyway), and we wrap the VMX loop in pagefault_disable()/
pagefault_enable() and fall back to the existing copy_tofrom_user loop
if we do need to sleep.

The VMX breakpoint of 4096 bytes was chosen using this microbenchmark:
http://ozlabs.org/~anton/junkcode/copy_to_user.c

Since we are using VMX and there is a cost to saving and restoring
the user VMX state, there are two broad cases we need to benchmark:

- Best case - userspace never uses VMX
- Worst case - userspace always uses VMX

In reality a userspace process will sit somewhere between these two
extremes. Since we need to test both aligned and unaligned copies we
end up with 4 combinations. The point at which the VMX loop begins to
win is:

              0% VMX        100% VMX
aligned       2048 bytes    16384 bytes
unaligned     2048 bytes    8192 bytes

Considering this is a microbenchmark, the data is hot in cache and
the VMX loop has better store queue merging properties, we set the
breakpoint to 4096 bytes, a little below the unaligned breakpoints.

Some future optimisations we can look at:
- Looking at the perf data, a significant part of the cost when a
  task is always using VMX is the extra exception we take to restore
  the VMX state. As such we should do something similar to the x86
  optimisation that restores FPU state for heavy users, i.e.:

        /*
         * If the task has used fpu the last 5 timeslices, just do a full
         * restore of the math state immediately to avoid the trap; the
         * chances of needing FPU soon are obviously high now
         */
        preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

  and

        /*
         * fpu_counter contains the number of consecutive context switches
         * that the FPU is used. If this is over a threshold, the lazy fpu
         * saving becomes unlazy to save the trap. This is an unsigned char
         * so that after 256 times the counter wraps and the behavior turns
         * lazy again; this to deal with bursty apps that only use FPU for
         * a short time
         */

- We could create a paca bit to mirror the VMX enabled MSR bit and check
  that first, avoiding multiple calls to enable_kernel_altivec.
  That should help with iovec based system calls like readv.

- We could have two VMX breakpoints, one for when we know the user VMX
  state is loaded into the registers and one when it isn't. This could
  be a second bit in the paca so we can calculate the breakpoints quickly.

- One suggestion from Ben was to save and restore the VSX registers
  we use inline instead of using enable_kernel_altivec.

[BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC]
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt
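The real loop is PowerPC assembly. This C sketch only models which path a copy takes under the size classes, the 4096-byte breakpoint and the interrupt rule described above. "Same alignment" is taken to mean equal addresses modulo 16. The enum and function names are hypothetical, and the fall-back taken when the VMX loop faults and needs to sleep is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum copy_strategy {
    COPY_SMALL,       /* 1-127 bytes: 8-byte loads and stores */
    COPY_CACHELINE,   /* 128-4095 bytes: 8-byte ops, one cacheline per pass */
    COPY_VMX_ALIGNED, /* >= 4096, src and dst share alignment mod 16 */
    COPY_VMX_PERMUTE, /* >= 4096, differing alignment: aligned loads + permute */
    COPY_FALLBACK     /* in interrupt: existing copy_tofrom_user loop */
};

#define VMX_BREAKPOINT 4096

static enum copy_strategy pick_copy_strategy(const void *dst, const void *src,
                                             size_t n, bool in_interrupt)
{
    if (n < 128)
        return COPY_SMALL;
    if (n < VMX_BREAKPOINT)
        return COPY_CACHELINE;
    if (in_interrupt)
        return COPY_FALLBACK;
    /* Equal low 4 bits: both can be brought to 16-byte alignment together. */
    if ((((uintptr_t)dst ^ (uintptr_t)src) & 15) == 0)
        return COPY_VMX_ALIGNED;
    return COPY_VMX_PERMUTE;
}
```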
16 Dec, 2011
8 commits
-
As of commit dd472da38, rwsem.h was moved into asm-generic.
This patch removes the arch file and points the build at
its new location.

Signed-off-by: Richard Kuo
Signed-off-by: Benjamin Herrenschmidt -
Conflicts:
arch/powerpc/platforms/40x/ppc40x_simple.c -
The code for "powersurge" SMP would kick in and cause a crash
at boot due to the lack of a NULL test.

Signed-off-by: Benjamin Herrenschmidt
-
In the old days, we treated all interrupts from the legacy Apple home made
interrupt controllers as level, with a trick reading the "level" register
along with the "event" register to work around bugs where it would
occasionally fail to latch some events.

Doing so appeared to work fine for both level and edge interrupts.
Later on, we discovered in the Darwin source the magic masks that define which
interrupts are actually level and which are edge, and implemented a
different algorithm, more similar to what Apple does, that treats those
differently.

I recently discovered, however, that this caused problems (including loss
of interrupts) with an old Wallstreet PowerBook when trying to use the
internal modem (connected to a cascaded controller).

It looks like some interrupts are treated as edge while they are really
level, and I'm starting to seriously doubt the correctness of the Darwin
code (which has other obvious bugs when you read it, so ...)

This patch reverts to our original behaviour of treating everything as
a level interrupt. It appears to solve the problems with the modem on
the Wallstreet, and everything else seems to be working properly as well.

Signed-off-by: Benjamin Herrenschmidt
-
This patch reworks & simplifies pmac_zilog handling of suspend/resume,
essentially removing all the specific code in there and using the
generic uart helpers.

This required properly registering the tty as a child of the macio (or platform)
device, so I had to delay the registration a bit (we used to register the ports
very very early). We still register the kernel console early though.

I removed a couple of unused or useless flags as well, relying on the
core to not call us when asleep. I also removed the essentially useless
interrupt mutex, simplifying the locking a bit.

I removed some code for handling unexpected interrupts, which should never
be hit and could potentially be harmful (causing us to access a register
on a powered-off SCC). We always disable port interrupts on close, so there
should be no need to drain data on a closed port.

Signed-off-by: Benjamin Herrenschmidt
09 Dec, 2011
8 commits
-
Based on original work by David 'Shaggy' Kleikamp.
Signed-off-by: Tony Breeds
Signed-off-by: Josh Boyer -
Based on original work by David 'Shaggy' Kleikamp.
Signed-off-by: Tony Breeds
Signed-off-by: Josh Boyer -
Needed for currituck support.
Signed-off-by: Tony Breeds
Signed-off-by: Josh Boyer -
The upcoming currituck patches will need to do 64-bit shifts, which will
fail with an undefined symbol without this patch.

I looked at linking against libgcc but we can't guarantee that libgcc
was compiled with soft-float. Also, using ../lib/div64.S or
../kernel/misc_32.S would break the build, as the .o's need to be
built with different flags for the bootwrapper vs the kernel. So for
now the easiest option is to just copy code from
arch/powerpc/kernel/misc_32.S. I don't think this code changes too often ;P

Signed-off-by: Tony Breeds
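On 32-bit PowerPC a 64-bit value lives in a register pair, so a 64-bit shift becomes a call to a helper such as libgcc's `__ashldi3`. The copied code is assembly. This C sketch only illustrates what such a left-shift helper has to compute using 32-bit operations. `struct u64_pair` and `shl64()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value as two 32-bit halves, like a register pair. */
struct u64_pair { uint32_t hi, lo; };

/* Logical left shift by 0..63 using only 32-bit shifts. */
static struct u64_pair shl64(struct u64_pair v, unsigned n)
{
    struct u64_pair r;
    assert(n < 64);
    if (n == 0) {
        r = v;                     /* avoid lo >> 32, which is undefined */
    } else if (n < 32) {
        r.hi = (v.hi << n) | (v.lo >> (32 - n));
        r.lo = v.lo << n;
    } else {
        r.hi = v.lo << (n - 32);
        r.lo = 0;
    }
    return r;
}
```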
Signed-off-by: Josh Boyer -
CONFIG_PPC47x doesn't exist in Kconfig, and no 476 processor calls this
function ppc44x_pin_tlb(), as it has its own ppc47x_pin_tlb().

This code is probably an artifact of the original 476 code that
shouldn't have made it upstream.

Signed-off-by: Christoph Egger
Signed-off-by: Tony Breeds
Signed-off-by: Josh Boyer -
Needed if you want to use swiotlb, harmless otherwise.
Signed-off-by: Tony Breeds
Signed-off-by: Josh Boyer -
Currituck neither needs nor uses SDR, so aborting the PCI setup if there is
no sdr-base would be bad.

Add a flag to ppc4xx_pciex_hwops for the backends to state if they need
SDR, and then only complain and abort if they do and it's not found in
the device tree.

Signed-off-by: Tony Breeds
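A sketch of the check described above, with a hypothetical trimmed-down hwops structure that holds only the new flag. A NULL `sdr_base` models a missing sdr-base property. The field name `want_sdr` and the function `pciex_sdr_ok()` are illustrative assumptions, not necessarily the patch's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pciex_hwops {
    bool want_sdr; /* backend needs the SDR registers */
};

/* Abort (return false) only if SDR is required and sdr-base is missing. */
static bool pciex_sdr_ok(const struct pciex_hwops *ops, const unsigned *sdr_base)
{
    if (ops->want_sdr && sdr_base == NULL)
        return false;
    return true;
}
```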
Signed-off-by: Josh Boyer -
Signed-off-by: Tony Breeds
Signed-off-by: Josh Boyer
08 Dec, 2011
1 commit
-
Most distros use it, so we may as well enable it and get regular compile
testing.

Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt