23 Jun, 2006

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (139 commits)
    [POWERPC] re-enable OProfile for iSeries, using timer interrupt
    [POWERPC] support ibm,extended-*-frequency properties
    [POWERPC] Extra sanity check in EEH code
    [POWERPC] Don't look for class-code in pci children
    [POWERPC] Fix mdelay badness on shared processor partitions
    [POWERPC] disable floating point exceptions for init
    [POWERPC] Unify ppc syscall tables
    [POWERPC] mpic: add support for serial mode interrupts
    [POWERPC] pseries: Print PCI slot location code on failure
    [POWERPC] spufs: one more fix for 64k pages
    [POWERPC] spufs: fail spu_create with invalid flags
    [POWERPC] spufs: clear class2 interrupt status before wakeup
    [POWERPC] spufs: fix Makefile for "make clean"
    [POWERPC] spufs: remove stop_code from struct spu
    [POWERPC] spufs: fix spu irq affinity setting
    [POWERPC] spufs: further abstract priv1 register access
    [POWERPC] spufs: split the Cell BE support into generic and platform dependent parts
    [POWERPC] spufs: don't try to access SPE channel 1 count
    [POWERPC] spufs: use kzalloc in create_spu
    [POWERPC] spufs: fix initial state of wbox file
    ...

    Manually resolved conflicts in:
    drivers/net/phy/Makefile
    include/asm-powerpc/spu.h

    Linus Torvalds
     
  • VGA_MAP_MEM translates to ioremap() on some architectures. It makes sense
    to do this to vga_vram_base, because we're going to access memory between
    vga_vram_base and vga_vram_end.

    But it doesn't really make sense to map starting at vga_vram_end, because
    we aren't going to access memory starting there. On ia64, which always has
    to be different, ioremapping vga_vram_end gives you something completely
    incompatible with ioremapped vga_vram_base, so vga_vram_size ends up being
    nonsense.

    As a bonus, we often know the size up front, so we can use ioremap()
    correctly, rather than giving it a zero size.

    Signed-off-by: Bjorn Helgaas
    Cc: "Antonino A. Daplas"
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
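    A minimal sketch of the idea, not the actual vgacon.c diff: map the window
    once from vga_vram_base with the size that is known up front, and derive
    vga_vram_end instead of ioremap()ing it separately. The two-argument
    VGA_MAP_MEM() form and the 0xb8000/32 KiB values are illustrative assumptions.

        #include <video/vga.h>          /* assumed home of VGA_MAP_MEM() */

        static unsigned long vga_vram_base, vga_vram_end, vga_vram_size;

        static void example_map_vga_text_window(void)
        {
                vga_vram_size = 0x8000;                 /* size known up front */
                /* ioremap() the base once, with a real (non-zero) size... */
                vga_vram_base = VGA_MAP_MEM(0xb8000, vga_vram_size);
                /* ...and compute the end; never ioremap() vga_vram_end itself */
                vga_vram_end = vga_vram_base + vga_vram_size;
        }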
     

21 Jun, 2006

12 commits

  • On partitioned PPC64 systems where a partition is given 1/10 of a
    processor, we have seen mdelay() delaying for 10 times longer than it
    should. The reason is that the generic mdelay(n) does n delays of 1
    millisecond each. However, with 1/10 of a processor, we only get a
    one-millisecond timeslice every 10ms. Thus each 1 millisecond delay
    loop ends up taking 10ms elapsed time.

    The solution is just to use the PPC64 udelay function, which uses the
    timebase to ensure that the delay is based on elapsed time rather than
    how much processing time the partition has been given. (Yes, the
    generic mdelay uses the PPC64 udelay, but the problem is that the
    start time gets reset every millisecond, and each time it gets reset
    we lose another 9ms.)

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras
    Acked-by: Andrew Morton

    Anton Blanchard
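    A hedged sketch of why the timebase approach fixes this: the wait below
    measures elapsed timebase ticks, so time spent descheduled still counts
    toward the delay. get_tb() and tb_ticks_per_usec are the ppc64 kernel's
    names; the wrapper names are made up.

        #include <asm/time.h>           /* get_tb(), tb_ticks_per_usec */
        #include <asm/processor.h>      /* cpu_relax() */

        static void sketch_udelay(unsigned long usecs)
        {
                unsigned long start = get_tb();
                unsigned long ticks = usecs * tb_ticks_per_usec;

                /* spin on elapsed time, not on how much CPU we were given */
                while (get_tb() - start < ticks)
                        cpu_relax();
        }

        static void sketch_mdelay(unsigned long msecs)
        {
                /* one measured wait, instead of 1000 loops whose start
                   time is reset (and whose overshoot is lost) each pass */
                sketch_udelay(msecs * 1000);
        }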
     
  • Floating point exceptions should not be enabled by default,
    as this setting impacts the performance on some CPUs, in
    particular the Cell BE. Since the bits are inherited from
    parent processes, the place to change the default is the
    thread struct used for init.

    glibc sets this up correctly per thread in its fesetenv
    function, so user space should not be impacted by this
    setting. None of the other common libc implementations
    (uClibc, dietlibc, newlib, klibc) has support for fp
    exceptions, so they are unlikely to be hit by this either.

    There is a small risk that somebody wrote their own
    application that manually sets the fpscr bits instead
    of calling fesetenv, without changing the MSR bits as well.
    Those programs will break with this change.

    It probably makes sense to change glibc in the future
    to be more clever about FE bits, so that when running
    on a CPU where this is expensive, it disables exceptions
    ASAP, while it keeps them enabled on CPUs where running
    with exceptions on is cheaper than changing the state
    often.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Arnd Bergmann
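    A userspace sketch of why the kernel default barely matters: a program that
    wants FP traps asks glibc for them explicitly, which sets up the per-thread
    FP environment regardless of what init inherited. feenableexcept() is a
    glibc extension; error handling here is minimal.

        #define _GNU_SOURCE
        #include <fenv.h>
        #include <stdio.h>

        int main(void)
        {
                /* enable the traps this program cares about, per thread */
                if (feenableexcept(FE_DIVBYZERO | FE_OVERFLOW) == -1) {
                        fprintf(stderr, "FP exception traps unavailable\n");
                        return 1;
                }

                volatile double x = 1.0, y = 0.0;
                printf("%f\n", x / y);  /* now delivers SIGFPE instead of inf */
                return 0;
        }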
     
  • Avoid duplication of the syscall table for the cell platform. Based on an
    idea from David Woodhouse.

    Signed-off-by: Andreas Schwab
    Acked-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Andreas Schwab
     
  • On Tue, Jun 20, 2006 at 02:01:26PM +1000, Benjamin Herrenschmidt wrote:
    > On Mon, 2006-06-19 at 13:08 -0700, Mark A. Greer wrote:
    > > MPC10x-style interrupt controllers have a serial mode that allows
    > > several interrupts to be clocked in through one INT signal.
    > >
    > > This patch adds the software support for that mode.
    >
    > You hard code the clock ratio... why not add a separate call to be
    > called after mpic_init,
    > something like mpic_set_serial_int(int mpic, int enable, int
    > clock_ratio) ?

    How's this?
    --

    MPC10x-style interrupt controllers have a serial mode that allows
    several interrupts to be clocked in through one INT signal.

    This patch adds the software support for that mode.

    Signed-off-by: Mark A. Greer
    --

    arch/powerpc/sysdev/mpic.c | 20 ++++++++++++++++++++
    include/asm-powerpc/mpic.h | 10 ++++++++++
    2 files changed, 30 insertions(+)
    --
    Signed-off-by: Paul Mackerras

    Mark A. Greer
     
  • The SPU context save/restore code is currently built
    for a 4k page size and we provide a _shipped version
    of it since most people don't have the spu toolchain
    that is needed to rebuild that code.

    This patch hardcodes the data structures to a 64k
    page alignment, which also guarantees 4k alignment
    but unfortunately wastes 60k of memory per SPU
    context that is created in the running system.

    We will follow up on this with another patch to
    reduce that overhead or maybe redo the context
    save/restore logic to do this part entirely differently,
    but for now it should make experimental systems
    work with either page size.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    arnd@arndb.de
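    A rough illustration of the alignment change (the struct name and layout are
    placeholders, not the real spufs definitions): aligning the save area to 64k
    automatically satisfies 4k alignment, at the cost of up to 60k of padding
    per context.

        #define SPU_LS_SIZE 0x40000     /* 256 KiB SPE local store */

        struct example_spu_save_area {
                unsigned char ls[SPU_LS_SIZE];          /* local store image */
                /* ... register and channel save areas ... */
        } __attribute__((aligned(0x10000)));            /* 64 KiB alignment */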
     
    This patch removes 'stop_code', a discarded member of struct spu.
    It is written at initialization and on interrupt, but never read
    in the current implementation.

    Signed-off-by: Masato Noguchi
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Masato Noguchi
     
  • This changes the hypervisor abstraction of setting cpu affinity to a
    higher level to avoid platform dependent interrupt controller
    routines. I replaced spu_priv1_ops:spu_int_route_set() with a
    new routine spu_priv1_ops:spu_cpu_affinity_set().

    As a by-product, this change eliminated what looked like an
    existing bug in the set affinity code where spu_int_route_set()
    mistakenly called int_stat_get().

    Signed-off-by: Geoff Levand
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Geoff Levand
     
    To support multi-platform binaries, the spu hypervisor accessor
    routines must have runtime binding.

    I removed the existing statically linked routines in spu.h
    and spu_priv1_mmio.c and created new accessor routines in spu_priv1.h
    that operate indirectly through an ops struct spu_priv1_ops.
    spu_priv1_mmio.c contains the instance of the accessor routines
    for running on raw hardware.

    Signed-off-by: Geoff Levand
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Geoff Levand
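    A sketch of the indirection being described (field and helper names are
    illustrative, not copied from spu_priv1.h): callers go through small inline
    wrappers, and the ops pointer is set once at boot to the MMIO instance on
    raw hardware or to a hypervisor-backed instance elsewhere.

        #include <linux/types.h>

        struct spu;

        struct example_spu_priv1_ops {
                void (*int_mask_set)(struct spu *spu, int class, u64 mask);
                u64  (*int_stat_get)(struct spu *spu, int class);
                void (*cpu_affinity_set)(struct spu *spu, int cpu);
                /* ... one hook per privileged-register accessor ... */
        };

        /* chosen at boot: MMIO implementation on raw hardware,
           hypervisor-call implementation when running virtualized */
        extern const struct example_spu_priv1_ops *example_priv1_ops;

        static inline void example_int_mask_set(struct spu *spu, int class, u64 mask)
        {
                example_priv1_ops->int_mask_set(spu, class, mask);
        }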
     
  • SPUs are registered as system devices, exposing attributes through
    sysfs. Since the sysdev includes a kref, we can remove the one in
    struct spu (it isn't used at the moment anyway).

    Currently only the interrupt source and numa node attributes are added.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Jeremy Kerr
     
  • This is a first version of support for the Cell BE "Reliability,
    Availability and Serviceability" features.

    It doesn't yet handle some of the RAS interrupts (the ones described in
    iic_is/iic_irr), I'm still working on a proper way to expose these. They
    are essentially a cascaded controller by themselves (sic !) though I may
    just handle them locally to the iic driver. I need also to sync with
    David Erb on the way he hooked in the performance monitor interrupt.

    So that's all for 2.6.17 and I'll do more work on that with my rework of
    the powerpc interrupt layer that I'm hacking on at the moment.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Paul Mackerras

    Benjamin Herrenschmidt
     
  • Signed-off-by: Jeff Brown
    Signed-off-by: Xianghua Xiao
    Signed-off-by: Jon Loeliger
    Signed-off-by: Paul Mackerras

    Jon Loeliger
     
  • * git://git.infradead.org/hdrcleanup-2.6: (63 commits)
    [S390] __FD_foo definitions.
    Switch to __s32 types in joystick.h instead of C99 types for consistency.
    Add to headers included for userspace in
    Move inclusion of out of user scope in asm-x86_64/mtrr.h
    Remove struct fddi_statistics from user view in
    Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
    Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
    Include and use __uXX types in
    Use __uXX types in , include too
    Remove private struct dx_hash_info from public view in
    Include and use __uXX types in
    Use __uXX types in for struct divert_blk et al.
    Use __u32 for elf_addr_t in , not u32. It's user-visible.
    Remove PPP_FCS from user view in , remove __P mess entirely
    Use __uXX types in user-visible structures in
    Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
    Use __uXX types for S390 DASD volume label definitions which are user-visible
    S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
    Remove unneeded inclusion of from
    Fix private integer types used in V4L2 ioctls.
    ...

    Manually resolve conflict in include/linux/mtd/physmap.h

    Linus Torvalds
     

18 Jun, 2006

1 commit


15 Jun, 2006

8 commits

    Currently the kernel blindly halts all the processors and calls the
    ibm,suspend-me RTAS call. If the firmware is not in the correct
    state, we then re-start all the processors and return. It is much
    smarter to check the firmware state first, and only call
    ibm,suspend-me if it is waiting.

    Signed-off-by: Paul Mackerras

    Dave C Boutcher
     
  • On non partitioned machines we currently set the HV bit in kernel space
    only. It turns out we are supposed to maintain the HV bit in both user
    and kernel space.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
  • Allocate IOMMU tables local to the relevant node.

    Signed-off-by: Anton Blanchard
    Acked-by: Olof Johansson
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
  • of_node_to_nid returns -1 if the associativity cannot be found. This
    means pcibus_to_cpumask has to be careful not to pass a negative index into
    node_to_cpumask.

    Since pcibus_to_node could be used a lot, and of_node_to_nid is slow (it
    walks a list doing strcmps), let's also cache the node in the
    pci_controller struct.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
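    A sketch of both points (field and function names are assumptions): the
    node is looked up once and cached in the pci_controller, and the cpumask
    helper guards against the -1 "no associativity" case before indexing
    node_to_cpumask().

        #include <linux/pci.h>
        #include <asm/pci-bridge.h>     /* pci_bus_to_host() on powerpc */
        #include <asm/topology.h>

        static int example_pcibus_to_node(struct pci_bus *bus)
        {
                struct pci_controller *hose = pci_bus_to_host(bus);

                return hose->node;      /* cached at probe time; may be -1 */
        }

        static cpumask_t example_pcibus_to_cpumask(struct pci_bus *bus)
        {
                int node = example_pcibus_to_node(bus);

                if (node == -1)         /* unknown node: don't index the array */
                        return CPU_MASK_ALL;
                return node_to_cpumask(node);
        }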
     
  • Remove some stale POWER3/POWER4/970 on 32bit kernel support.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
  • Forthcoming machines will extend the FPSCR to 64 bits. We already
    had a 64-bit save area for the FPSCR, but we need to use a new form
    of the mtfsf instruction. Fortunately this new form is decoded as
    an ordinary mtfsf by existing 64-bit processors.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Anton Blanchard
     
    Instead of trying to make PPC64 MSI fit into an Intel-centric MSI
    layer, a simple short-term solution is to hook the
    pci_{en/dis}able_msi() calls and make a machdep call.

    The rest of the MSI functions are superfluous for what is needed at
    this time; machdep calls can be added for them as needed.

    Ben and Michael Ellerman are looking into rewriting the MSI layer to
    be more generic. However, in the meantime this works as an interim
    solution.

    Signed-off-by: Jake Moilanen
    Signed-off-by: Paul Mackerras

    Jake Moilanen
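    A sketch of the hook (the ppc_md field names are assumed rather than quoted
    from the patch): the generic entry points just defer to a per-platform
    machdep call when one is registered.

        #include <linux/pci.h>
        #include <linux/errno.h>
        #include <asm/machdep.h>        /* ppc_md */

        int pci_enable_msi(struct pci_dev *pdev)
        {
                if (ppc_md.enable_msi)
                        return ppc_md.enable_msi(pdev);
                return -ENOSYS;         /* platform has no MSI support */
        }

        void pci_disable_msi(struct pci_dev *pdev)
        {
                if (ppc_md.disable_msi)
                        ppc_md.disable_msi(pdev);
        }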
     
  • Some POWER5+ machines can do 64k hardware pages for normal memory but
    not for cache-inhibited pages. This patch lets us use 64k hardware
    pages for most user processes on such machines (assuming the kernel
    has been configured with CONFIG_PPC_64K_PAGES=y). User processes
    start out using 64k pages and get switched to 4k pages if they use any
    non-cacheable mappings.

    With this, we use 64k pages for the vmalloc region and 4k pages for
    the imalloc region. If anything creates a non-cacheable mapping in
    the vmalloc region, the vmalloc region will get switched to 4k pages.
    I don't know of any driver other than the DRM that would do this,
    though, and these machines don't have AGP.

    When a region gets switched from 64k pages to 4k pages, we do not have
    to clear out all the 64k HPTEs from the hash table immediately. We
    use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
    was hashed in as a 64k page or a set of 4k pages. If hash_page is
    trying to insert a 4k page for a Linux PTE and it sees that it has
    already been inserted as a 64k page, it first invalidates the 64k HPTE
    before inserting the 4k HPTE. The hash invalidation routines also use
    the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
    set of 4k HPTEs to remove. With those two changes, we can tolerate a
    mix of 4k and 64k HPTEs in the hash table, and they will all get
    removed when the address space is torn down.

    Signed-off-by: Paul Mackerras

    Paul Mackerras
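    A pseudocode-level sketch of the hash_page() ordering described above; the
    invalidate/insert helpers are invented names, while _PAGE_HASHPTE and
    _PAGE_COMBO are the real Linux PTE bits.

        #include <asm/pgtable.h>

        void example_invalidate_64k_hpte(pte_t *ptep, unsigned long ea);
        void example_insert_4k_hpte(pte_t *ptep, unsigned long ea);

        static int example_hash_in_4k_page(pte_t *ptep, unsigned long ea)
        {
                unsigned long pte = pte_val(*ptep);

                if ((pte & _PAGE_HASHPTE) && !(pte & _PAGE_COMBO)) {
                        /* previously hashed as one 64k HPTE: evict that
                           first so the hash never holds both views */
                        example_invalidate_64k_hpte(ptep, ea);
                }

                example_insert_4k_hpte(ptep, ea);
                /* remember the choice so teardown removes 4k HPTEs */
                *ptep = __pte(pte | _PAGE_COMBO | _PAGE_HASHPTE);
                return 0;
        }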
     

12 Jun, 2006

1 commit


09 Jun, 2006

9 commits

  • This gives the ability to control whether alignment exceptions get
    fixed up or reported to the process as a SIGBUS, using the existing
    PR_SET_UNALIGN and PR_GET_UNALIGN prctls. We do not implement the
    option of logging a message on alignment exceptions.

    Signed-off-by: Paul Mackerras

    Paul Mackerras
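    A userspace example of the control this exposes (SIGBUS versus silent
    fixup), using the existing prctl constants pulled in via <sys/prctl.h>:

        #include <stdio.h>
        #include <sys/prctl.h>          /* PR_SET_UNALIGN, PR_UNALIGN_SIGBUS */

        int main(void)
        {
                int mode;

                /* ask for SIGBUS instead of kernel fixups on alignment faults */
                if (prctl(PR_SET_UNALIGN, PR_UNALIGN_SIGBUS, 0, 0, 0))
                        perror("PR_SET_UNALIGN");

                if (prctl(PR_GET_UNALIGN, &mode, 0, 0, 0) == 0)
                        printf("unaligned access: %s\n",
                               (mode & PR_UNALIGN_SIGBUS) ? "SIGBUS" : "fix up");
                return 0;
        }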
     
  • This adds the PowerPC part of the code to allow processes to change
    their endian mode via prctl.

    This also extends the alignment exception handler to be able to fix up
    alignment exceptions that occur in little-endian mode, both for
    "PowerPC" little-endian and true little-endian.

    We always enter signal handlers in big-endian mode -- the support for
    little-endian mode does not amount to the creation of a little-endian
    user/kernel ABI. If the signal handler returns, the endian mode is
    restored to what it was when the signal was delivered.

    We have two new kernel CPU feature bits, one for PPC little-endian and
    one for true little-endian. Most of the classic 32-bit processors
    support PPC little-endian, and this is reflected in the CPU feature
    table. There are two corresponding feature bits reported to userland
    in the AT_HWCAP aux vector entry.

    This is based on an earlier patch by Anton Blanchard.

    Signed-off-by: Paul Mackerras

    Paul Mackerras
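    A userspace sketch of the prctl interface this adds; the PR_*ENDIAN* names
    come from <linux/prctl.h> on kernels with this support, and the call fails
    with EINVAL where the CPU feature bit is absent.

        #include <stdio.h>
        #include <sys/prctl.h>          /* PR_SET_ENDIAN, PR_ENDIAN_* */

        int main(void)
        {
                /* switch this process to "PowerPC" little-endian mode */
                if (prctl(PR_SET_ENDIAN, PR_ENDIAN_PPC_LITTLE, 0, 0, 0)) {
                        perror("PR_SET_ENDIAN");
                        return 1;
                }

                /* loads/stores are now byte-reversed; signal handlers
                   still run big-endian, as the commit message notes */
                return 0;
        }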
     
  • POWER6 moves some of the MMCRA bits and also requires some bits to be
    cleared each PMU interrupt.

    Signed-off-by: Michael Neuling
    Acked-by: Anton Blanchard
    Signed-off-by: Paul Mackerras

    Michael Neuling
     
  • Make sure dma_alloc_coherent allocates memory from the local node. This
    is important on Cell where we avoid going through the slow cpu
    interconnect.

    Note: I could only test this patch on Cell; it should be verified on
    a pseries machine by those who have the hardware.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Paul Mackerras

    Christoph Hellwig
     
  • On 64bit powerpc we can find out what node a pci bus hangs off, so
    implement the topology.h macros that export this information.

    For 32bit this seems a little more difficult, but I don't know of any
    32bit powerpc NUMA machines either, so let's leave it out for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Paul Mackerras

    Christoph Hellwig
     
  • This patch attempts to handle RTAS "busy" return codes in a more simple
    and consistent manner. Typical callers of RTAS shouldn't have to
    manage wait times and delay calls.

    This patch also changes the kernel to use msleep() rather than udelay()
    when a runtime delay is necessary. This will avoid CPU soft lockups
    for extended delay conditions.

    Signed-off-by: John Rose
    Signed-off-by: Paul Mackerras

    John Rose
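    A sketch of the pattern (the helper name is invented; the -2 "busy" code
    and the 9900-9905 extended-delay codes are the standard RTAS values):
    callers loop on the RTAS call and let one helper translate busy statuses
    into msleep() instead of open-coded udelay()s.

        #include <linux/delay.h>

        #define EX_RTAS_BUSY                    -2
        #define EX_RTAS_EXTENDED_DELAY_MIN      9900
        #define EX_RTAS_EXTENDED_DELAY_MAX      9905

        /* returns 1 if the status meant "busy, try again", 0 otherwise */
        static int example_rtas_busy_delay(int status)
        {
                unsigned int ms = 1;
                int i;

                if (status == EX_RTAS_BUSY) {
                        msleep(1);
                        return 1;
                }
                if (status >= EX_RTAS_EXTENDED_DELAY_MIN &&
                    status <= EX_RTAS_EXTENDED_DELAY_MAX) {
                        for (i = EX_RTAS_EXTENDED_DELAY_MIN; i < status; i++)
                                ms *= 10;       /* 1ms, 10ms, ... up to 100s */
                        msleep(ms);
                        return 1;
                }
                return 0;       /* not a busy code: caller handles the result */
        }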
     
    Our MMU hash management code would not set the "C" bit (changed bit)
    in the hardware PTE when updating a RO PTE into a RW PTE. That would
    cause the hardware to possibly do a write back to the hash table to
    set it on the first store access, which in addition to being a
    performance issue, might also hit a bug when running with native hash
    management (non-HV) as our code is specifically optimized for the
    case where no write back happens.

    Thus there is a very small theoretical window where a hash PTE can
    become corrupted if that HPTE has just been upgraded to read-write, a
    store access happens on it, and that races with another processor
    evicting that same slot. Since eviction (caused by an almost full
    hash) is extremely rare, the bug is fortunately very unlikely to
    happen.

    This is fixed by allowing the updating of the protection bits in the
    native hash handling to also set (but not clear) the "C" bit, and, in
    order to also improve performance in the general case, by always
    setting that bit on newly inserted hash PTEs so that write back
    really never happens.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Paul Mackerras

    Benjamin Herrenschmidt
     
    This patch cleans up some locking & error handling in the ppc vdso and
    moves the vdso base pointer from the thread struct to the mm context,
    where it more logically belongs. It brings the powerpc implementation
    closer to Ingo's new x86 one and also adds an arch_vma_name() function,
    allowing the kernel to print [vdso] in /proc/<pid>/maps if Ingo's x86
    vdso patch is also applied.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Paul Mackerras

    Benjamin Herrenschmidt
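    A sketch of the arch_vma_name() part (assuming the base is stored as
    mm->context.vdso_base, per the description above): naming the mapping
    becomes a single comparison against the per-mm value.

        #include <linux/mm.h>

        const char *arch_vma_name(struct vm_area_struct *vma)
        {
                if (vma->vm_mm &&
                    vma->vm_start == vma->vm_mm->context.vdso_base)
                        return "[vdso]";
                return NULL;
        }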
     
    I have tested PPC_PTRACE_GETREGS and PPC_PTRACE_SETREGS on umview.

    I do not understand why, historically, these requests have been defined
    as PPC_PTRACE_GETREGS and PPC_PTRACE_SETREGS instead of simply
    PTRACE_[GS]ETREGS. The other oddity is that the address must be
    put into the "addr" field instead of the "data" field as stated in the
    manual.

    Signed-off-by: renzo davoli
    Signed-off-by: Paul Mackerras

    Renzo Davoli
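    A userspace sketch of the quirk being pointed out: with PPC_PTRACE_GETREGS
    the destination buffer is passed in ptrace()'s "addr" argument, not "data".
    The request constant comes from the powerpc <asm/ptrace.h>; attaching to
    and stopping the tracee is omitted.

        #include <stdio.h>
        #include <sys/ptrace.h>
        #include <sys/types.h>
        #include <asm/ptrace.h>         /* PPC_PTRACE_GETREGS on powerpc */

        static void example_dump_gprs(pid_t child)
        {
                unsigned long gprs[32];         /* GPR0..GPR31 */

                /* buffer goes in the addr slot; data is unused */
                if (ptrace(PPC_PTRACE_GETREGS, child, gprs, 0) == 0)
                        printf("r1 (stack pointer) = %#lx\n", gprs[1]);
                else
                        perror("PPC_PTRACE_GETREGS");
        }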
     

01 Jun, 2006

1 commit


27 May, 2006

1 commit

  • Some driver wants to use CMSPAR, but it was missing on alpha and powerpc.
    This adds it, with the same value as every other architecture uses.

    (akpm: fixes the build of an upcoming gregkh USB patch)

    Signed-off-by: Paul Mackerras
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mackerras
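    For context, a userspace sketch of what CMSPAR is used for: "stick"
    (mark/space) parity on a serial port. CMSPAR is a Linux extension, so it
    needs _GNU_SOURCE (or _DEFAULT_SOURCE); the device path is just an example.

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <termios.h>
        #include <unistd.h>

        int main(void)
        {
                struct termios tio;
                int fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY);

                if (fd < 0 || tcgetattr(fd, &tio) < 0) {
                        perror("serial port");
                        return 1;
                }

                tio.c_cflag |= PARENB | CMSPAR;   /* sticky parity... */
                tio.c_cflag &= ~PARODD;           /* ...stuck at "space" */
                tcsetattr(fd, TCSANOW, &tio);

                close(fd);
                return 0;
        }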
     

24 May, 2006

5 commits