01 Jul, 2006

1 commit


24 Jun, 2006

1 commit


20 Jun, 2006

1 commit

  • This is the first in a series of cleanups that will hopefully
    allow a seamless attempt at using the generic IRQ handling
    infrastructure in the Linux kernel.

    Define PIL_DEVICE_IRQ and vector all device interrupts through
    there.

    Get rid of the ugly pil0_dummy_{bucket,desc}, instead vector
    the timer interrupt directly to a specific handler since the
    timer interrupt is the only event that will be signaled on
    PIL 14.

    The irq_worklist is now in the per-cpu trap_block[].

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Jun, 2006

1 commit


10 Apr, 2006

1 commit


28 Mar, 2006

1 commit

  • The kernel's implementation of notifier chains is unsafe. There is no
    protection against entries being added to or removed from a chain while the
    chain is in use. The issues were discussed in this thread:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

    We noticed that notifier chains in the kernel fall into two basic usage
    classes:

    "Blocking" chains are always called from a process context
    and the callout routines are allowed to sleep;

    "Atomic" chains can be called from an atomic context and
    the callout routines are not allowed to sleep.

    We decided to codify this distinction and make it part of the API. Therefore
    this set of patches introduces three new, parallel APIs: one for blocking
    notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
    really just the old API under a new name). New kinds of data structures are
    used for the heads of the chains, and new routines are defined for
    registration, unregistration, and calling a chain. The three APIs are
    explained in include/linux/notifier.h and their implementation is in
    kernel/sys.c.

    With atomic and blocking chains, the implementation guarantees that the chain
    links will not be corrupted and that chain callers will not get messed up by
    entries being added or removed. For raw chains the implementation provides no
    guarantees at all; users of this API must provide their own protections. (The
    idea was that situations may come up where the assumptions of the atomic and
    blocking APIs are not appropriate, so it should be possible for users to
    handle these things in their own way.)
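
    The register/unregister/call shape described above can be modelled in
    user space. This is a minimal sketch of a "raw"-style chain with no
    locking at all, not the kernel implementation: every sk_* name, the
    return codes, and the two sample callouts are hypothetical and exist
    only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch only. */
#define SK_NOTIFY_DONE 0
#define SK_NOTIFY_STOP 1

struct sk_notifier {
    int (*call)(struct sk_notifier *self, unsigned long event, void *data);
    struct sk_notifier *next;
    int priority;                 /* higher priority is called first */
};

struct sk_chain_head {
    struct sk_notifier *head;
};

/* Insert an entry, keeping the list sorted by descending priority. */
void sk_chain_register(struct sk_chain_head *h, struct sk_notifier *n)
{
    struct sk_notifier **p = &h->head;

    assert(n != NULL && n->call != NULL);
    while (*p != NULL && (*p)->priority >= n->priority)
        p = &(*p)->next;
    n->next = *p;
    *p = n;
}

/* Remove an entry; returns 0 on success, -1 if it was not on the chain. */
int sk_chain_unregister(struct sk_chain_head *h, struct sk_notifier *n)
{
    struct sk_notifier **p = &h->head;

    while (*p != NULL) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return 0;
        }
        p = &(*p)->next;
    }
    return -1;
}

/* Call every entry in order until one asks to stop. */
int sk_chain_call(struct sk_chain_head *h, unsigned long event, void *data)
{
    int ret = SK_NOTIFY_DONE;
    struct sk_notifier *n = h->head;

    while (n != NULL) {
        struct sk_notifier *next = n->next;   /* read before the callout runs */

        ret = n->call(n, event, data);
        if (ret == SK_NOTIFY_STOP)
            break;
        n = next;
    }
    return ret;
}

/* Sample callout: counts events through the int that data points to. */
int sk_count_event(struct sk_notifier *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    (*(int *)data)++;
    return SK_NOTIFY_DONE;
}

/* Sample callout: stops the chain walk. */
int sk_stop_event(struct sk_notifier *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    (void)data;
    return SK_NOTIFY_STOP;
}
```

    Because nothing here is synchronized, a caller of this sketch would
    have to supply its own protection, which is the trade-off the raw API
    makes.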

    There are some limitations, which should not be too hard to live with. For
    atomic/blocking chains, registration and unregistration must always be done in
    a process context since the chain is protected by a mutex/rwsem. Also, a
    callout routine for a non-raw chain must not try to register or unregister
    entries on its own chain. (This did happen in a couple of places and the code
    had to be changed to avoid it.)

    Since atomic chains may be called from within an NMI handler, they cannot use
    spinlocks for synchronization. Instead we use RCU. The overhead falls almost
    entirely in the unregister routine, which is okay since unregistration is much
    less frequent than calling a chain.

    Here is the list of chains that we adjusted and their classifications. None
    of them use the raw API, so for the moment it is only a placeholder.

    ATOMIC CHAINS
    -------------
    arch/i386/kernel/traps.c: i386die_chain
    arch/ia64/kernel/traps.c: ia64die_chain
    arch/powerpc/kernel/traps.c: powerpc_die_chain
    arch/sparc64/kernel/traps.c: sparc64die_chain
    arch/x86_64/kernel/traps.c: die_chain
    drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
    kernel/panic.c: panic_notifier_list
    kernel/profile.c: task_free_notifier
    net/bluetooth/hci_core.c: hci_notifier
    net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
    net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
    net/ipv6/addrconf.c: inet6addr_chain
    net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
    net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
    net/netlink/af_netlink.c: netlink_chain

    BLOCKING CHAINS
    ---------------
    arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
    arch/s390/kernel/process.c: idle_chain
    arch/x86_64/kernel/process.c: idle_notifier
    drivers/base/memory.c: memory_chain
    drivers/cpufreq/cpufreq.c: cpufreq_policy_notifier_list
    drivers/cpufreq/cpufreq.c: cpufreq_transition_notifier_list
    drivers/macintosh/adb.c: adb_client_list
    drivers/macintosh/via-pmu.c: sleep_notifier_list
    drivers/macintosh/via-pmu68k.c: sleep_notifier_list
    drivers/macintosh/windfarm_core.c: wf_client_list
    drivers/usb/core/notify.c: usb_notifier_list
    drivers/video/fbmem.c: fb_notifier_list
    kernel/cpu.c: cpu_chain
    kernel/module.c: module_notify_list
    kernel/profile.c: munmap_notifier
    kernel/profile.c: task_exit_notifier
    kernel/sys.c: reboot_notifier_list
    net/core/dev.c: netdev_chain
    net/decnet/dn_dev.c: dnaddr_chain
    net/ipv4/devinet.c: inetaddr_chain

    It's possible that some of these classifications are wrong. If they are,
    please let us know or submit a patch to fix them. Note that any chain that
    gets called very frequently should be atomic, because the rwsem read-locking
    used for blocking chains is very likely to incur cache misses on SMP systems.
    (However, if the chain's callout routines may sleep then the chain cannot be
    atomic.)

    The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
    material written by Keith Owens and suggestions from Paul McKenney and Andrew
    Morton.

    [jes@sgi.com: restructure the notifier chain initialization macros]
    Signed-off-by: Alan Stern
    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Jes Sorensen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Stern
     

22 Mar, 2006

1 commit


20 Mar, 2006

17 commits

  • Niagara does not implement some of the VIS instructions in
    hardware, so we have to emulate them.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Check TLB flush hypervisor calls for errors and report them.

    Pass HV_MMU_ALL always for now, we can add back the optimization
    to avoid the I-TLB flush later.

    Always explicitly page align the virtual address arguments.
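
    A minimal sketch of the two points above (page-align the address,
    then check the call's status), assuming 8K base pages. The sk_* names,
    the flag value, and the function-pointer stand-in for the hypervisor
    call are all hypothetical; the two fake calls exist only so the sketch
    can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SK_PAGE_SHIFT 13                      /* assumption: 8K pages */
#define SK_PAGE_SIZE  ((uint64_t)1 << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))
#define SK_MMU_ALL    0x3UL                   /* hypothetical: D-TLB | I-TLB */

/* Stand-in for a hypervisor unmap call; returns 0 on success. */
typedef unsigned long (*sk_unmap_fn)(uint64_t vaddr, unsigned long ctx,
                                     unsigned long flags);

uint64_t sk_last_vaddr;   /* records what the fake call was given */

unsigned long sk_fake_unmap_ok(uint64_t vaddr, unsigned long ctx,
                               unsigned long flags)
{
    (void)ctx;
    (void)flags;
    sk_last_vaddr = vaddr;
    return 0;
}

unsigned long sk_fake_unmap_fail(uint64_t vaddr, unsigned long ctx,
                                 unsigned long flags)
{
    (void)vaddr;
    (void)ctx;
    (void)flags;
    return 1;
}

/* Flush one page: align the address, always flush both TLBs, report errors. */
int sk_flush_tlb_page(sk_unmap_fn unmap, uint64_t vaddr, unsigned long ctx)
{
    unsigned long err;

    assert(unmap != NULL);
    vaddr &= SK_PAGE_MASK;
    err = unmap(vaddr, ctx, SK_MMU_ALL);
    if (err != 0) {
        fprintf(stderr, "TLB flush failed: err=%lu vaddr=0x%llx ctx=%lu\n",
                err, (unsigned long long)vaddr, ctx);
        return -1;
    }
    return 0;
}
```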

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Should be "Dax" not "Iax".

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Actually make use of the 'limit' we compute.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • It's extremely noisy and causes much grief on slow
    consoles with large numbers of cpus.

    We'll have to provide this some saner way in order
    to re-enable this.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We're about to seriously die in these cases so it is important
    that the messages make it to the console.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • 1) Add error return checking for TLB load hypervisor
    calls.

    2) Don't fallthru to dtlb tsb miss handler from itlb tsb
    miss handler, oops.

    3) On window fixups, propagate fault information to fixup
    handler correctly.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The sibling cpu bringup is extremely fragile. We can only
    perform the most basic calls until we take over the trap
    table from the firmware/hypervisor on the new cpu.

    This means no accesses to %g4, %g5, %g6 since those can't be
    TLB translated without our trap handlers.

    In order to achieve this:

    1) Change sun4v_init_mondo_queues() so that it can operate in
    several modes.

    It can allocate the queues, or install them in the current
    processor, or both.

    The boot cpu does both in its call early on.

    Later, the boot cpu allocates the sibling cpu queue, starts
    the sibling cpu, then the sibling cpu loads them in.

    2) init_cur_cpu_trap() is changed to take the current_thread_info()
    as an argument instead of reading %g6 directly on the current
    cpu.

    3) Create a trampoline stack for the sibling cpus. We do our basic
    kernel calls using this stack, which is locked into the kernel
    image, then go to our proper thread stack after taking over the
    trap table.

    4) While we are in this delicate startup state, we put 0xdeadbeef
    into %g4/%g5/%g6 in order to catch accidental accesses.

    5) On the final prom_set_trap_table*() call, we put &init_thread_union
    into %g6. This is a hack to make prom_world(0) work. All that
    wants to do is restore the %asi register using
    get_thread_current_ds().

    Longer term we should just do the OBP calls to set the trap table by
    hand just like we do for everything else. This would avoid that silly
    prom_world(0) issue, then we can remove the init_thread_union hack.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • No trap levels above 2 in privileged mode on SUN4V.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The trap code was calling itself :-)

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Technically the hypervisor call supports sending in a list
    of all cpus to get the cross-call, but I only pass in one
    cpu at a time for now.

    The multi-cpu support is there, just ifdef'd out so it's easy to
    enable or delete it later.
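
    The one-cpu-at-a-time approach might look roughly like this. It is
    only a sketch: the 64-bit mask, the sk_* names, and the list-taking
    send function standing in for the hypervisor call are assumptions,
    and sk_fake_send exists only so the loop can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a call that accepts a list of target cpus; 0 on success. */
typedef int (*sk_send_fn)(const unsigned int *cpu_list, unsigned int ncpus);

unsigned int sk_cpus_seen;   /* total cpus the fake send was asked to hit */

int sk_fake_send(const unsigned int *cpu_list, unsigned int ncpus)
{
    (void)cpu_list;
    sk_cpus_seen += ncpus;
    return 0;
}

/*
 * Deliver a cross-call to every cpu set in cpu_mask, passing a
 * one-element list per call even though the interface takes a list.
 * Returns the number of successful calls.
 */
unsigned int sk_cross_call(uint64_t cpu_mask, sk_send_fn send)
{
    unsigned int cpu, calls = 0;

    assert(send != NULL);
    for (cpu = 0; cpu < 64; cpu++) {
        if ((cpu_mask & ((uint64_t)1 << cpu)) == 0)
            continue;
        if (send(&cpu, 1) != 0)
            break;            /* stop at the first failure */
        calls++;
    }
    return calls;
}
```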

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Sun4v has 4 interrupt queues: cpu, device, resumable errors,
    and non-resumable errors. A set of head/tail offset pointers
    help maintain a work queue in physical memory. The entries
    are 64-bytes in size.

    Each queue is allocated then registered with the hypervisor
    as we bring cpus up.

    The two error queues each get a kernel side buffer that we
    use to quickly empty the main interrupt queue before we
    call up to C code to log the event and possibly take evasive
    action.
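
    The head/tail bookkeeping for one such queue can be sketched as a
    ring of 64-byte entries indexed by byte offsets. The queue depth, the
    sk_* names, and the push helper (a stand-in for the producer side)
    are assumptions made for illustration; the drain routine mirrors the
    idea of quickly emptying the queue into a side buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_ENTRY_SIZE  64u
#define SK_NUM_ENTRIES 128u                    /* assumption: power of two */
#define SK_QUEUE_BYTES (SK_ENTRY_SIZE * SK_NUM_ENTRIES)

struct sk_queue {
    uint8_t  mem[SK_QUEUE_BYTES];
    uint32_t head;   /* byte offset of the next entry to consume */
    uint32_t tail;   /* byte offset of the next free slot */
};

/* Producer side: append one entry; returns -1 if the queue is full. */
int sk_queue_push(struct sk_queue *q, const uint8_t *entry)
{
    uint32_t next = (q->tail + SK_ENTRY_SIZE) & (SK_QUEUE_BYTES - 1);

    if (next == q->head)
        return -1;
    memcpy(&q->mem[q->tail], entry, SK_ENTRY_SIZE);
    q->tail = next;
    return 0;
}

/*
 * Consumer side: copy pending entries into a side buffer and advance
 * head. Returns the number of entries copied.
 */
unsigned int sk_queue_drain(struct sk_queue *q, uint8_t *buf,
                            unsigned int max_entries)
{
    unsigned int n = 0;

    while (q->head != q->tail && n < max_entries) {
        memcpy(buf + n * SK_ENTRY_SIZE, &q->mem[q->head], SK_ENTRY_SIZE);
        q->head = (q->head + SK_ENTRY_SIZE) & (SK_QUEUE_BYTES - 1);
        n++;
    }
    return n;
}
```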

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
    On uniprocessor, it's always zero, so optimize that.

    On SMP, the jmpl to the stub kills the return address stack in the cpu
    branch prediction logic, so expand the code sequence inline and use a
    code patching section to fix things up. This also allows better and
    explicit register selection, which will be taken advantage of in a
    future changeset.

    The hard_smp_processor_id() function is big, so do not inline it.

    Fix up tests for Jalapeno to also test for Serrano chips. These
    tests want "jbus Ultra-IIIi" cases to match, so that is what we should
    test for.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • They are disrupting, which by the sparc v9 definition means they
    can only occur when interrupts are enabled in the %pstate register.
    This never occurs in any of the trap handling code running at
    trap levels > 0.

    So just mark it as an unexpected trap.

    This allows us to kill off the cee_stuff member of struct thread_info.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • UltraSPARC has special sets of global registers which are switched to
    for certain trap types. There is one set for MMU related traps, one
    set for Interrupt Vector processing, and another set (called the
    Alternate globals) for all other trap types.

    For what seems like forever we've hard coded the values in some of
    these trap registers. Some examples include:

    1) Interrupt Vector global %g6 holds current processor's interrupt
    work struct where received interrupts are managed for IRQ handler
    dispatch.

    2) MMU global %g7 holds the base of the page tables of the currently
    active address space.

    3) Alternate global %g6 held the current_thread_info() value.

    Such hardcoding has resulted in some serious issues in many areas.
    There are some code sequences where having another register available
    would help clean up the implementation. Taking traps such as
    cross-calls from the OBP firmware requires some trick code sequences
    wherein we have to save away and restore all of the special sets of
    global registers when we enter/exit OBP.

    We were also using the IMMU TSB register on SMP to hold the per-cpu
    area base address, which doesn't work any longer now that we actually
    use the TSB facility of the cpu.

    The implementation is pretty straightforward. One tricky bit is
    getting the current processor ID as that is different on different cpu
    variants. We use a stub with a fancy calling convention which we
    patch at boot time. The calling convention is that the stub is
    branched to and the (PC - 4) to return to is in register %g1. The cpu
    number is left in %g6. This stub can be invoked by using the
    __GET_CPUID macro.

    We use an array of per-cpu trap state to store the current thread and
    physical address of the current address space's page tables. The
    TRAP_LOAD_THREAD_REG macro loads %g6 with the current thread from this
    table; it uses __GET_CPUID and also clobbers %g1.

    TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load
    the current processor's IRQ software state into %g6. It also uses
    __GET_CPUID and clobbers %g1.

    Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the
    current address space's page tables into %g7; it clobbers %g1 and uses
    __GET_CPUID.

    Many refinements are possible, as well as some tuning, with this stuff
    in place.
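
    In C terms, the per-cpu trap state array and the two lookups built on
    it could be pictured as follows. This is a sketch only: the real
    lookups are assembler macros, and the struct layout, the cpu count,
    and all sk_* names here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NR_CPUS 4                 /* assumption for the sketch */

struct sk_thread_info {
    int placeholder;                 /* contents do not matter here */
};

/* One slot per cpu, indexed by the cpu number the cpuid stub returns. */
struct sk_trap_state {
    struct sk_thread_info *thread;   /* current thread on this cpu */
    uint64_t pgd_paddr;              /* physical base of its page tables */
};

struct sk_trap_state sk_trap_block[SK_NR_CPUS];

/* Record the current thread and page table base for a cpu. */
void sk_set_cpu_state(unsigned int cpu, struct sk_thread_info *t,
                      uint64_t pgd_paddr)
{
    assert(cpu < SK_NR_CPUS);
    sk_trap_block[cpu].thread = t;
    sk_trap_block[cpu].pgd_paddr = pgd_paddr;
}

/* Analogue of loading the thread register from the table. */
struct sk_thread_info *sk_trap_load_thread(unsigned int cpu)
{
    assert(cpu < SK_NR_CPUS);
    return sk_trap_block[cpu].thread;
}

/* Analogue of loading the page table physical base from the table. */
uint64_t sk_trap_load_pgd_phys(unsigned int cpu)
{
    assert(cpu < SK_NR_CPUS);
    return sk_trap_block[cpu].pgd_paddr;
}
```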

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Jan, 2006

1 commit


30 Sep, 2005

1 commit


29 Sep, 2005

3 commits


26 Sep, 2005

1 commit

  • At boot time, determine the D-cache, I-cache and E-cache size and
    line-size. Use them in cache flushes when appropriate.

    This change was motivated by discovering that the D-cache on
    UltraSparc-IIIi and later is 64K not 32K, and the flushes done by the
    Cheetah error handlers were assuming a 32K size.
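
    The difference between a hard-coded and a probed geometry can be
    sketched as a flush loop driven by a size and line size recorded at
    boot. The struct, the sk_* names, and the per-line callback are
    hypothetical; the real flush is done with cache diagnostic accesses
    in assembler.

```c
#include <assert.h>
#include <stddef.h>

/* Cache geometry as it might be recorded once at boot. */
struct sk_cache_geom {
    unsigned int size;        /* total bytes */
    unsigned int line_size;   /* bytes per line */
};

/*
 * Walk the whole cache one line at a time, invoking flush_line for
 * each line offset (may be NULL to just count). Returns the number
 * of lines visited.
 */
unsigned int sk_flush_cache(const struct sk_cache_geom *g,
                            void (*flush_line)(unsigned int offset))
{
    unsigned int off, lines = 0;

    assert(g != NULL);
    assert(g->line_size != 0 && g->size % g->line_size == 0);
    for (off = 0; off < g->size; off += g->line_size) {
        if (flush_line != NULL)
            flush_line(off);
        lines++;
    }
    return lines;
}
```

    With a 32-byte line (an assumed value), a loop hard-coded for 32K
    covers 1024 lines, while a 64K cache has 2048, so half of the cache
    would be left unflushed.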

    There are still some pieces of code that are hard coding things and
    will need to be fixed up at some point.

    While we're here, fix the D-cache and I-cache parity error handlers
    to run with interrupts disabled, and when the trap occurs at trap
    level > 1 log the event via a counter displayed in /proc/cpuinfo.

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Aug, 2005

2 commits

  • Current uncorrectable error handling was poor enough
    that the processor could just loop taking the same
    trap over and over again. Fix things up so that we
    at least get a log message and perhaps even some register
    state.

    In the process, much consolidation became possible,
    particularly with the correctable error handler.

    Prefix assembler and C function names with "spitfire"
    to indicate that these are for Ultra-I/II/IIi/IIe only.

    More work is needed to make these routines robust and
    featureful to the level of the Ultra-III error handlers.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Verify we really are taking a data access exception trap, at TL1, from
    one of the window spill/fill handlers.

    Else call a new function, data_access_exception_tl1, to log the error.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Aug, 2005

1 commit


25 Jul, 2005

1 commit


24 May, 2005

1 commit

  • Older UltraSPARC-III chips have a P-Cache bug that makes us disable it
    by default at boot time.

    However, this does hurt performance substantially, particularly with
    memcpy(), and the bug is _incredibly_ obscure. I have never seen it
    triggered in practice, ever.

    So provide a "-P" boot option that forces the P-Cache on. It taints
    the kernel, so if it does trigger and cause some data corruption or
    OOPS, we will find out in the logs that this option was on when it
    happened.

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds