07 Dec, 2006

2 commits

  • Move the irqbalance quirks for E7320/E7520/E7525 (Errata 23 in
    http://download.intel.com/design/chipsets/specupdt/30304203.pdf) to early
    quirks.

    Also add a PCI quirk for these platforms to check (which happens very late
    during boot) whether the APIC routing is indeed set to default flat mode.

    This fixes the breakage (on x86_64) of this quirk due to CPU hotplug, which
    selects physical mode instead of the logical flat mode (as needed for this
    erratum workaround).

    Signed-off-by: Suresh Siddha
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Cc: "Li, Shaohua"
    Signed-off-by: Andrew Morton

    Siddha, Suresh B
     
  • The function doesn't exist (anymore).

    Signed-off-by: Jan Beulich
    Signed-off-by: Andi Kleen

    Jan Beulich
     

22 Oct, 2006

1 commit

  • By default route the 8254 over the 8259 and only disable
    it on ATI boards where this causes double timer interrupts.

    This should unbreak some Nvidia boards where the timer doesn't
    seem to tick if it isn't enabled in the 8259. At least one
    VIA board also seemed to have a little trouble with the disabled
    8259.

    For 2.6.20 we'll try both dynamically without blacklisting, but I think
    for .19 this is the safer approach because it has already been well tested
    in earlier kernels. This also makes the x86-64 behaviour the same
    as i386.

    Command line options can change all this of course.

    Signed-off-by: Andi Kleen

    Andi Kleen
     

05 Oct, 2006

1 commit

  • Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
    of passing regs around manually through all ~1800 interrupt handlers in the
    Linux kernel.

    The regs pointer is used in few places, but it potentially costs both stack
    space and code to pass it around. On the FRV arch, removing the regs parameter
    from all the genirq functions results in a 20% speed up of the IRQ exit path
    (i.e. from leaving timer_interrupt() to leaving do_IRQ()).

    Where appropriate, an arch may override the generic storage facility and do
    something different with the variable. On FRV, for instance, the address is
    maintained in GR28 at all times inside the kernel as part of general exception
    handling.

    Having looked over the code, it appears that the parameter may be handed down
    through up to twenty or so layers of functions. Consider a USB character
    device attached to a USB hub, attached to a USB controller that posts its
    interrupts through a cascaded auxiliary interrupt controller. A character
    device driver may want to pass regs to the sysrq handler through the input
    layer which adds another few layers of parameter passing.

    I've built this code with allyesconfig for x86_64 and i386. I've run-tested the
    main part of the code on FRV and i386, though I can't test most of the drivers.
    I've also done a partial conversion for powerpc and MIPS - these at least compile
    with minimal configurations.

    This will affect all archs. Mostly the changes should be relatively easy.
    Take do_IRQ(), store the regs pointer at the beginning, saving the old one:

        struct pt_regs *old_regs = set_irq_regs(regs);

    And put the old one back at the end:

        set_irq_regs(old_regs);

    Don't pass regs through to generic_handle_irq() or __do_IRQ().

    In timer_interrupt(), this sort of change will be necessary:

        - update_process_times(user_mode(regs));
        - profile_tick(CPU_PROFILING, regs);
        + update_process_times(user_mode(get_irq_regs()));
        + profile_tick(CPU_PROFILING);

    I'd like to move update_process_times()'s use of get_irq_regs() into itself,
    except that i386, alone of the archs, uses something other than user_mode().
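
    The save/restore pattern described above can be sketched in user space
    roughly as follows. This is only an illustration: the struct pt_regs
    fields, the single global slot (standing in for the per-CPU variable),
    and the demo_* functions are assumptions made for the example, not the
    kernel's actual definitions.

```c
#include <stddef.h>

/* Stand-in for the architecture's saved-register frame (hypothetical fields). */
struct pt_regs {
    unsigned long ip;
    unsigned long flags;
};

/* One slot for the sketch; the commit describes one such pointer per CPU. */
static struct pt_regs *irq_regs_slot;

/* Return the regs of the interrupt currently being handled, or NULL. */
static struct pt_regs *get_irq_regs(void)
{
    return irq_regs_slot;
}

/* Install new_regs and hand back the previous pointer so nesting works. */
static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old_regs = irq_regs_slot;

    irq_regs_slot = new_regs;
    return old_regs;
}

/* Hypothetical handler that no longer takes a regs parameter. */
static unsigned long last_seen_ip;

static void demo_handler(void)
{
    struct pt_regs *regs = get_irq_regs();

    last_seen_ip = regs ? regs->ip : 0;
}

/* Model of an arch-level do_IRQ(): save, dispatch, restore. */
static void demo_do_irq(struct pt_regs *regs)
{
    struct pt_regs *old_regs = set_irq_regs(regs);

    demo_handler();
    set_irq_regs(old_regs);
}
```

    Returning the old pointer from set_irq_regs() is what lets a nested
    interrupt restore the outer frame when it finishes.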

    Some notes on the interrupt handling in the drivers:

    (*) input_dev() is now gone entirely. The regs pointer is no longer stored in
    the input_dev struct.

    (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
    something different depending on whether it's been supplied with a regs
    pointer or not.

    (*) Various IRQ handler function pointers have been moved to type
    irq_handler_t.

    Signed-Off-By: David Howells
    (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)

    David Howells
     

27 Sep, 2006

1 commit


26 Sep, 2006

4 commits


27 Jun, 2006

4 commits

  • This patch creates a new interface for IOMMUs by adding a centralized
    location for IOMMU allocation (for translation tables/apertures) and
    IOMMU initialization. In creating these, code was moved around for
    abstraction, uniformity, and conciseness.

    Take note of the move of the iommu_setup bootarg parsing code to
    __setup. This is enabled by moving back the location of the aperture
    allocation/detection to mem init (which while ugly, was already the
    location of the swiotlb_init).

    While a slight departure from the previous patch, I believe this captures
    the true intention of the previous versions of the patch which changed
    this code. It also makes the addition of the upcoming Calgary code much
    cleaner than previous patches.

    [AK: Removed one broken change. iommu_setup still has to be called
    early]

    Signed-off-by: Muli Ben-Yehuda
    Signed-off-by: Jon Mason
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jon Mason
     
  • swiotlb relies on the gart specific iommu_aperture variable to know if
    we discovered a hardware IOMMU before swiotlb initialization. Introduce
    iommu_detected to do the same thing, but in a HW IOMMU neutral manner,
    in preparation for adding the Calgary HW IOMMU.

    Signed-Off-By: Muli Ben-Yehuda
    Signed-Off-By: Jon Mason
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jon Mason
     
  • These are the x86_64-specific pieces to enable reliable stack traces. The
    only restriction with this is that it currently cannot unwind across the
    interrupt->normal stack boundary, as that transition is lacking proper
    annotation.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • - Rename the GART_IOMMU option to IOMMU to make clear it's not
    just for AMD
    - Rewrite the help text to better emphasize this fact
    - Make it an embedded option because too many people get it wrong.

    To my astonishment I discovered the aacraid driver tests this
    symbol directly. This looks quite broken to me - it's an internal
    implementation detail of the PCI DMA API. Can the maintainer
    please clarify what this test was intended to do?

    Cc: linux-scsi@vger.kernel.org
    Cc: alan@redhat.com
    Cc: markh@osdl.org
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

26 Mar, 2006

2 commits


27 Feb, 2006

2 commits

  • SMP time selection originally ran after all CPUs were brought up because
    it needed to know the number of CPUs to decide if it needs an MP safe
    timer or not.

    This is not needed anymore because we know present CPUs early.

    This fixes a couple of problems:
    - apicmaintimer didn't always work because it relied on state that was
    set up by time_init_gtod too late.
    - The output for the used timer in the early kernel log was misleading
    because time_init_gtod could actually change it later. Now always
    print the final timer choice.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • It didn't set up the CPU possible map early enough, so the
    option didn't actually work.

    Noticed by Heiko Carstens

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

18 Feb, 2006

1 commit


05 Feb, 2006

4 commits

  • On some broken motherboards (at least one NForce3 based AMD64 laptop)
    the PIT timer runs at an incorrect frequency. This patch adds a new
    option "apicpmtimer" that allows using the APIC timer and calibrating it
    with the PMTimer. It requires the earlier patch that allows running the
    main timer from the APIC.

    Specifying apicpmtimer implies apicmaintimer.

    The option defaults to off for now.
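
    The calibration idea can be sketched as simple arithmetic: count ticks
    on both timers over the same interval and scale by the PM timer's
    nominal 3.579545 MHz rate. The function name and the way the tick
    counts are obtained are assumptions for this example; it does not
    reproduce the patch's actual code.

```c
#include <stdint.h>

/* Nominal ACPI PM timer rate in Hz. */
#define PMTMR_TICKS_PER_SEC 3579545u

/*
 * Estimate the APIC timer frequency in Hz from tick counts that were
 * (hypothetically) sampled on both timers across the same interval.
 * Returns 0 if the PM timer did not advance.
 */
static uint64_t demo_apic_hz(uint32_t apic_ticks, uint32_t pm_ticks)
{
    if (pm_ticks == 0)
        return 0;
    return ((uint64_t)apic_ticks * PMTMR_TICKS_PER_SEC) / pm_ticks;
}
```

    The 64-bit intermediate avoids overflow when both tick counts are
    large.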

    I tested it on a few systems and the resulting APIC timer frequencies
    were usually a bit off, but always
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • At resume time, the TSC value or something similar might have changed a lot
    relative to suspend time. This could make the system see a very large number
    of lost ticks.
    See http://bugzilla.kernel.org/show_bug.cgi?id=5825

    Signed-off-by: Shaohua Li
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • Another piece from the no-idle-tick patch.

    This can be enabled with the "apicmaintimer" option.

    This is mainly useful when the PIT/HPET interrupt is unreliable.
    Note there are some systems that are known to stop the APIC
    timer in C3. For those it will never work, but this case
    should be automatically detected.

    It also only works with the PM timer right now. When HPET is used,
    the way the main timer handler computes the delay doesn't work.

    It should be a bit more efficient because there is one less
    regular interrupt to process on the boot processor.

    Requires earlier bugfix from Venkatesh

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Avoids some ifdef mess later.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

12 Jan, 2006

3 commits

  • They already do this in hardware and the Linux algorithm
    actually adds errors.

    Cc: mingo@elte.hu
    Cc: rohit.seth@intel.com

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • cpumask.h wasn't included implicitly into proto.h in this case.
    Just move it over to smp.h.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • AK: I hacked Muli's original patch a lot and there were a lot
    of changes - all bugs are probably mine to blame now.
    There were also some changes in the fallback behaviour
    for swiotlb - in particular it no longer tries to use GFP_DMA.
    Also all DMA mapping operations now use the
    same core dma_alloc_coherent code with proper fallbacks.
    And various other changes and cleanups.

    Known problems: iommu=force and swiotlb=force together break;
    this needs more testing.

    This patch cleans up x86_64's DMA mapping dispatching code. Right now
    we have three possible IOMMU types: AGP GART, swiotlb and nommu, and
    in the future we will also have Xen's x86_64 swiotlb and other HW
    IOMMUs for x86_64. In order to support all of them cleanly, this
    patch:

    - introduces a struct dma_mapping_ops with function pointers for each
    of the DMA mapping operations of gart (AMD HW IOMMU), swiotlb
    (software IOMMU) and nommu (no IOMMU).

    - gets rid of:

        if (swiotlb)
                return swiotlb_xxx();

    - PCI_DMA_BUS_IS_PHYS is now checked against the dma_ops being set.
    This makes swiotlb faster by avoiding double copying in some cases.
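
    The dispatch change can be illustrated with a small user-space model of
    a function-pointer table. Only the struct name dma_mapping_ops and the
    backend names come from the description above; the fields, the two
    backend implementations, and DEMO_BOUNCE_BASE are hypothetical
    simplifications.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Hypothetical, trimmed-down ops table: one pointer per mapping operation. */
struct dma_mapping_ops {
    const char *name;
    dma_addr_t (*map_single)(void *vaddr, size_t size);
    void (*unmap_single)(dma_addr_t addr, size_t size);
};

/* "nommu" model: the bus address is simply the CPU address. */
static dma_addr_t nommu_map_single(void *vaddr, size_t size)
{
    (void)size;
    return (dma_addr_t)(uintptr_t)vaddr;
}

static void nommu_unmap_single(dma_addr_t addr, size_t size)
{
    (void)addr;
    (void)size;
}

/* "swiotlb" model: pretend every buffer is bounced to a fixed low window. */
#define DEMO_BOUNCE_BASE 0x100000u

static dma_addr_t swiotlb_map_single(void *vaddr, size_t size)
{
    (void)vaddr;
    (void)size;
    return DEMO_BOUNCE_BASE;
}

static void swiotlb_unmap_single(dma_addr_t addr, size_t size)
{
    (void)addr;
    (void)size;
}

static const struct dma_mapping_ops nommu_dma_ops = {
    "nommu", nommu_map_single, nommu_unmap_single
};

static const struct dma_mapping_ops swiotlb_dma_ops = {
    "swiotlb", swiotlb_map_single, swiotlb_unmap_single
};

/* Selected once at init time; callers never test for a specific backend. */
static const struct dma_mapping_ops *dma_ops = &nommu_dma_ops;

static dma_addr_t dma_map_single(void *vaddr, size_t size)
{
    return dma_ops->map_single(vaddr, size);
}

static void dma_unmap_single(dma_addr_t addr, size_t size)
{
    dma_ops->unmap_single(addr, size);
}
```

    Adding another IOMMU type in this model means defining one more ops
    table and pointing dma_ops at it, with no changes in the callers.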

    Signed-Off-By: Muli Ben-Yehuda
    Signed-Off-By: Jon D. Mason
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Muli Ben-Yehuda
     

15 Nov, 2005

2 commits

  • We should zap the low mappings as soon as possible, so that we can catch
    kernel bugs more effectively. Previously early boot had NULL mapped
    and didn't trap on NULL references.

    This patch introduces boot_level4_pgt, which will always have low identity
    addresses mapped. During boot, all the processors will use this as their
    level4 pgt. On the BP, we will switch to init_level4_pgt as soon as we enter C
    code and zap the low mappings as soon as we are done with the usage of
    identity low mapped addresses. On APs we will zap the low mappings as
    soon as we jump to C code.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Ashok Raj
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
     
  • Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.

    As a bit of historical background: when the x86-64 port
    was originally designed we had some discussion about whether we should
    use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
    both. Both was ruled out at that point because it was in early
    2.4 when the VM was still quite shaky and had bad trouble even
    dealing with one DMA zone. We settled on the 16MB DMA zone mainly
    because we worried about older soundcards and the floppy.

    But this has caused problems ever since because
    device drivers had trouble getting enough DMA-able memory. These days
    the VM works much better and the wide use of NUMA has proven
    it can deal with many zones successfully.

    So this patch adds both zones.

    This helps drivers that need a lot of memory below 4GB because
    their hardware cannot address more (graphics drivers - proprietary
    and free ones, video frame buffer drivers, sound drivers etc.).
    Previously they could only use IOMMU+16MB GFP_DMA, which
    was not enough memory.

    Another common problem is that hardware which has full memory
    addressing for >4GB lacks it for some control structures in memory
    (like transmit rings or other metadata). Drivers tended to allocate that memory
    in the 16MB GFP_DMA zone or the IOMMU/swiotlb using pci_alloc_consistent,
    but that can tie up a lot of precious 16MB GFP_DMA/IOMMU/swiotlb memory
    (even on AMD systems the IOMMU tends to be quite small), especially if you have
    many devices. With the new zone pci_alloc_consistent can just put
    this stuff into memory below 4GB, which works better.
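
    The effect of the extra zone on an allocator's choice can be modelled
    with a small helper that picks the least restrictive zone a device can
    still reach. The 16MB and 4GB limits come from the text above; the
    enum, the function, and the ram_top parameter are hypothetical and only
    illustrate the decision, not the kernel's allocator.

```c
#include <stdint.h>

enum demo_zone { DEMO_ZONE_DMA, DEMO_ZONE_DMA32, DEMO_ZONE_NORMAL };

#define DEMO_DMA_LIMIT   ((uint64_t)16 << 20)  /* 16MB */
#define DEMO_DMA32_LIMIT ((uint64_t)4 << 30)   /* 4GB */

/*
 * dma_mask: highest bus address the device can reach (inclusive).
 * ram_top:  exclusive end of installed RAM, assumed to be non-zero.
 * A mask below DEMO_DMA_LIMIT - 1 cannot be satisfied by any zone in
 * this model; the sketch still returns DEMO_ZONE_DMA as the closest fit.
 */
static enum demo_zone demo_zone_for_mask(uint64_t dma_mask, uint64_t ram_top)
{
    if (dma_mask >= ram_top - 1)
        return DEMO_ZONE_NORMAL;
    if (dma_mask >= DEMO_DMA32_LIMIT - 1)
        return DEMO_ZONE_DMA32;
    return DEMO_ZONE_DMA;
}
```

    With 8GB of RAM, a 32-bit device lands in the 4GB zone in this model,
    while a device limited to 2GB still falls back to the 16MB zone.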

    One remaining argument was whether the zone should be 4GB or 2GB. The main
    motivation for 2GB would be an unnamed not so unpopular hardware
    RAID controller (mostly found in older machines from a particular four letter
    company) which has a strange 2GB restriction in firmware. But
    that one works OK with swiotlb/IOMMU anyway, so it doesn't really
    need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
    it seems to be the most common restriction.

    The new zone is so far added only for x86-64.

    For other architectures that don't set up this
    new zone nothing changes. Architectures can set a compatibility
    define in Kconfig, CONFIG_DMA_IS_DMA32, that will define GFP_DMA32
    as GFP_DMA. Otherwise it's a no-op, because on 32bit architectures
    it's normally not needed: GFP_NORMAL (=0) is DMA-able
    enough.
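
    The compatibility scheme can be sketched with preprocessor
    conditionals. The flag values, the DEMO_HAVE_ZONE_DMA32 switch, and
    demo_ring_gfp() are made up for the example; only CONFIG_DMA_IS_DMA32
    and the GFP_* names come from the text above.

```c
/* Flag values are illustrative only, not the kernel's actual bits. */
#define GFP_NORMAL 0x0u
#define GFP_DMA    0x1u

#if defined(DEMO_HAVE_ZONE_DMA32)      /* hypothetical: arch has the new zone */
#define GFP_DMA32 0x4u
#elif defined(CONFIG_DMA_IS_DMA32)     /* arch whose DMA zone already covers 4GB */
#define GFP_DMA32 GFP_DMA
#else                                  /* no-op: normal memory is DMA-able enough */
#define GFP_DMA32 GFP_NORMAL
#endif

/* A driver can then ask for below-4GB memory unconditionally. */
static unsigned int demo_ring_gfp(void)
{
    return GFP_DMA32;
}
```

    Built with neither macro defined, the request degrades to a plain
    GFP_NORMAL allocation, which is the no-op case described above.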

    One problem is still that GFP_DMA means different things on different
    architectures. e.g. some drivers used to have #ifdef ia64 use GFP_DMA
    (trusting it to be 4GB) #elif __x86_64__ (use other hacks like
    the swiotlb because 16MB is not enough) ... . This was quite
    ugly and is now obsolete.

    These should now be converted to use GFP_DMA32 unconditionally. I haven't done
    this yet. Or, better, only use pci_alloc_consistent/dma_alloc_coherent,
    which will use GFP_DMA32 transparently.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

13 Sep, 2005

2 commits


11 Sep, 2005

1 commit


08 Jul, 2005

1 commit

  • There has been some discussion about solving the SMP MTRR suspend/resume
    breakage, but I didn't find a patch for it. This is an attempt at one. The
    basic idea is to move MTRR initialization into cpu_identify for all APs (so it
    works for CPU hotplug). For the BP, restore_processor_state is responsible for
    restoring the MTRRs.

    Signed-off-by: Shaohua Li
    Acked-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

17 May, 2005

1 commit

  • There are unfortunately more and more multi processor Opteron systems which
    don't have HPET timer support in the southbridge. This covers in particular
    Nvidia and VIA chipsets. They also don't guarantee that the TSCs are
    synchronized between CPUs; and especially with MP powernow the systems are
    nearly unusable because the time gets very inconsistent between CPUs.

    The timer code for x86-64 was originally written under the assumption that we
    could fall back to the HPET timer on such systems. But this doesn't work
    there.

    Another alternative is to use the ACPI PM timer as primary time source. This
    patch does that. The kernel only uses PM timer when there is no other choice
    because it has some disadvantages.

    Ported over from i386. It should be faster than the i386 version because I
    dropped the "read three times" workaround, but it is still considerably slower
    than HPET and also does not work together with vsyscalls, which have to be
    disabled.
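
    Using the PM timer as a time source comes down to taking deltas of a
    free-running counter and scaling them. The sketch below assumes the
    24-bit counter width and the nominal 3.579545 MHz rate; the helper
    names are hypothetical and the hardware read itself is left out.

```c
#include <stdint.h>

#define PMTMR_TICKS_PER_SEC 3579545u   /* nominal ACPI PM timer rate in Hz */
#define PMTMR_MASK          0xFFFFFFu  /* assume the 24-bit counter variant */

/* Ticks elapsed between two raw counter reads, tolerating one wraparound. */
static uint32_t demo_pmtmr_delta(uint32_t then, uint32_t now)
{
    return (now - then) & PMTMR_MASK;
}

/* Convert a tick count to microseconds, rounding down. */
static uint64_t demo_pmtmr_to_usec(uint32_t ticks)
{
    return ((uint64_t)ticks * 1000000u) / PMTMR_TICKS_PER_SEC;
}
```

    At this rate a 24-bit counter wraps roughly every 4.7 seconds, so the
    delta is only meaningful if the counter is sampled more often than
    that.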

    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

17 Apr, 2005

3 commits

  • This will allow hotplug CPU in the future and in general cleans up a lot of
    crufty code. It also should plug some races that the old hackish way
    introduced. Remove one old race workaround in NMI watchdog setup that is not
    needed anymore.

    I removed the old total sum of bogomips reporting code. The brag value of
    BogoMips has been greatly devalued in the last years on the open market.

    Real CPU hotplug will need some more work, but the infrastructure for it is
    there now.

    One drawback: the new TSC sync algorithm is less accurate than before. The
    old way of zeroing TSCs is too intrusive to do later. Instead the TSC of the
    BP is duplicated now, which is less accurate.

    akpm:

    - sync_tsc_bp_init seems to have the sense of `init' inverted.

    - SPIN_LOCK_UNLOCKED is deprecated - use DEFINE_SPINLOCK.

    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Use a real VMA to map the 32bit vsyscall page.

    This interacts better with Hugh's upcoming VMA walk optimization.
    It also removes some ugly special cases.

    Code roughly modelled after the ppc64 vdso version from Ben Herrenschmidt.

    Signed-off-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds