13 Oct, 2007

1 commit


01 Jun, 2007

3 commits

  • With these two lines in the reverse order the drives/block/ccis.c was
    oopsing in msi_free_irqs. Silly us calling writel on an area after
    we unmap it.

    BUG: unable to handle kernel paging request at virtual address f8b2200c
    printing eip:
    c01e9cc7
    *pdpt = 0000000000003001
    *pde = 0000000037e48067
    *pte = 0000000000000000
    Oops: 0002 [#1]
    SMP
    Modules linked in: cciss ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core
    sunrpc loop dm_multipath button battery asus_acpi ac tg3 floppy sg dm_snapshot
    dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata mptsas scsi_transport_sas
    mptspi scsi_transport_spi mptscsih mptbase sd_mod scsi_mod
    CPU: 1
    EIP: 0060:[] Not tainted VLI
    EFLAGS: 00010286 (2.6.22-rc2-gd2579053 #1)
    EIP is at msi_free_irqs+0x81/0xbe
    eax: f8b22000 ebx: f71f3180 ecx: f7fff280 edx: c1886eb8
    esi: f7c4e800 edi: f7c4ec48 ebp: 00000002 esp: f5a0dec8
    ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
    Process rmmod (pid: 5286, ti=f5a0d000 task=c47d2550 task.ti=f5a0d000)
    Stack: 00000002 f8b72294 00000400 f8b69ca7 f8b6bc6c 00000002 00000000 00000000
    00000000 00000000 00000000 f5a997f4 f8b69d61 f7c5a4b0 f7c4e848 f7c4e848
    f7c4e800 f7c4e800 f8b72294 f7c4e848 f8b72294 c01e3cdf f7c4e848 c024c469
    Call Trace:
    [] cciss_shutdown+0xae/0xc3 [cciss]
    [] cciss_remove_one+0xa5/0x178 [cciss]
    [] pci_device_remove+0x16/0x35
    [] __device_release_driver+0x71/0x8e
    [] driver_detach+0xa0/0xde
    [] bus_remove_driver+0x27/0x41
    [] pci_unregister_driver+0xb/0x13
    [] cciss_cleanup+0xf/0x51 [cciss]
    [] sys_delete_module+0x110/0x135
    [] sysenter_past_esp+0x5f/0x85

    Here's a patch that just reverses the 2 lines of code as Eric suggests. Please
    consider this for inclusion.

    Signed-off-by: Mike Miller
    Signed-off-by: Chase Maupin
    Signed-off-by: "Eric W. Biederman"
    Cc: Andi Kleen
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • "Mike Miller (OS Dev)" writes:

    Found what seems the problem with our vectors being listed backward. In
    drivers/pci/msi.c we should be using list_add_tail rather than list_add to
    preserve the ordering across various kernels. Please consider this for
    inclusion.

    Signed-off-by: "Eric W. Biederman"
    Screwed-up-by: Michael Ellerman
    Cc: "Mike Miller (OS Dev)"
    Cc: Andi Kleen
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • In file included from drivers/pci/msi.c:22:
    include/asm/smp.h:17:26: asm/arch/smp.h: No such file or directory
    include/asm/smp.h:20:3: #error " included in non-SMP build"
    include/asm/smp.h:23:1: warning: "raw_smp_processor_id" redefined
    In file included from include/linux/sched.h:65,
    from include/linux/mm.h:4,
    from drivers/pci/msi.c:10:
    include/linux/smp.h:85:1: warning: this is the location of the previous
    definition

    Tested on powerpc, i386, and x86_64.

    Signed-off-by: Dan Williams
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
     

12 May, 2007

1 commit

  • Today's find is a triggered assertion in msi_free_irqs() when the system
    doesn't support MSI, in which case arch_setup_msi_irqs() always returns
    an error.

    The problem is that when this happens we branch into msi_free_irqs(), to
    which you added the following assertion loop:

    list_for_each_entry(entry, &dev->msi_list, list)
    BUG_ON(irq_has_action(entry->irq));

    Well, if arch_setup_msi_irqs() fails, entry->irq will be zero and
    although that's never assigned to any normal devices we use that IRQ
    number for the timer interrupt on sparc64 so this assertion triggers.

    Better to test for zero before doing the irq_has_action() assertion
    thing.

    Signed-off-by: David S. Miller
    Signed-off-by: Linus Torvalds

    David Miller
     

09 May, 2007

1 commit


03 May, 2007

18 commits

  • This patch introduces an optional function, arch_teardown_msi_irqs(),
    which gives an arch the opportunity to do per-device teardown for
    MSI/X. If that's not required, the default version simply calls
    arch_teardown_msi_irq() for each msi irq required.

    arch_teardown_msi_irqs() is simply passed a pdev, attached to the pdev
    is a list of msi_descs, it is up to the arch to free the irq associated
    with each of these as appropriate.

    For archs that _don't_ implement arch_teardown_msi_irqs(), all msi_descs
    with irq == 0 are considered unallocated, and the arch teardown routine
    is not called on them.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • This patch introduces an optional function, arch_setup_msi_irqs(),
    (note the plural) which gives an arch the opportunity to do per-device
    setup for MSI/X and then allocate all the requested MSI/Xs at once.

    If that's not required by the arch, the default version simply calls
    arch_setup_msi_irq() for each MSI irq required.

    arch_setup_msi_irqs() is passed a pdev, attached to the pdev is a list
    of msi_descs with irq == 0, it is up to the arch to connect these up to
    an irq (via set_irq_msi()) or return an error. For convenience the number
    of vectors and the type are passed also.

    All msi_descs with irq != 0 are considered allocated, and the arch
    teardown routine will be called on them when necessary.

    The existing semantics of pci_enable_msix() are that if the requested
    number of irqs can not be allocated, the maximum number that _could_ be
    allocated is returned. To support that, we define that in case of an
    error from arch_setup_msi_irqs(), the number of msi_descs with irq != 0
    are considered allocated, and are counted toward the "max that could be
    allocated".

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
    it at some point in their setup routine, and then the generic code sets up the
    reverse mapping from the msi_desc back to the irq.

    set_irq_msi() should do both connections, making it the one and only call
    required to connect an irq with it's MSI desc and vice versa.

    The arch code MUST call set_irq_msi(), and it must do so only once it's sure
    it's not going to fail the irq allocation.

    Given that there's no need for the arch to return the irq anymore, the return
    value from the arch setup routine just becomes 0 for success and anything else
    for failure.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Now that we keep a list of msi descriptors, we don't need first_msi_irq
    in the pci dev.

    If we somehow have zero MSIs configured list_entry() will give us weird
    oopes or nice memory corruption bugs. So be paranoid. Add BUG_ONs and also
    a check in pci_msi_check_device() to make sure nvec > 0.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • The msi descriptors are linked together with what looks a lot like
    a linked list, but isn't a struct list_head list. Make it one.

    The only complication is that previously we walked a list of irqs, and
    got the descriptor for each with get_irq_msi(). Now we have a list of
    descriptors and need to get the irq out of it, so it needs to be in the
    actual struct msi_desc. We use 0 to indicate no irq is setup.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Add an arch_check_device(), which gives archs a chance to check the input
    to pci_enable_msi/x. The arch might be interested in the value of nvec so
    pass it in. Propagate the error value returned from the arch routine out
    to the caller.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • As pointed out by Eric, the name pci_msi_supported() suggests it should
    return a boolean value, however it doesn't. So update the name to be
    a bit less confusing and update the doco too.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Consolidate precondition checks into a single if statement.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • pci_enable_msi() and pci_enable_msix() both search for the MSI/MSI-X
    capability, we can fold this into pci_msi_supported() by passing the
    type in.

    Update the code to match the comment for pci_msi_supported(). That is
    it returns 0 on success, and anything else indicates an error.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • We don't need a special cache just for msi descriptors. They're not
    particularly large, under 100 bytes for sure, and don't seem to require any
    special alignment etc. On most systems there will be relatively few MSIs,
    and hence we waste most of a page on the cache. Better to just kzalloc the
    space for the few we do need.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Move EXPORT_SYMBOL()s near their definition.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • When freeing MSIs and MSI-Xs, we BUG_ON() if the irq has not been
    freed, ie. if it still has an action. We can consolidate all of these
    BUG_ON()s into msi_free_irqs() as all the code paths lead there almost
    immediately anyway.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • For the MSI-X case we do exactly the same logic in pci_disable_msix() and
    msi_remove_pci_irq_vectors(), so consolidate them.

    msi_remove_pci_irq_vectors() wasn't setting dev->first_msi_irq to 0, but
    I think it should have been, so the consolidated version does.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Although it might be nice to do a printk before BUG'ing, it's really not
    necessary, and it complicates the code.

    The behaviour has changed slightly, in that before we set a flag if the irq
    had an action, and continued freeing the other irqs. But as I see it that's
    all irrelevant because we end up BUG'ing anyway.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Although it might be nice to do a printk before BUG'ing, it's really not
    necessary, and it complicates the code.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Although it might be nice to do a printk before BUG'ing, it's really not
    necessary, and it complicates the code.

    The behaviour has changed slightly, in that before we set a flag if the irq
    had an action, and continued freeing the other irqs. But as I see it that's
    all irrelevant because we end up BUG'ing anyway.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Although it might be nice to do a printk before BUG'ing, it's really not
    necessary, and it complicates the code.

    Signed-off-by: Michael Ellerman
    Acked-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • This patch fixes a kernel bug which is triggered when using the
    irqbalance daemon with MSI-X hardware.

    Because both MSI-X interrupt messages and MSI-X table writes are posted,
    it's possible for them to cross while in-flight. This results in
    interrupts being received long after the kernel thinks they're disabled,
    and in interrupts being sent to stale vectors after rebalancing.

    This patch performs a read flush after writes to the MSI-X table for
    mask and unmask operations. Since the SMP affinity is set while
    the interrupt is masked, and since it's unmasked immediately after,
    no additional flushes are required in the various affinity setting
    routines.

    This patch has been validated with (unreleased) network hardware which
    uses MSI-X.

    Revised with input from Eric Biederman.

    Signed-off-by: Mitch Williams
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Mitch Williams
     

04 Apr, 2007

1 commit

  • This is a simplified and actually more comprehensive form of a bug
    fix from Mitch Williams .

    When we mask or unmask a msi-x irqs the writes may be posted because
    we are writing to memory mapped region. This means the mask and
    unmask don't happen immediately but at some unspecified time in the
    future. Which is out of sync with how the mask/unmask logic work
    for ioapic irqs.

    The practical result is that we get very subtle and hard to track down
    irq migration bugs.

    This patch performs a read flush after writes to the MSI-X table for mask
    and unmask operations. Since the SMP affinity is set while the interrupt
    is masked, and since it's unmasked immediately after, no additional flushes
    are required in the various affinity setting routines.

    The testing by Mitch Williams on his especially problematic system should
    still be valid as I have only simplified the code, not changed the
    functionality.

    We currently have 7 drivers: cciss, mthca, cxgb3, forceth, s2io,
    pcie/portdrv_core, and qla2xxx in 2.6.21 that are affected by this
    problem when the hardware they driver is plugged into the right slot.

    Given the difficulty of reproducing this bug and tracing it down to
    anything that even remotely resembles a cause, even if people are
    being affected we aren't likely to see many meaningful bug reports, and
    the people who see this bug aren't likely to be able to reproduce this
    bug in a timely fashion. So it is best to get this problem fixed
    as soon as we can so people don't have problems.

    Then if people do have a kernel message stating "No irq for vector" we
    will know it is yet another novel cause that needs a complete new
    investigation.

    Cc: Greg KH
    Cc: Andrew Morton
    Signed-off-by: "Eric W. Biederman"
    Acked-by: Mitch Williams
    Acked-by: "Siddha, Suresh B"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

13 Mar, 2007

1 commit

  • There are two ways pci_save_state and pci_restore_state are used. As
    helper functions during suspend/resume, and as helper functions around
    a hardware reset event. When used as helper functions around a hardware
    reset event there is no reason to believe the calls will be paired, nor
    is there a good reason to believe that if we restore the msi state from
    before the reset that it will match the current msi state. Since arch
    code may change the msi message without going through the driver, drivers
    currently do not have enough information to even know when to call
    pci_save_state to ensure they will have msi state in sync with the other
    kernel irq reception data structures.

    It turns out the solution is straight forward, cache the state in the
    existing msi data structures (not the magic pci saved things) and
    have the msi code update the cached state each time we write to the hardware.
    This means we never need to read the hardware to figure out what the hardware
    state should be.

    By modifying the caching in this manner we get to remove our save_state
    routines and only need to provide restore_state routines.

    The only fields that were at all tricky to regenerate were the msi and msi-x
    control registers and the way we regenerate them currently is a bit dependent
    upon assumptions on how we use the allow msi registers to be configured and used
    making the code a little bit brittle. If we ever change what cases we allow
    or how we configure the msi bits we can address the fragility then.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman
    Acked-by: Auke Kok
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

05 Mar, 2007

3 commits

  • For devices that do not support msi-x we only support 1 interrupt. Therefore
    we can disable that one interrupt by disabling the msi capability itself. If
    we leave the intx interrupts disabled while we have the msi capability
    disabled no interrupts should be delivered from that device.

    Devices with just the minimal msi support (and thus hitting this code path)
    include things like the intel e1000 nic, so it looks like is going to be a
    fairly common case and thus important to get right.

    Signed-off-by: Eric W. Biederman
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • enable/disable_msi_mode have several side effects which keeps them from being
    generally useful. So this patch replaces them with with two much more
    targeted functions: msi_set_enable and msix_set_enable.

    This patch makes pci_dev->msi_enabled and pci_dev->msix_enabled the definitive
    way to test if linux has enabled the msi capability, and has the appropriate
    msi data structures set up.

    This patch ensures that while writing the msi messages in save/restore and
    during device initialization we have the msi capability disabled so we don't
    get into races. The pci spec requires that we do not have the msi capability
    enabled and the msi messages unmasked while we write the messages. Completely
    disabling the capability is overkill but it is easy :)

    Care has been taken so we never have both a msi capability and intx enabled
    simultaneously. We haven't run into a problem yet but better safe then sorry.

    Signed-off-by: Eric W. Biederman
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • In some cases when we are not using msi we need a way to ensure that the
    hardware does not have an msi capability enabled. Currently the code has been
    calling disable_msi_mode to try and achieve that. However disable_msi_mode
    has several other side effects and is only available when msi support is
    compiled in so it isn't really appropriate.

    Instead this patch implements pci_msi_off which disables all msi and msix
    capabilities unconditionally with no additional side effects.

    pci_disable_device was redundantly clearing the bus master enable flag and
    clearing the msi enable bit. A device that is not allowed to perform bus
    mastering operations cannot generate intx or msi interrupt messages as those
    are essentially a special case of dma, and require bus mastering. So the call
    in pci_disable_device to disable msi capabilities was redundant.

    quirk_pcie_pxh also called disable_msi_mode and is updated to use pci_msi_off.

    Signed-off-by: Eric W. Biederman
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

08 Feb, 2007

10 commits

  • The arch hooks arch_setup_msi_irq and arch_teardown_msi_irq are now
    responsible for allocating and freeing the linux irq in addition to
    setting up the the linux irq to work with the interrupt.

    arch_setup_msi_irq now takes a pci_device and a msi_desc and returns
    an irq.

    With this change in place this code should be useable by all platforms
    except those that won't let the OS touch the hardware like ppc RTAS.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • We need to be able to get from an irq number to a struct msi_desc.
    The msi_desc array in msi.c had several short comings the big one was
    that it could not be used outside of msi.c. Using irq_data in struct
    irq_desc almost worked except on some architectures irq_data needs to
    be used for something else.

    So this patch adds a msi_desc pointer to irq_desc, adds the appropriate
    wrappers and changes all of the msi code to use them.

    The dynamic_irq_init/cleanup code was tweaked to ensure the new
    field is left in a well defined state.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The attach_msi_entry has been reduced to a single simple assignment,
    so for simplicity remove the abstraction and directory perform the
    assignment.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Since msi_remove_pci_irq_vectors is designed to be called during
    hotplug remove it is actively wrong to query the hardware and expect
    meaningful results back.

    To that end remove the pci_find_capability calls. Testing
    dev->msi_enabled and dev->msix_enabled gives us all of the information
    we need.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • With the removal of msi_lookup_irq all of the functions using msi_lock
    operated on a single device and none of them could reasonably be
    called on that device at the same time.

    Since what little synchronization that needs to happen needs to happen
    outside of the msi functions, msi_lock could never be contended and as
    such is useless and just complicates the code.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The function msi_lookup_irq was horrible. As a side effect of running
    it changed dev->irq, and then the callers would need to change it
    back. In addition it does a global scan through all of the irqs,
    which seems to be the sole justification of the msi_lock.

    To remove the neede for msi_lookup_irq I added first_msi_irq to struct
    pci_dev. Then depending on the context I replaced msi_lookup_irq with
    dev->first_msi_irq, dev->msi_enabled, or dev->msix_enabled.

    msi_enabled and msix_enabled were already present in pci_dev for other
    reasons.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The PCI save/restore code doesn't need to care about MSI vs MSI-X, all
    it really wants is to say "save/restore all MSI(-X) info for this device".

    This is borne out in the code, we call the MSI and MSI-X save routines
    side by side, and similarly with the restore routines.

    So combine the MSI/MSI-X routines into pci_save_msi_state() and
    pci_restore_msi_state(). It is up to those routines to decide what state
    needs to be saved.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • pci_scan_msi_device() doesn't do anything anymore, so remove it.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • I don't see any reason why we need pci_msi_quirk, quirk code can just
    call pci_no_msi() instead.

    Remove the check of pci_msi_quirk in msi_init(). This is safe as all
    calls to msi_init() are protected by calls to pci_msi_supported(),
    which checks pci_msi_enable, which is disabled by pci_no_msi().

    The pci_disable_msi routines didn't check pci_msi_quirk, only
    pci_msi_enable, but as far as I can see that was a bug not a feature.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Michael Ellerman
     
  • Cleanup MSI code as follows:

    - fix some types
    - fix strange local variable definition
    - delete unnecessary blank line
    - add comment to #endif which is far from corresponding #ifdef

    Signed-off-by: Satoru Takeuchi
    Signed-off-by: Greg Kroah-Hartman

    Satoru Takeuchi
     

08 Dec, 2006

1 commit