16 Oct, 2007

3 commits

  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When the bonding device senses a carrier loss of its active slave it replaces
    that slave with a new one. In between the times when the carrier of an IPoIB
    device goes down and ipoib_neigh is destroyed, it is possible that the
    bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
    This patch detects and prevents this from happening.

    Signed-off-by: Moni Shoua
    Signed-off-by: Or Gerlitz
    Acked-by: Roland Dreier
    Signed-off-by: Jeff Garzik

    Moni Shoua
     
  • IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
    whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
    created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
    call.

    When using the bonding driver, neighbours are created by the net stack on behalf
    of the bonding (master) device. On the tx flow the bonding code gets an skb such
    that skb->dev points to the master device; it changes this skb to point to the
    slave device and calls the slave's hard_start_xmit function.

    Under this scheme, the assumption in ipoib_neigh_destructor that for each
    struct neighbour it gets, n->dev is an ipoib device, and hence that
    netdev_priv(n->dev) can be cast to struct ipoib_dev_priv, is buggy.

    To fix it, this patch adds a dev field to struct ipoib_neigh which is used
    instead of the struct neighbour dev one, when n->dev->flags has the
    IFF_MASTER bit set.
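
    A minimal userspace sketch of the device selection described above. The
    structures are mocks that keep only the fields needed here, and the helper
    name ipoib_dev_for() is hypothetical; it is not taken from the patch.

```c
#include <stddef.h>

#define IFF_MASTER 0x400 /* mock of the flag marking a bonding master device */

/* Mock structures: only the fields this sketch needs. */
struct net_device { unsigned int flags; const char *name; };
struct neighbour { struct net_device *dev; };

struct ipoib_neigh {
    struct neighbour *neighbour;
    struct net_device *dev; /* added field: the IPoIB device that created this entry */
};

/*
 * Hypothetical helper: return the device whose private data really is
 * IPoIB's. If the neighbour belongs to a bonding master, n->dev is not an
 * IPoIB device, so fall back to the device stored in the ipoib_neigh.
 */
static struct net_device *ipoib_dev_for(const struct ipoib_neigh *in)
{
    struct net_device *ndev = in->neighbour->dev;

    if (ndev->flags & IFF_MASTER)
        return in->dev;
    return ndev;
}
```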

    Signed-off-by: Moni Shoua
    Signed-off-by: Or Gerlitz
    Acked-by: Roland Dreier
    Signed-off-by: Jeff Garzik

    Moni Shoua
     

15 Oct, 2007

1 commit

  • * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
    [SCSI] gdth: fix CONFIG_ISA build failure
    [SCSI] esp_scsi: remove __dev{init,exit}
    [SCSI] gdth: !use_sg cleanup and use of scsi accessors
    [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
    [SCSI] gdth: Setup proper per-command private data
    [SCSI] gdth: Remove gdth_ctr_tab[]
    [SCSI] gdth: switch to modern scsi host registration
    [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
    [SCSI] gdth: clean up host private data
    [SCSI] gdth: Remove virt hosts
    [SCSI] gdth: Reorder scsi_host_template initializers
    [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
    [SCSI] gdth: Remove 2.4.x support, in-kernel changelog
    [SCSI] gdth: split out pci probing
    [SCSI] gdth: split out eisa probing
    [SCSI] gdth: split out isa probing
    gdth: Make one abuse of scsi_cmnd less obvious
    [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
    [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
    [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
    ...

    Linus Torvalds
     

13 Oct, 2007

3 commits

  • This changes the uevent buffer functions to use a struct instead of a
    long list of parameters. It no longer requires the caller to do the
    proper buffer termination and size accounting, which is currently wrong
    in some places. It fixes a known bug where parts of the uevent
    environment are overwritten because of wrong index calculations.
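
    A minimal userspace sketch of the approach, assuming a fixed-size buffer
    and pointer array. The names uevent_env and env_add_var() and both sizes
    are illustrative assumptions, not the kernel's definitions. The point is
    that termination and size accounting live in one helper instead of in
    every caller.

```c
#include <stdarg.h>
#include <stdio.h>

#define ENV_NUM_VARS 8   /* illustrative sizes, not the kernel's */
#define ENV_BUF_SIZE 128

/* All state that callers previously passed as separate parameters. */
struct uevent_env {
    char *envp[ENV_NUM_VARS]; /* NULL-terminated pointers into buf */
    int envp_idx;             /* next free slot in envp */
    char buf[ENV_BUF_SIZE];   /* packed "KEY=value\0" strings */
    int buflen;               /* bytes of buf already used */
};

/* Append one formatted variable. Returns 0 on success, -1 if it does not fit. */
static int env_add_var(struct uevent_env *env, const char *fmt, ...)
{
    size_t room = sizeof(env->buf) - (size_t)env->buflen;
    va_list args;
    int len;

    if (env->envp_idx >= ENV_NUM_VARS - 1) /* keep one slot for the NULL */
        return -1;

    va_start(args, fmt);
    len = vsnprintf(&env->buf[env->buflen], room, fmt, args);
    va_end(args);

    if (len < 0 || (size_t)len >= room)    /* error or truncated output */
        return -1;

    env->envp[env->envp_idx++] = &env->buf[env->buflen];
    env->envp[env->envp_idx] = NULL;
    env->buflen += len + 1;                /* account for the terminating NUL */
    return 0;
}
```

    A caller starts from a zeroed struct and only checks the return value,
    for example env_add_var(&env, "ACTION=%s", "add").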

    Many thanks to Mathieu Desnoyers for finding bugs and improving the
    error handling.

    Signed-off-by: Kay Sievers
    Cc: Mathieu Desnoyers
    Cc: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     
  • This adds a 'roles' attribute to rport like transport_fc. The role can
    be initiator or target. That is, the initiator driver creates target
    remote ports and the target driver creates initiator remote ports.

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Mike Christie
    Signed-off-by: James Bottomley

    FUJITA Tomonori
     
  • This converts ib_srp to use the srp transport class.

    I don't have ib hardware so I've not tested this patch.

    Signed-off-by: FUJITA Tomonori
    Cc: Roland Dreier
    Signed-off-by: James Bottomley

    FUJITA Tomonori
     

12 Oct, 2007

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
    mlx4_core: Fix section mismatches
    IPoIB: Allow setting policy to ignore multicast groups
    IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
    IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
    IB/ipath: Remove redundant link state checks
    IB/ipath: Fix IB_EVENT_PORT_ERR event
    IB/ipath: Better handling of unexpected GPIO interrupts
    IB/ipath: Maintain active time on all chips
    IB/ipath: Fix QHT7040 serial number check
    IB/ipath: Indicate a couple of chip bugs to userspace
    IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
    IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
    IB/ipath: Remove duplicate copy of LMC
    IB/ipath: Add ability to set the LMC via the sysfs debugging interface
    IB/ipath: Optimize completion queue entry insertion and polling
    IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
    IB/ipath: Generate flush CQE when QP is in error state
    IB/ipath: Remove redundant code
    IB/ipath: Future proof eeprom checksum code (contents reading)
    IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
    ...

    Linus Torvalds
     

11 Oct, 2007

9 commits

  • Expansion of original idea from Denis V. Lunev

    Add robustness and locking to the local_port_range sysctl.
    1. Enforce that low < high when setting.
    2. Use seqlock to ensure atomic update.

    The locking might seem like overkill, but there are
    cases where a sysadmin might want to change the value in the
    middle of a DoS attack.
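
    The two points above can be sketched in userspace with C11 atomics. This
    is an illustration of the sequence-counter idea under a single-writer
    assumption, not the kernel's seqlock API; the names port_range,
    range_set() and range_get() are hypothetical.

```c
#include <stdatomic.h>

struct port_range {
    atomic_uint seq; /* even: values stable, odd: update in progress */
    atomic_int low;
    atomic_int high;
};

/* Writer: reject invalid ranges, then publish both values as one update.
 * Assumes writers are serialized by some other means. */
static int range_set(struct port_range *r, int low, int high)
{
    if (low >= high)                 /* enforce low < high */
        return -1;

    atomic_fetch_add(&r->seq, 1);    /* sequence becomes odd */
    atomic_store(&r->low, low);
    atomic_store(&r->high, high);
    atomic_fetch_add(&r->seq, 1);    /* sequence becomes even again */
    return 0;
}

/* Reader: retry until a consistent (low, high) pair was observed. */
static void range_get(struct port_range *r, int *low, int *high)
{
    unsigned int start;

    do {
        start = atomic_load(&r->seq);
        *low = atomic_load(&r->low);
        *high = atomic_load(&r->high);
    } while ((start & 1) || start != atomic_load(&r->seq));
}
```

    Readers never block the writer; they only retry if an update overlapped
    their read, so a reader cannot see the new low paired with the old high.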

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • The conversion to use netdevice internal stats left an unused variable
    in ipoib_neigh_free(), since there's no longer any reason to get
    netdev_priv() in order to increment dropped packets. Delete the
    unused priv variable.

    Signed-off-by: Roland Dreier
    Signed-off-by: Jeff Garzik

    Roland Dreier
     
  • Use the stats member of struct netdevice in IPoIB, so we can save
    memory by deleting the stats member of struct ipoib_dev_priv, and save
    code by deleting ipoib_get_stats().

    Signed-off-by: Roland Dreier
    Signed-off-by: David S. Miller

    Roland Dreier
     
  • Since hardware header operations are part of the protocol class
    not the device instance, make them into a separate object and
    save memory.
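
    A minimal sketch of why this saves memory, using mock structures: each
    device carries one pointer to a shared, constant table of operations
    rather than its own copy of every function pointer. The eth_like_* names
    and the single create member are illustrative assumptions.

```c
#include <stddef.h>

struct net_device;

/* Mock per-protocol table of hardware header operations. */
struct header_ops {
    int (*create)(struct net_device *dev, unsigned char *buf, size_t len);
};

/* Mock device: one pointer to the shared table, plus its own address. */
struct net_device {
    const struct header_ops *header_ops;
    unsigned char addr[6];
};

/* Illustrative "create": copy the device address into the header buffer. */
static int eth_like_create(struct net_device *dev, unsigned char *buf, size_t len)
{
    size_t i;

    if (len < sizeof(dev->addr))
        return -1;
    for (i = 0; i < sizeof(dev->addr); i++)
        buf[i] = dev->addr[i];
    return (int)sizeof(dev->addr);
}

/* One constant object shared by every device of this protocol class. */
static const struct header_ops eth_like_header_ops = {
    .create = eth_like_create,
};
```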

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • It's been a useless no-op for long enough in 2.6 so I figured it's time to
    remove it. The number of people that could object because they're
    maintaining unified 2.4 and 2.6 drivers is probably rather small.

    [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

    Signed-off-by: Ralf Baechle
    Signed-off-by: Jeff Garzik
    Signed-off-by: David S. Miller

    Ralf Baechle
     
  • This patch makes most of the generic device layer network
    namespace safe. This patch makes dev_base_head a
    network namespace variable, and then it picks up
    a few associated variables. The functions:
    dev_getbyhwaddr
    dev_getfirsthwbytype
    dev_get_by_flags
    dev_get_by_name
    __dev_get_by_name
    dev_get_by_index
    __dev_get_by_index
    dev_ioctl
    dev_ethtool
    dev_load
    wireless_process_ioctl

    were modified to take a network namespace argument, and
    deal with it.

    vlan_ioctl_set and brioctl_set were modified so their
    hooks will receive a network namespace argument.

    So basically anything in the core of the network stack that was
    affected by the change of dev_base was modified to handle
    multiple network namespaces. The rest of the network stack was
    simply modified to explicitly use &init_net the initial network
    namespace. This can be fixed when those components of the network
    stack are modified to handle multiple network namespaces.

    For now the ifindex generator is left global.

    Fundamentally ifindex numbers are per namespace, or else
    we will have corner case problems with migration when
    we get that far.

    At the same time there are assumptions in the network stack
    that the ifindex of a network device won't change. Making
    the ifindex number global seems a good compromise until
    the network stack can cope with ifindex changes when
    you change namespaces, and the like.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Several devices have multiple independent RX queues per net
    device, and some have a single interrupt doorbell for several
    queues.

    In either case, it's easier to support layouts like that if the
    structure representing the poll is independent from the net
    device itself.

    The signature of the ->poll() call back goes from:

    int foo_poll(struct net_device *dev, int *budget)

    to

    int foo_poll(struct napi_struct *napi, int budget)

    The caller is returned the number of RX packets processed (or
    the number of "NAPI credits" consumed if you want to get
    abstract). The callee no longer messes around bumping
    dev->quota, *budget, etc. because that is all handled in the
    caller upon return.

    The napi_struct is to be embedded in the device driver private data
    structures.
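
    A minimal userspace sketch of a poll routine written against the new
    signature. napi_struct is mocked here, and foo_priv and its RX counters
    are hypothetical. It shows the two points above: the routine recovers its
    private data from the embedded napi_struct, and it simply returns the
    number of packets processed instead of adjusting dev->quota or *budget.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct napi_struct { int weight; }; /* mock, not the kernel definition */

/* Hypothetical driver private data with the napi_struct embedded in it. */
struct foo_priv {
    int rx_pending;          /* packets waiting in a mock RX ring */
    int rx_done;             /* packets handed to the stack so far */
    struct napi_struct napi;
};

static int foo_poll(struct napi_struct *napi, int budget)
{
    struct foo_priv *priv = container_of(napi, struct foo_priv, napi);
    int work_done = 0;

    /* Process at most "budget" packets. */
    while (work_done < budget && priv->rx_pending > 0) {
        priv->rx_pending--;
        priv->rx_done++;
        work_done++;
    }

    /* A real driver would leave polling mode and re-enable its RX
     * interrupt here when work_done < budget. */
    return work_done;
}
```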

    Furthermore, it is the driver's responsibility to disable all NAPI
    instances in its ->stop() device close handler. Since the
    napi_struct is privatized into the driver's private data structures,
    only the driver knows how to get at all of the napi_struct instances
    it may have per-device.

    With lots of help and suggestions from Rusty Russell, Roland Dreier,
    Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

    Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
    Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

    [ Ported to current tree and all drivers converted. Integrated
    Stephen's follow-on kerneldoc additions, and restored poll_list
    handling to the old style to fix mutual exclusion issues. -DaveM ]

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • The kernel IB stack allows (through the RDMA CM) userspace
    applications to join and use multicast groups from the IPoIB MGID
    range. This allows multicast traffic to be handled directly from
    userspace QPs, without going through the kernel stack, which gives
    better performance for some applications.

    However, to fully interoperate with IP multicast, such userspace
    applications need to participate in IGMP reports and queries, or else
    routers may not forward the multicast traffic to the system where the
    application is running. The simplest way to do this is to share the
    kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
    join multicast groups that are being handled directly in userspace.

    However, in such cases, the actual multicast traffic should not also
    be handled by the IPoIB interface, because that would burn resources
    handling multicast packets that will just be discarded in the kernel.

    To handle this, this patch adds lookup on the database used for IB
    multicast group reference counting when IPoIB is joining multicast
    groups, and if a multicast group is already handled by user space,
    then the IPoIB kernel driver ignores the group. This is controlled by
    a per-interface policy flag. When the flag is set, IPoIB will not
    join and attach its QP to a multicast group which already has an entry
    in the database; when the flag is cleared, IPoIB will behave as before
    this change.

    For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
    controls the policy flag. The default value is off/0.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Roland Dreier

    Or Gerlitz
     
  • Signed-off-by: Eli Cohen
    Signed-off-by: Roland Dreier

    Eli Cohen
     

10 Oct, 2007

23 commits

  • Fixed to be the same as everywhere else: copy and then zero the page *
    in the array first, and then pass the copy to the VM routines.

    Signed-off-by: Dave Olson
    Signed-off-by: Roland Dreier

    Dave Olson
     
  • This patch removes some redundant checks when the SMA changes the link
    state since the same checks are made in the lower level function that
    sets the state.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • The link state event calls were being generated when the SM told the SMA
    to change link states. This works for IB_EVENT_PORT_ACTIVE but not if
    the link goes down and stays down. The fix is to generate event calls
    from the interrupt handler when the HW link state changes.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • The General Purpose I/O pins can be configured to cause interrupts. At
    the end of the interrupt code dealing with all known causes, a message
    is output if any bits remain un-handled. Since this is a "can't happen"
    scenario, it should only be triggered by bugs elsewhere. It is harmless,
    and potentially beneficial, to limit the damage by masking any such
    unexpected interrupts.

    This patch adds disabling of interrupts from any pins that should
    not have been allowed to interrupt, in addition to emitting a message.

    Signed-off-by: Michael Albaugh
    Signed-off-by: Roland Dreier

    Michael Albaugh
     
  • There is a count of "active hours" maintained in EEPROM, to aid
    troubleshooting. The definition of "active" is based on traffic
    exceeding a threshold in any given 5-second polling interval. As
    originally written, the check was inadvertently bypassed for chips whose
    counters were 64-bits wide, and only applied to chips with 32-bit wide
    counters.

    This patch moves the test for amount of traffic "out" to a more common
    location, rather than depending on a side-effect of the software
    emulation of 64-bit counts on chips whose hardware is only 32-bits wide.

    Signed-off-by: Michael Albaugh
    Signed-off-by: Roland Dreier

    Michael Albaugh
     
  • Remove all the OEM and bringup boards, and complain and fail
    initialization if one is found. QHT7040 with GPIO rework (128ywwuuuu)
    is OK; older 112ywwuuuu is no longer supported. The check that had been
    added was failing both the 112 and 128 series.

    Signed-off-by: Dave Olson
    Signed-off-by: Roland Dreier

    Dave Olson
     
  • A couple of chip bugs in the iba6110 and in the iba6120 are not in more
    recent chips. The first bug swaps two of the pioavail register
    locations. In the second bug, the chip can sometimes forget to dma the
    pio avail register to memory. We indicate the presence of these bugs
    with runtime flags and we indicate the presence of the flags by bumping
    the SWMINOR.

    Signed-off-by: Arthur Jones
    Signed-off-by: Roland Dreier

    Arthur Jones
     
  • iba6110 rev3 and earlier had a chip bug where the chip could overrun the
    recv header queue. rev4 fixed this chip bug so userspace no longer needs
    to work around it. Now we only set the workaround flag for older chip
    versions.

    Signed-off-by: Arthur Jones
    Signed-off-by: Roland Dreier

    Arthur Jones
     
  • ipath_poll() suffered from a couple of subtle bugs. Under the right
    conditions we could leave recv interrupts enabled on an ipath user
    context on close, thereby taking potentially unwanted interrupts on the
    next open -- this is fixed by unconditionally turning off recv
    interrupts on close. Also, we now use counters rather than set/clear
    bits which allows us to make sure we catch all interrupts at the cost of
    changing the semantics slightly (it's now give me all events since the
    last time I called poll() rather than give me all events since I called
    _this_ poll routine). We also added some memory barriers which may help
    ensure we get all notifications in a timely manner.
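
    The counter-based semantics can be sketched as follows. This is a
    single-threaded illustration with hypothetical names (port_state,
    poll_state, port_poll()); it omits the memory barriers and the interrupt
    enable/disable handling mentioned above.

```c
/* Mock per-port state: the interrupt path only ever increments this. */
struct port_state { unsigned long event_cnt; };

/* Mock per-open-file state: the counter value last reported to the caller. */
struct poll_state { unsigned long event_seen; };

static void port_interrupt(struct port_state *p)
{
    p->event_cnt++;
}

/*
 * Returns 1 if any event arrived since the previous call, else 0.
 * Because the counter is never cleared, an event that arrives between two
 * calls cannot be lost the way a set/clear bit could be.
 */
static int port_poll(const struct port_state *p, struct poll_state *s)
{
    if (p->event_cnt == s->event_seen)
        return 0;
    s->event_seen = p->event_cnt;
    return 1;
}
```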

    Signed-off-by: Arthur Jones
    Signed-off-by: Roland Dreier

    Arthur Jones
     
  • The LMC value was being saved by the SMA in two places. This patch
    cleans it up so only one copy is kept.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • This patch adds the ability to set the LMC via a sysfs file as if the SM
    sent a SubnSet(PortInfo) MAD. It is useful for debugging when no SM is
    running.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • The code to add an entry to the completion queue stored the QPN which is
    needed for the user level verbs view of the completion queue entry but
    the kernel struct ib_wc contains a pointer to the QP instead of a QPN.
    When the kernel polled for a completion queue entry, the QPN was looked
    up and the QP pointer recovered. This patch stores the CQE differently
    based on whether the CQ is a kernel CQ or a user CQ thus avoiding the
    QPN to QP lookup overhead.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • This patch implements the IB_EVENT_QP_LAST_WQE_REACHED event which is
    needed by ib_ipoib to destroy the QP when used in connected mode.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • Follow the IB spec. (C10-96) for post send which states that a flushed
    completion event should be generated for work requests posted when a QP
    is in the error state.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • This patch removes some redundant initialization code.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • In an earlier change, the amount of data read from the flash was
    mistakenly limited to the size known to the current driver. This causes
    problems when the length is increased, and written with the new longer
    version; the checksum would fail because not enough data was read.
    Always read the full 128 byte length to prevent this.

    Signed-off-by: Dave Olson
    Signed-off-by: Roland Dreier

    Dave Olson
     
  • This patch fixes a bug in the receive processing for UC RDMA WRITE with
    immediate which caused the last packet to be dropped.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • This is a comment change, only, correcting the comment to match the
    implemented workaround, rather than the original workaround, and
    clarifying why it's needed.

    Signed-off-by: Dave Olson
    Signed-off-by: Roland Dreier

    Dave Olson
     
  • The ipathfs file system is used to export binary data versus ASCII data
    such as through /sys. This patch removes some unneeded files since the
    data is available through other /sys files.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • There have been a number of issues where host bandwidth via HT or PCIe
    to the InfiniPath chip has been limited in some fashion (BIOS,
    configuration, etc.), resulting in user confusion. This check gives a
    clear warning that something is wrong and needs to be resolved.

    Signed-off-by: Dave Olson
    Signed-off-by: Roland Dreier

    Dave Olson
     
  • The code to post UD sends tried to process work requests at the time
    ib_post_send() is called without using a WQE queue. This was fine as
    long as HW resources were available for sending a packet. This patch
    changes UD to be handled more like RC and UC and shares more code.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • Different processors have different ordering restrictions for write
    combining. By taking advantage of this, we can eliminate some write
    barriers when writing to the send buffers.

    Signed-off-by: Ralph Campbell
    Signed-off-by: Roland Dreier

    Ralph Campbell
     
  • On iba6110 rev4, support for three more IB counters was added: the
    LocalLinkIntegrityError counter, the ExcessiveBufferOverrunErrors
    counter and error counting of flow control packets on an
    invalid VL. These counters trigger GPIO interrupts and the sw keeps
    track of the counts. Since we also use GPIO interrupts to signal packet
    reception, we need to turn off the fast interrupts, or we risk losing a
    GPIO interrupt.

    Signed-off-by: Arthur Jones
    Signed-off-by: Roland Dreier

    Arthur Jones