30 Apr, 2013

4 commits


25 Apr, 2013

36 commits

  • Commit 6681712d67eef14c4ce793561c3231659153a320 ("vxlan: generalize
    forwarding tables") relaxed the address checks in rtnl_fdb_add() to use
    is_zero_ether_addr(). This allows users to add multicast addresses
    using the fdb API. However, the check in rtnl_fdb_del() still uses the
    stricter is_valid_ether_addr(), which rejects multicast addresses. Thus
    it is possible to add an fdb entry that cannot later be removed.
    Relax the check in rtnl_fdb_del() as well.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
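    A minimal userspace sketch of why the two checks disagree. The three
    helpers below are re-implementations that mirror the semantics of the
    kernel's <linux/etherdevice.h> helpers; fdb_addr_ok_relaxed() and
    fdb_addr_ok_strict() are hypothetical names standing in for the add
    check and the old del check. A multicast MAC such as 01:00:5e:00:00:01
    passes the relaxed check but fails the strict one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-implementations mirroring <linux/etherdevice.h> semantics. */
static bool is_zero_ether_addr(const uint8_t addr[6])
{
	for (int i = 0; i < 6; i++)
		if (addr[i] != 0)
			return false;
	return true;
}

static bool is_multicast_ether_addr(const uint8_t addr[6])
{
	/* The I/G bit (LSB of the first octet) marks group addresses. */
	return addr[0] & 0x01;
}

static bool is_valid_ether_addr(const uint8_t addr[6])
{
	/* "Valid" means a unicast address that is not all zeroes. */
	return !is_multicast_ether_addr(addr) && !is_zero_ether_addr(addr);
}

/* Hypothetical stand-in for the relaxed check used when adding. */
static bool fdb_addr_ok_relaxed(const uint8_t addr[6])
{
	return !is_zero_ether_addr(addr);
}

/* Hypothetical stand-in for the old, strict check used when deleting. */
static bool fdb_addr_ok_strict(const uint8_t addr[6])
{
	return is_valid_ether_addr(addr);
}
```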
     
  • During high throughput it is likely that we receive both an RX and a TX
    interrupt. Normally, once we enter the ISR, the interrupts are disabled
    in the IRQ chip, so the ISR is invoked only once and the interrupt line
    is disabled only once. It is re-enabled after NAPI completes.
    With threaded interrupts, on the other hand, the interrupt is disabled
    immediately and the ISR is marked for "later". With both the TX and RX
    interrupts marked pending, we invoke both handlers and disable the
    interrupt line twice. The NAPI callback is still executed only once, so
    after it completes the interrupts remain disabled.

    The initial patch simply removed the cpsw_{enable|disable}_irq() calls
    and it worked well on my AM335X ES1.0 (BeagleBone). On ES2.0 (BeagleBone
    Black) it caused a never-ending interrupt (even after masking via
    cpsw_intr_disable()), according to Mugunthan V N. Since I don't have an
    ES2.0 and have no idea what is going on, this patch tracks the state of
    the irq_disable() call and executes it only if it has not been done yet.
    The bookkeeping is done in the first struct since with dual_emac we can
    have two of those and only one interrupt line.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Mugunthan V N
    Signed-off-by: David S. Miller

    Sebastian Siewior
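    A sketch of the bookkeeping idea, assuming an IRQ line whose disable
    calls nest (as with the kernel's disable_irq_nosync()/enable_irq() depth
    counting). All names (struct fake_irq_line, struct fake_priv,
    isr_disable_once(), napi_done_enable()) are hypothetical and model the
    cpsw flag, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IRQ line: disables nest, and the line only fires again once
 * every disable has been matched by an enable. */
struct fake_irq_line {
	int disable_depth;
};

static void irq_line_disable(struct fake_irq_line *line)
{
	line->disable_depth++;
}

static void irq_line_enable(struct fake_irq_line *line)
{
	assert(line->disable_depth > 0);	/* unbalanced enable is a bug */
	line->disable_depth--;
}

/* Hypothetical driver state: one flag for the single shared interrupt line
 * (kept in the first of the two dual_emac structs in the commit). */
struct fake_priv {
	struct fake_irq_line *line;
	bool irq_enabled;
};

/* Called from both the RX and the TX handler: disable the line only once. */
static void isr_disable_once(struct fake_priv *priv)
{
	if (priv->irq_enabled) {
		priv->irq_enabled = false;
		irq_line_disable(priv->line);
	}
}

/* Called once when NAPI completes: undo the single disable. */
static void napi_done_enable(struct fake_priv *priv)
{
	if (!priv->irq_enabled) {
		priv->irq_enabled = true;
		irq_line_enable(priv->line);
	}
}
```

    Without the flag, the second handler would push the depth to 2 and the
    single enable after NAPI would leave the line masked.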
     
  • text   data  bss  dec    hex   filename
    15530  92    4    15626  3d0a  cpsw.o.before
    15478  92    4    15574  3cd6  cpsw.o.after

    52 bytes smaller, 13 for each invocation.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Mugunthan V N
    Signed-off-by: David S. Miller

    Sebastian Siewior
     
  • This driver does not clean up properly when it is removed. Here is a list:
    - Use unregister_netdev(); free_netdev() is good but not enough.
    - Do the same for the other ndev in the dual-MAC case.
    - Free data.slave_data. The name of the structure makes it look like
    it is platform_data, but it is not. It is just a trick!
    - Free all IRQs. Again: freeing one IRQ is a good start, but freeing all
    of them is better.

    With this, rmmod & modprobe of cpsw seem to work. The remaining issue
    is:
    |WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0x9c/0xd4()
    |sysfs: cannot create duplicate filename '/devices/ocp.2/4a100000.ethernet/4a101000.mdio'
    |WARNING: at lib/kobject.c:196 kobject_add_internal+0x1a4/0x1c8()

    coming from of_platform_populate(), and I am not sure whether this
    belongs here.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Mugunthan V N
    Signed-off-by: David S. Miller

    Sebastian Siewior
     
  • When compiled as modules, each of these modules is missing something.
    With this patch the modules are loaded on demand and don't taint the
    kernel due to license issues.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Mugunthan V N
    Signed-off-by: David S. Miller

    Sebastian Siewior
     
  • If we run into OOM while allocating the new RX skb, we don't get one
    and are left with one skb fewer than before. If this keeps happening,
    we end up with no RX skbs at all.
    This patch changes the following:
    - If we fail to allocate the new skb, we treat the just-completed skb
    as the new one and drop the data it received.
    - Instead of testing multiple times whether the device is gone, we rely
    on the status field, which is set to -ENOSYS when the channel is going
    down and incomplete requests are purged.
    cpdma_chan_stop() removes most of the packets with -ENOSYS. The
    currently active packet being removed has the "tear down" bit set.
    So if that bit is set, we report -ENOSYS as well; otherwise we pass the
    status bits, which are required to figure out which of the two possible
    just finished.

    Acked-by: Mugunthan V N
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: David S. Miller

    Sebastian Siewior
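    A userspace sketch of the recycling rule, under these assumptions: a
    single hypothetical RX slot, malloc() standing in for skb allocation,
    a simulate_oom flag to force allocation failure, and a negative status
    (such as -ENOSYS) meaning the channel is being torn down. On OOM the
    completed buffer is resubmitted and its data dropped, so the number of
    RX buffers never shrinks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define BUF_SIZE 64

/* Hypothetical RX slot: one buffer currently owned by the RX queue. */
struct rx_slot {
	char *buf;
};

/* Test hook: when true, allocation fails as if the system were OOM. */
static bool simulate_oom;

static char *alloc_rx_buf(void)
{
	return simulate_oom ? NULL : malloc(BUF_SIZE);
}

/* Completion handler sketch. Returns the filled buffer to hand up the
 * stack (caller frees it), or NULL if the data was dropped. */
static char *rx_complete(struct rx_slot *slot, int status)
{
	char *done = slot->buf;
	char *fresh;

	if (status < 0) {
		/* Channel going down (e.g. -ENOSYS): release, don't requeue. */
		free(done);
		slot->buf = NULL;
		return NULL;
	}

	fresh = alloc_rx_buf();
	if (!fresh) {
		/* OOM: keep the completed buffer queued and drop its data. */
		return NULL;
	}

	slot->buf = fresh;	/* requeue the replacement buffer */
	return done;		/* deliver the received data */
}
```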
     
  • The gfp_mask argument is not used in cpdma_chan_submit() and is always
    set to GFP_KERNEL, even in atomic sections. This patch drops it since it
    is unused.

    Acked-by: Mugunthan V N
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: David S. Miller

    Sebastian Siewior
     
  • netif_running() reports false before the ->ndo_stop() callback is
    called. That means if one executes "ifconfig down" and the system
    receives an interrupt before the interrupt source has been disabled, we
    hang forever, for two reasons:
    - we never disable the interrupt source, because the devices claim to
    be already inactive and don't feel responsible.
    - since the ISR always reports IRQ_HANDLED, the line is never
    deactivated, because it looks like the ISR feels responsible.

    This patch changes the logic in the ISR a little:
    - If none of the status registers reports an active source (RX or TX;
    misc is ignored because it is not activated), we leave with IRQ_NONE.
    - The interrupt is deactivated.
    - The first active network device is taken and NAPI is scheduled. If
    none is active (a small race window between ->ndo_stop() and the
    interrupt), we leave and should not come back because the source is
    off.
    There is no need to schedule the second NAPI because both share the
    same DMA queue.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Mugunthan V N
    Signed-off-by: David S. Miller

    Sebastian Siewior
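    A hypothetical model of the reworked ISR decision logic. The struct, its
    fields and the SKETCH_IRQ_* values are illustrative stand-ins for the
    driver state and the kernel's irqreturn_t, not cpsw code. Returning
    "handled" in the race case is an assumption of this sketch, justified
    by the source having just been switched off.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum sketch_irqreturn { SKETCH_IRQ_NONE, SKETCH_IRQ_HANDLED };

/* Hypothetical state for one controller shared by up to two net devices. */
struct sketch_ctlr {
	unsigned int rx_status;		/* pending RX sources */
	unsigned int tx_status;		/* pending TX sources */
	bool irq_source_enabled;
	bool ndev_running[2];
	int napi_scheduled_on;		/* device index, -1 if none */
};

static enum sketch_irqreturn sketch_isr(struct sketch_ctlr *c)
{
	/* Nothing pending (misc is not enabled): not our interrupt. */
	if (!c->rx_status && !c->tx_status)
		return SKETCH_IRQ_NONE;

	/* Switch the source off before deciding who handles it. */
	c->irq_source_enabled = false;

	/* Schedule NAPI on the first running device; one DMA queue is shared. */
	for (int i = 0; i < 2; i++) {
		if (c->ndev_running[i]) {
			c->napi_scheduled_on = i;
			return SKETCH_IRQ_HANDLED;
		}
	}

	/* Raced with ->ndo_stop(): the source is off, so it won't fire again. */
	return SKETCH_IRQ_HANDLED;
}
```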
     
  • If we run out of memory during "ifconfig up", we continue regardless of
    how many skbs we got. In the worst case we have zero RX skbs and can
    never receive further packets, since the RX skbs are never reallocated.
    If cpdma_chan_submit() fails, we even leak the skb.
    This patch changes the behavior here:
    If we fail to allocate an skb during bring-up, we stop and report the
    error. The same goes for errors from cpdma_chan_submit().
    While here, I switched to __netdev_alloc_skb_ip_align() so that
    GFP_KERNEL can be used.

    Signed-off-by: Sebastian Andrzej Siewior
    Acked-by: Mugunthan V N
    Signed-off-by: David S. Miller

    Sebastian Siewior
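    A sketch of the fail-fast bring-up loop, with malloc() standing in for
    skb allocation and hypothetical sketch_alloc()/sketch_submit() hooks
    that can be told to fail at a given index. The loop reports the first
    error instead of continuing, and frees a buffer the queue did not
    accept. Releasing buffers that were already queued is left to the
    caller's error path in this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define RX_BUF_SIZE 64
#define MAX_QUEUED 16

/* Test hooks: index at which allocation/submission fails, -1 = never. */
static int fail_alloc_at = -1;
static int fail_submit_at = -1;

/* Hypothetical RX queue. */
static char *queued[MAX_QUEUED];
static int num_queued;

static char *sketch_alloc(int i)
{
	return i == fail_alloc_at ? NULL : malloc(RX_BUF_SIZE);
}

static int sketch_submit(char *buf, int i)
{
	if (i == fail_submit_at || num_queued >= MAX_QUEUED)
		return -ENOMEM;
	queued[num_queued++] = buf;
	return 0;
}

/* Fill the RX queue; stop at the first failure and report it instead of
 * bringing the interface up with fewer (possibly zero) buffers. */
static int sketch_fill_rx(int count)
{
	for (int i = 0; i < count; i++) {
		char *buf = sketch_alloc(i);
		int ret;

		if (!buf)
			return -ENOMEM;
		ret = sketch_submit(buf, i);
		if (ret < 0) {
			free(buf);	/* not queued: free it instead of leaking */
			return ret;
		}
	}
	return 0;
}
```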
     
  • __cpdma_chan_process() holds the lock with interrupts off (and so does
    its caller); the same goes for cpdma_ctlr_start(). With interrupts off,
    jiffies will not make any progress, and if the wait condition never
    becomes true we wait forever.
    This patch adds a simple udelay() and a countdown of attempts.

    Acked-by: Mugunthan V N
    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: David S. Miller

    Sebastian Siewior
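    The general pattern is a bounded poll that counts attempts instead of
    relying on a jiffies-based timeout. In this sketch, hw_is_idle() is a
    hypothetical register read simulated by a counter, and sketch_udelay()
    is a no-op placeholder for the kernel's udelay(); the attempt count and
    delay are arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simulated hardware: becomes idle after this many polls. */
static int polls_until_idle;

/* Hypothetical status-register read. */
static bool hw_is_idle(void)
{
	if (polls_until_idle > 0) {
		polls_until_idle--;
		return false;
	}
	return true;
}

/* Placeholder for the kernel's udelay(); does nothing in this sketch. */
static void sketch_udelay(unsigned int usecs)
{
	(void)usecs;
}

/* Poll at most 'attempts' times; this works even with interrupts off,
 * because it never waits for jiffies to advance. */
static int wait_idle_bounded(int attempts)
{
	while (attempts--) {
		if (hw_is_idle())
			return 0;
		sketch_udelay(10);
	}
	return -ETIMEDOUT;
}
```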
     
  • Remove an erroneous semicolon, found by building with EXTRA_CFLAGS=-W.
    The related commit is c54419321455631079c7d6e60bc732dd0c5914c5
    ("GRE: Refactor GRE tunneling code").

    Signed-off-by: Chen Gang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Chen Gang
     
  • The firmware supports a maximum of 4K FCoE exchanges. In 4-port devices,
    or when working in multi-function mode, this resource needs to be distributed
    between the various possible FCoE functions.

    This information needs to be calculated by bnx2x and propagated into bnx2fc
    via cnic. bnx2fc can then use this value to calculate corresponding xid
    resources instead of using global constants.

    Signed-off-by: Bhanu Prakash Gollapudi
    Signed-off-by: Michael Chan
    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    Bhanu Prakash Gollapudi
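    The commit does not give the bnx2x formula, so the following is only a
    hypothetical illustration of distributing the 4K exchange budget: an
    even split across the FCoE functions sharing the firmware, from which a
    per-function maximum xid is derived instead of a global constant. Both
    function names are invented for this sketch.

```c
#include <assert.h>

/* Firmware limit stated in the commit: 4K FCoE exchanges in total. */
#define SKETCH_MAX_FCOE_EXCHANGES 4096u

/* Hypothetical even split; the real bnx2x calculation may differ. */
static unsigned int sketch_exchanges_per_func(unsigned int num_fcoe_funcs)
{
	return num_fcoe_funcs ? SKETCH_MAX_FCOE_EXCHANGES / num_fcoe_funcs : 0;
}

/* Hypothetical per-function xid limit derived from the share above. */
static unsigned int sketch_max_xid(unsigned int num_fcoe_funcs)
{
	assert(num_fcoe_funcs > 0);
	return sketch_exchanges_per_func(num_fcoe_funcs) - 1;
}
```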
     
  • Enables hardware generation of IP header and
    protocol-specific checksums for transmitted
    packets.

    Enables hardware discarding of received packets with
    invalid IP header or protocol-specific checksums.

    The feature is enabled by default but can be
    enabled/disabled by ethtool.

    Signed-off-by: Fugang Duan
    Signed-off-by: Jim Baxter
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Jim Baxter
     
  • The "changed" variable should be a 64-bit type; otherwise it can't
    store all the features. As the code is now, the test for whether
    NETIF_F_RXCSUM changed is always false and we return immediately.

    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Dan Carpenter
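    A minimal reproduction of the truncation bug, assuming a 64-bit feature
    mask (netdev_features_t is a u64 in the kernel) and a hypothetical
    RXCSUM bit placed above bit 31; the real NETIF_F_RXCSUM bit position
    varies between kernel versions. Storing the XOR of old and new features
    in 32 bits discards that bit, so the "changed" test is always false.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_features_t;	/* like netdev_features_t */

/* Hypothetical bit position above 31, chosen to expose the problem. */
#define SKETCH_F_RXCSUM ((sketch_features_t)1 << 33)

/* Buggy: the XOR result is squeezed into 32 bits, losing bits 32..63. */
static int rxcsum_changed_buggy(sketch_features_t old, sketch_features_t new_f)
{
	uint32_t changed = (uint32_t)(old ^ new_f);

	return (changed & SKETCH_F_RXCSUM) != 0;
}

/* Fixed: keep the full 64-bit mask. */
static int rxcsum_changed_fixed(sketch_features_t old, sketch_features_t new_f)
{
	sketch_features_t changed = old ^ new_f;

	return (changed & SKETCH_F_RXCSUM) != 0;
}
```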
     
  • OVS locking was recently changed to use a private OVS lock, which
    simplified overall locking. Therefore there is no need for another
    global genl lock to protect OVS data structures. This patch makes OVS
    use a parallel_ops genl family. This also allows more granular OVS
    locking, using ovs_mutex to protect OVS data structures, which gives
    more concurrency; e.g. multiple OVS_PACKET_CMD_EXECUTE genl operations
    can run in parallel.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • All genl callbacks are serialized by genl-mutex. This can become a
    bottleneck in the multi-threaded case.
    This patch adds a parameter to genl_family so that a particular family
    can get concurrent netlink callbacks without genl_lock held.
    A new rw-sem is used to protect genl callbacks from genl family
    unregistration. For a parallel_ops genl family, the read lock is taken
    for callbacks, and the write lock is taken for registration or
    unregistration of any family.
    For a locked genl family, both the semaphore and genl-mutex are locked
    for any operation.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
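    A userspace model of the locking scheme, assuming POSIX threads:
    pthread_rwlock_t plays the new rw-sem and pthread_mutex_t plays
    genl-mutex. struct sketch_family and the sketch_* functions are
    hypothetical. Every callback holds the read side so unregistration
    cannot race with it, only families without parallel_ops also serialize
    on the mutex, and (un)registration takes the write side plus the mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_rwlock_t cb_lock = PTHREAD_RWLOCK_INITIALIZER;	/* rw-sem */
static pthread_mutex_t genl_mutex = PTHREAD_MUTEX_INITIALIZER;	/* genl-mutex */

struct sketch_family {
	bool parallel_ops;
	int (*doit)(void);
};

/* Example callback used by the test. */
static int sketch_echo_doit(void)
{
	return 42;
}

/* Dispatch one message to a family's callback. */
static int sketch_dispatch(const struct sketch_family *f)
{
	int ret;

	pthread_rwlock_rdlock(&cb_lock);	/* blocks (un)registration */
	if (!f->parallel_ops)
		pthread_mutex_lock(&genl_mutex);	/* serialize locked families */
	ret = f->doit();
	if (!f->parallel_ops)
		pthread_mutex_unlock(&genl_mutex);
	pthread_rwlock_unlock(&cb_lock);
	return ret;
}

/* Unregistration excludes all callbacks of every family. */
static void sketch_unregister(struct sketch_family *f)
{
	pthread_rwlock_wrlock(&cb_lock);
	pthread_mutex_lock(&genl_mutex);
	f->doit = NULL;
	pthread_mutex_unlock(&genl_mutex);
	pthread_rwlock_unlock(&cb_lock);
}
```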
     
  • Signed-off-by: Fengguang Wu
    Acked-by: Dan Carpenter
    Signed-off-by: David S. Miller

    Wu Fengguang
     
  • This reverts commit 068a2de57ddf4f4 (net: release dst entry while
    cache-hot for GSO case too)

    Before GSO packet segmentation, we already take care of skb->dst if it
    can be released.

    There is no point in adding an extra test for every segment in the GSO loop.

    Signed-off-by: Eric Dumazet
    Cc: Krishna Kumar
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Currently, packet_sock has a struct tpacket_stats stats member for
    TPACKET_V1 and TPACKET_V2 statistics accounting. TPACKET_V3 introduced
    ``union tpacket_stats_u stats_u'', which however holds only the
    TPACKET_V3 statistics; when copying them to user space, TPACKET_V3 does
    some hackery and also accesses tpacket_stats' stats, although
    everything could have been done within the union itself.

    Unify accounting within the tpacket_stats_u union so that we can
    remove 8 unnecessary bytes from packet_sock. Note that even if we
    switch to TPACKET_V3 and use the non-mmap(2)ed option, this still
    works, because the union members have the same types and offsets that
    are exposed to user space.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
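    A sketch of why the union approach is safe, using local copies of the
    layouts of struct tpacket_stats and struct tpacket_stats_v3 from
    <linux/if_packet.h> (the sketch_ names are hypothetical). Because both
    structs start with the same members, a drop counted through the v3
    view is read back unchanged through the v1 view, and the union is only
    as large as its bigger member.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirrors of the uapi layouts in <linux/if_packet.h>. */
struct sketch_tpacket_stats {
	unsigned int tp_packets;
	unsigned int tp_drops;
};

struct sketch_tpacket_stats_v3 {
	unsigned int tp_packets;
	unsigned int tp_drops;
	unsigned int tp_freeze_q_cnt;
};

union sketch_tpacket_stats_u {
	struct sketch_tpacket_stats stats1;
	struct sketch_tpacket_stats_v3 stats3;
};

/* Account through the v3 view; the shared leading members (same types,
 * same offsets) make the update visible through the v1 view too. */
static void sketch_account_drop(union sketch_tpacket_stats_u *u)
{
	u->stats3.tp_drops++;
}
```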
     
  • There's a 4-byte hole in the packet_ring_buffer structure before
    prb_bdqc that can be filled with the 'pending' member, which reduces
    the overall structure size from 224 bytes to 216 bytes.
    As a side effect, the two 4-byte holes in struct packet_sock after the
    embedded packet_ring_buffer members are removed, and overall,
    packet_sock shrinks by one cacheline:

    Before: size: 1344, cachelines: 21, members: 24
    After: size: 1280, cachelines: 20, members: 24

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
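    The packet_ring_buffer layout is not reproduced here; instead, two
    hypothetical structs show the same effect in miniature. On an LP64
    target, a 4-byte member between two pointers leaves a 4-byte hole plus
    tail padding, and moving it next to the other 4-byte member removes
    both (32 versus 24 bytes). On typical 32-bit targets the two layouts
    are the same size.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_with_hole {
	void *a;
	unsigned int pending;	/* LP64: followed by a 4-byte hole */
	void *b;
	unsigned int other;	/* LP64: followed by 4 bytes of tail padding */
};

struct sketch_hole_filled {
	void *a;
	unsigned int pending;	/* now shares one 8-byte slot ... */
	unsigned int other;	/* ... with this member */
	void *b;
};

/* Bytes saved per instance by the reordering. */
static size_t sketch_bytes_saved(void)
{
	return sizeof(struct sketch_with_hole) - sizeof(struct sketch_hole_filled);
}
```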
     
  • Daniel Borkmann says:

    ====================
    This is a joint effort with Willem to i) bring optional TX hw/sw
    timestamping into PF_PACKET, as reported by Paul Chavent, and
    ii) expose the type of timestamp to the user, which currently cannot
    be determined with the RX_RING and TX_RING API (though it can through
    the normal timestamping API), as reported by Richard Cochran. This set
    is based on top of ``packet: account statistics only in
    tpacket_stats_u''. Related discussion can be found at:
    http://patchwork.ozlabs.org/patch/238125/
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Bring the timestamping section in sync with the implementation.

    Signed-off-by: Daniel Borkmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Currently, there is no way to find out which timestamp is reported in
    the tp_sec and tp_{n,u}sec members of tpacket{,2,3}_hdr. It can be one
    of SOF_TIMESTAMPING_SYS_HARDWARE, SOF_TIMESTAMPING_RAW_HARDWARE,
    SOF_TIMESTAMPING_SOFTWARE, or a late fallback software timestamp taken
    by the PF_PACKET code itself.

    Therefore, report in the tp_status member of the ring buffer which
    timestamp has been reported for the RX and TX paths. This should not
    break anything, for the following reasons: i) in the RX ring path, the
    user needs to test for tp_status & TP_STATUS_USER, and later for other
    flags such as TP_STATUS_VLAN_VALID, so adding more flags does no harm;
    ii) in the TX ring path, timestamps via the PACKET_TIMESTAMP socket
    option were not available, i.e. setting the option had no effect, so an
    application setting it anyway would be buggy. Besides
    TP_STATUS_AVAILABLE, the user should also check for other flags such as
    TP_STATUS_WRONG_FORMAT to reclaim frames to the application. Thus, when
    TX timestamps are turned off (the default), nothing changes for the
    application logic, and when we want to use this new feature, we can now
    also check which timestamp source is reported in the status field, as
    described in the docs.

    Reported-by: Richard Cochran
    Signed-off-by: Daniel Borkmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Daniel Borkmann
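    A sketch of how a PACKET_RX_RING/TX_RING reader might decode the new
    status bits, assuming a Linux system whose <linux/if_packet.h> defines
    TP_STATUS_TS_SOFTWARE, TP_STATUS_TS_SYS_HARDWARE and
    TP_STATUS_TS_RAW_HARDWARE. sketch_ts_source() is a hypothetical helper;
    when none of the bits is set, it reports the timestamp as unflagged.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <string.h>

/* Map the timestamp-source bits of tp_status to a printable name. */
static const char *sketch_ts_source(unsigned int tp_status)
{
	if (tp_status & TP_STATUS_TS_RAW_HARDWARE)
		return "raw-hardware";
	if (tp_status & TP_STATUS_TS_SYS_HARDWARE)
		return "sys-hardware";
	if (tp_status & TP_STATUS_TS_SOFTWARE)
		return "software";
	return "unflagged";	/* no source bit set */
}
```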
     
  • This makes it more readable and clearer what bits are still free to
    use. The compiler reduces this to a constant for us anyway.

    Signed-off-by: Daniel Borkmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Currently, we only have software timestamping for the TX ring buffer
    path, but this limitation is merely an artifact of the implementation.
    By just reusing tpacket_get_timestamp(), we can also allow hardware
    timestamping, just as in the RX path.

    Signed-off-by: Daniel Borkmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • When transmit timestamping is enabled at the socket level, record a
    timestamp on packets written to a PACKET_TX_RING. Tx timestamps are
    always looped to the application over the socket error queue. Software
    timestamps are also written back into the packet frame header in the
    packet ring.

    Reported-by: Paul Chavent
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • Jeff Kirsher says:

    ====================
    This series contains updates to ixgbe, igb and pci.

    The ixgbe changes contain a fix for a possible divide by zero, bailing
    out of the ixgbe_update_itr() function if the last interrupt timeslice is
    zero. In addition, support is added for the new OCP x520 adapter as well
    as LX support for 82599 devices. Jacob provides a patch to change
    variable wol_supported to wol_enabled to better reflect what the code
    is actually doing (i.e. checking if WoL is enabled).

    Alex adds SRIOV helper function to pci that will determine if a PF
    has any VFs that are currently assigned to a guest.

    The remaining 8 patches are against igb and contain the following changes:
    * implement SERDES loopback configuration for i210 devices by clearing the
    sigdetect bit, to fix an ethtool loopback test failure
    * add support for the SMBI semaphore for I210/I211 devices
    * implement the new generic pci_vfs_assigned helper function (Alex's PCI
    helper function)
    * display warning when link speed is downgraded due to Smartspeed
    * ensure that VLAN hardware filtering remains enabled when the device is
    in promiscuous mode and VT mode simultaneously
    * clean up dead code in igb
    * bump the driver version

    v2: updated the PCI patch that adds the SR-IOV helper function to remove
    extern from the declaration of pci_vfs_assigned in pci.h and to return 0
    if SR-IOV is disabled, which is in line with other PCI SR-IOV functions
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains fixes for recently applied
    Netfilter/IPVS updates to the net-next tree, most relevantly
    they are:

    * Fix sparse warnings introduced in the RCU conversion, from
    Julian Anastasov.

    * Fix wrong endianness in the size field of IPVS sync messages,
    from Simon Horman.

    * Fix a missing if check in nf_xfrm_me_harder, from Dan Carpenter.

    * Fix an off-by-one access in the IPVS SCTP tracking code, again from
    Dan Carpenter.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Carolyn Wyborny
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Carolyn Wyborny
     
  • This patch removes ID defines from the hardware files for devices that
    will not be productized for Linux. These IDs were never supported by
    the base driver itself; they were merely available as defines.

    Signed-off-by: Carolyn Wyborny
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Carolyn Wyborny
     
  • The 82575 manual initialization scripts are not supported on 82580 and
    above. Rather than call the function to immediately return, clarify the
    code by removing this pointless function call.

    Signed-off-by: Matthew Vick
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Matthew Vick
     
  • When using the new bridge FDB interface to allow SR-IOV virtual function
    network devices to communicate with SW-bridged network devices, the
    physical function is placed into promiscuous mode and hardware VLAN
    filtering is disabled. This defeats the ability to use VLAN tagging
    to isolate user networks. When the device is in promiscuous mode and
    VT mode simultaneously, ensure that VLAN hardware filtering remains
    enabled.

    Signed-off-by: Greg Rose
    Tested-by: Sibai Li
    Signed-off-by: Jeff Kirsher

    Greg Rose
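    A hypothetical illustration of the rule, using invented SKETCH_RCTL_*
    flags rather than igb's register definitions: promiscuous mode normally
    clears the VLAN filter enable bit, but when VT mode is active the bit
    is kept set so VLAN tags still isolate user networks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; not igb's actual register layout. */
#define SKETCH_RCTL_UPE (1u << 0)	/* unicast promiscuous */
#define SKETCH_RCTL_VFE (1u << 1)	/* VLAN filter enable */

/* Compute receive-control flags for the given mode combination. */
static unsigned int sketch_rx_flags(bool promisc, bool vt_mode)
{
	unsigned int flags = SKETCH_RCTL_VFE;

	if (promisc) {
		flags |= SKETCH_RCTL_UPE;
		/* Only drop VLAN filtering when not in VT (SR-IOV) mode. */
		if (!vt_mode)
			flags &= ~SKETCH_RCTL_VFE;
	}
	return flags;
}
```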
     
  • The current igb driver doesn't report anything when the link speed is
    downgraded due to SmartSpeed. As a result, users suspect that something
    is wrong with the NIC. If the cause is SmartSpeed, there is no reason to
    replace the NIC. This patch makes igb notify users that SmartSpeed
    kicked in.

    Signed-off-by: Koki Sanagi
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Koki Sanagi
     
  • This change makes it so that the igb driver uses the generic helper
    pci_vfs_assigned instead of the igb specific function igb_vfs_are_assigned.

    Signed-off-by: Alexander Duyck
    Tested-by: Sibai Li
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
     
  • This patch adds a helper function that determines whether a PF has any
    VFs that are currently assigned to a guest. So far this function has
    been implemented per driver; going forward I would like to avoid that
    by making the function generic and using this helper.

    v2: Removed extern from the declaration of pci_vfs_assigned in pci.h
    and return 0 if SR-IOV is disabled, which is in line with other PCI
    SR-IOV functions.

    Signed-off-by: Alexander Duyck
    Acked-by: Bjorn Helgaas
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
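    A userspace sketch of what such a helper computes, with hypothetical
    struct sketch_pf/struct sketch_vf types in place of the PCI core's
    structures: it counts the VFs marked as assigned to a guest and, per
    the v2 note, returns 0 when SR-IOV is disabled. A nonzero result
    answers the question "does this PF have assigned VFs?".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical models of a PF and its VFs. */
struct sketch_vf {
	bool assigned_to_guest;
};

struct sketch_pf {
	bool sriov_enabled;
	int num_vfs;
	const struct sketch_vf *vfs;
};

/* Count this PF's VFs that are currently assigned to a guest. */
static int sketch_vfs_assigned(const struct sketch_pf *pf)
{
	int count = 0;

	if (!pf->sriov_enabled)
		return 0;
	for (int i = 0; i < pf->num_vfs; i++)
		if (pf->vfs[i].assigned_to_guest)
			count++;
	return count;
}
```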
     
  • It was previously thought that, since I210/I211 are single port devices,
    they did not need the SMBI semaphore. This is not the case. Add support for
    the SMBI semaphore.

    Signed-off-by: Matthew Vick
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Matthew Vick