31 Oct, 2014

18 commits

  • - add a test specifically targeting verifier state pruning.
    It checks state propagation between registers, storing that
    state into stack and state pruning algorithm recognizing
    equivalent stack and register states.

    - add summary line to spot failures easier

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • verifier keeps track of register state spilled to stack.
    registers are 8-byte wide and always aligned, so instead of tracking them
    in every byte-sized stack slot, use MAX_BPF_STACK / 8 array to track
    spilled register state.
    Though the verifier runs in user context and its state is freed immediately
    after verification, it makes sense to reduce its memory usage.
    This optimization reduces sizeof(struct verifier_state)
    from 12464 to 1712 on 64-bit and from 6232 to 1112 on 32-bit.

    Note, this patch doesn't change existing limits, which are there to bound
    time and memory during verification: 4k total number of insns in a program,
    1k number of jumps (states to visit) and 32k number of processed insn
    (since an insn may be visited multiple times). Theoretical worst case memory
    during verification is 1712 * 1k = 1.7Mbyte. Out-of-memory situation triggers
    cleanup and rejects the program.

    Suggested-by: Andy Lutomirski
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
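
    The sizing argument in this commit can be sketched in C as follows. This is
    only an illustration: it assumes MAX_BPF_STACK is 512 bytes and registers
    are 8 bytes wide, and the helper names (spill_slot_index, worst_case_bytes)
    are hypothetical rather than taken from the verifier source.

    ```c
    #include <stddef.h>

    /* Assumed values for this sketch: a 512-byte BPF stack, 8-byte registers. */
    #define MAX_BPF_STACK 512
    #define BPF_REG_SIZE 8

    /* One tracking entry per 8-byte slot instead of one per stack byte. */
    #define SPILL_SLOTS (MAX_BPF_STACK / BPF_REG_SIZE)

    /*
     * Map a frame-pointer-relative stack offset (negative, 8-byte aligned,
     * within [-MAX_BPF_STACK, -BPF_REG_SIZE]) to a slot index.
     * Returns -1 for offsets that cannot hold a spilled register.
     */
    static int spill_slot_index(int off)
    {
        if (off < -MAX_BPF_STACK || off > -BPF_REG_SIZE ||
            off % BPF_REG_SIZE != 0)
            return -1;
        return (MAX_BPF_STACK + off) / BPF_REG_SIZE;
    }

    /* Worst-case memory: one saved state of state_size bytes per state. */
    static size_t worst_case_bytes(size_t state_size, size_t max_states)
    {
        return state_size * max_states;
    }
    ```

    With these assumptions the tracking array has 64 entries rather than 512,
    and worst_case_bytes(1712, 1024) evaluates to 1753088 bytes.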
     
  • Guenter Roeck says:

    ====================
    net: dsa: Fixes and enhancements

    Patch 01/15 addresses a bug indicated by an annoying and unhelpful
    log message.

    Patches 02/15 and 03/15 are minor enhancements, adding support for
    known switch revisions.

    Patches 04/15 and 05/15 add support for MV88E6352 and MV88E6176.

    Patch 06/15 adds support for hardware monitoring, specifically for
    reporting the chip temperature, to the dsa subsystem.

    Patches 07/15 and 08/15 implement hardware monitoring for MV88E6352,
    MV88E6176, MV88E6123, MV88E6161, and MV88E6165.

    Patches 09/15 and 10/15 add support for EEPROM access to the DSA subsystem.

    Patch 11/15 implements EEPROM access for MV88E6352 and MV88E6176.

    Patch 12/15 adds support for reading switch registers to the DSA
    subsystem.

    Patches 13/15 and 14/15 implement support for reading switch registers
    in the drivers for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.

    Patch 15/15 adds support for reading additional RMON registers to the drivers
    for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.

    The series was tested on top of v3.18-rc2 in an x86 system with MV88E6352.
    Testing in systems with 88E6131, 88E6060 and MV88E6165 was done earlier
    (I don't have access to those systems right now). The series was also build
    tested using my build system at http://server.roeck-us.net:8010/builders.
    Look into the 'dsa' column for build results.

    The series merges cleanly into net-next as of today (10/29).

    v3:
    - Fix bug in eeprom patches seen if devicetree is enabled:
    eeprom-length property is attached to switch devicetree node,
    not to dsa node, and there was a compile error.
    v2:
    - Made reporting chip temperatures through the hwmon subsystem optional
    with new Kconfig option
    - Changed the hwmon chip name to _dsa
    - Made EEPROM presence and size configurable through platform and devicetree
    data
    - Various minor changes and fixes (see individual patches for details)
    ====================

    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Display sw_in_discards, sw_in_filtered, and sw_out_filtered for chips
    supported by mv88e6123_61_65 and mv88e6352 drivers.

    The variables are provided in port registers, not the normal status registers.
    Mark by adding 0x100 to the register offset and add special handling code
    to mv88e6xxx_get_ethtool_stats.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
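
    The offset marking described in this commit might be decoded roughly as
    shown below. Only the 0x100 marker comes from the commit message; the names
    PORT_REG_FLAG, stat_is_port_reg and stat_reg_offset are hypothetical and
    the real handling lives in mv88e6xxx_get_ethtool_stats.

    ```c
    #include <stdbool.h>

    /* Marker added to the offset of statistics held in port registers. */
    #define PORT_REG_FLAG 0x100

    /* True if the statistic must be read from a port register. */
    static bool stat_is_port_reg(int reg)
    {
        return reg >= PORT_REG_FLAG;
    }

    /* Strip the marker to recover the real register offset. */
    static int stat_reg_offset(int reg)
    {
        return stat_is_port_reg(reg) ? reg - PORT_REG_FLAG : reg;
    }
    ```

    A table entry stored as 0x110 would then be read from port register 0x10,
    while an entry stored as 0x02 is read from the normal statistics unit.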
     
  • Report switch register values to ethtool.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • The infrastructure can now report switch registers to ethtool.
    Add support for it to the mv88e6123_61_65 driver.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Add support for reading switch registers with 'ethtool -d'.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • MV88E6352 supports read and write access to its configuration eeprom.

    There is no means to detect if an EEPROM is connected to the switch.
    Also, the switch supports EEPROMs with different sizes, but can not detect
    or report the type or size of connected EEPROMs. Therefore, do not implement
    the get_eeprom_len callback but depend on platform or devicetree data to
    provide information about EEPROM presence and size.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • The dsa core now supports reading from and writing to a switch EEPROM
    if connected. Describe optional devicetree property indicating that
    an EEPROM is present and its size.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • On some chips it is possible to access the switch eeprom.
    Add infrastructure support for it.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • MV88E6123 and compatible chips support reading the chip temperature
    from PHY register 6:26.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • MV88E6352 supports reading the chip temperature from two PHY registers,
    6:26 and 6:27. Report it using the more accurate register 6:27.
    Also report temperature limit and alarm.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Some switches provide chip temperature data.
    Add support for reporting it through the hwmon subsystem.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • MV88E6176 is mostly compatible with MV88E6352 and is documented
    in the same functional specification. Add support for it.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Marvell 88E6352 is mostly compatible with MV88E6123/61/65,
    but requires indirect phy access. Also, its configuration
    registers are a bit different.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Report known silicon revisions when probing Marvell 88E6131 switches.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Report known silicon revisions when probing Marvell 88E6060 switches.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     
  • Setting skb->protocol to a private protocol type may result in warning
    messages such as
    e1000e 0000:00:19.0 em1: checksum_partial proto=dada!

    This happens if the L3 protocol is IP or IPv6 and skb->ip_summed is set
    to CHECKSUM_PARTIAL. Looking through the code, it appears that changing
    skb->protocol for transmitted packets is not necessary and may actually
    be harmful. For example, it prevents purposely unmodified (from a DSA
    perspective) network drivers from properly setting up their transmit
    checksum offload pointers since they inspect skb->protocol to set up the
    IPv4 header or IPv6 header pointers. So don't unnecessarily change the
    protocol field.

    Signed-off-by: Guenter Roeck
    Signed-off-by: David S. Miller

    Guenter Roeck
     

30 Oct, 2014

15 commits

  • In neigh_parms_release() we loop over all entries to find the entry given as
    an argument so that we can remove it from the list. By using a doubly linked
    list, we can avoid this loop.

    Here are some numbers with 30 000 dummy interfaces configured:

    Before the patch:
    $ time rmmod dummy
    real 2m0.118s
    user 0m0.000s
    sys 1m50.048s

    After the patch:
    $ time rmmod dummy
    real 1m9.970s
    user 0m0.000s
    sys 0m47.976s

    Suggested-by: Thierry Herbelot
    Signed-off-by: Nicolas Dichtel
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Nicolas Dichtel
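
    Why a doubly linked list removes the loop can be shown with a small
    userspace sketch. The node type and helpers here are hypothetical stand-ins
    (the kernel has its own list primitives); the point is only that unlinking
    needs no search for the predecessor.

    ```c
    #include <stddef.h>

    /* Hypothetical node type used only for this illustration. */
    struct node {
        struct node *prev;
        struct node *next;
    };

    /* Initialise an empty circular list whose head points to itself. */
    static void list_init(struct node *head)
    {
        head->prev = head;
        head->next = head;
    }

    /* Insert n directly after head. */
    static void list_add_after(struct node *head, struct node *n)
    {
        n->next = head->next;
        n->prev = head;
        head->next->prev = n;
        head->next = n;
    }

    /* Unlink n in O(1): both neighbours are reachable from n itself. */
    static void list_unlink(struct node *n)
    {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->prev = n;
        n->next = n;
    }
    ```

    With a singly linked list the same removal has to walk from the head until
    it finds the entry preceding n, which costs O(n) per removal and adds up
    when tens of thousands of interfaces are torn down.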
     
  • napi_schedule() can be called from any context and has to mask hard
    irqs.

    Add a variant that can only be called from hard interrupt handlers
    or when irqs are already masked.

    Many NIC drivers can use it from their hard IRQ handler instead of
    the generic variant.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • David Vrabel says:

    ====================
    xen-netback: minor cleanups

    Two minor xen-netback cleanups originally from Zoltan.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This flag is unnecessary, it came from some old code.

    Suggested-by: Eric Dumazet
    Signed-off-by: Zoltan Kiss
    Signed-off-by: David Vrabel
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Zoltan Kiss
     
  • Otherwise the interrupt handler still calls napi_complete. Although it
    won't schedule NAPI again as either NAPI_STATE_DISABLE or
    NAPI_STATE_SCHED is set, it is just unnecessary, and it makes more
    sense to do it this way.

    Signed-off-by: Zoltan Kiss
    Signed-off-by: David Vrabel
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Zoltan Kiss
     
  • Add a sysctl that causes an interface's optimistic addresses
    to be considered equivalent to other non-deprecated addresses
    for source address selection purposes. Preferred addresses
    will still take precedence over optimistic addresses, subject
    to other ranking in the source address selection algorithm.

    This is useful where different interfaces are connected to
    different networks from different ISPs (e.g., a cell network
    and a home wifi network).

    The current behaviour complies with RFC 3484/6724, and it
    makes sense if the host has only one interface, or has
    multiple interfaces on the same network (same or cooperating
    administrative domain(s)), but not in the multiple distinct
    networks case.

    For example, if a mobile device has an IPv6 address on an LTE
    network and then connects to IPv6-enabled wifi, while the wifi
    IPv6 address is undergoing DAD, IPv6 connections will try to use
    the wifi default route with the LTE IPv6 address, and will get
    stuck until they time out.

    Also, because optimistic nodes can receive frames, issue
    an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMISTIC
    flag appropriately set). A second RTM_NEWADDR is sent if DAD
    completes (the address flags have changed), otherwise an
    RTM_DELADDR is sent.

    Also: add an entry in ip-sysctl.txt for optimistic_dad.

    Signed-off-by: Erik Kline
    Acked-by: Lorenzo Colitti
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Erik Kline
     
  • Hayes Wang says:

    ====================
    r8152: support nway_reset

    Fix the CHECK from checkpatch.pl and support nway_reset.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Support the nway_reset() function for ethtool.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller

    hayeswang
     
  • Replace tx_underun with tx_underrun for checkpatch.pl.

    Signed-off-by: Hayes Wang
    Signed-off-by: David S. Miller

    hayeswang
     
  • While testing an upcoming patch from Yaogong (converting the out of order
    queue into an RB tree), I hit the max reordering level of the Linux TCP stack.

    Reordering level was limited to 127 for no good reason, and some
    network setups [1] can easily reach this limit and get limited
    throughput.

    Allow a new max limit of 300, and add a sysctl so that admins can set
    bigger (or lower) values if needed.

    [1] Aggregation of links, per packet load balancing, fabrics not doing
    deep packet inspections, alternative TCP congestion modules...

    Signed-off-by: Eric Dumazet
    Cc: Yaogong Wang
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Several warnings and errors of coding style rules corrected.
    Compile tested.

    Signed-off-by: Roberto Medina
    Signed-off-by: David S. Miller

    Roberto Medina
     
  • This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
    friendly") to protocols using skb_set_owner_w()

    TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
    as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
    order packets")

    This allows UDP sockets using UFO to get proper backpressure,
    thus avoiding qdisc drops and excessive cpu usage.

    Here are performance test results (macvlan on vlan):

    - Before
    # netperf -t UDP_STREAM ...
    Socket Message Elapsed Messages
    Size Size Time Okay Errors Throughput
    bytes bytes secs # # 10^6bits/sec

    212992 65507 60.00 144096 1224195 1258.56
    212992 60.00 51 0.45

    Average: CPU %user %nice %system %iowait %steal %idle
    Average: all 0.23 0.00 25.26 0.08 0.00 74.43

    - After
    # netperf -t UDP_STREAM ...
    Socket Message Elapsed Messages
    Size Size Time Okay Errors Throughput
    bytes bytes secs # # 10^6bits/sec

    212992 65507 60.00 109593 0 957.20
    212992 60.00 109593 957.20

    Average: CPU %user %nice %system %iowait %steal %idle
    Average: all 0.18 0.00 8.38 0.02 0.00 91.43

    [edumazet] Rewrote patch and changelog.

    Signed-off-by: Toshiaki Makita
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Toshiaki Makita
     
  • The hardware can automatically generate pause frames when the number
    of free buffers drops under a certain threshold, but in order to do this,
    the address of the last free buffer needs to be written to a specific
    register for each RX queue.

    This has to be done in 'gfar_clean_rx_ring' which is called for each
    RX queue. To avoid the performance cost of a register write for each
    incoming packet, this operation is done only when PAUSE frame
    transmission is enabled.

    Whenever the link is readjusted, this capability is turned on or off.

    Signed-off-by: Matei Pavaluca
    Signed-off-by: David S. Miller

    Matei Pavaluca
     
  • Local flow control options needed in order to resolve the negotiation
    are incorrectly calculated.

    Previously 'mii_advertise_flowctrl' was called to determine the local advertising
    options, but these were determined based on FLOW_CTRL_RX/TX flags which are
    never set through ethtool.
    The patch simply translates from ethtool flow options to mii flow options.

    Signed-off-by: Pavaluca Matei
    Signed-off-by: David S. Miller

    Pavaluca Matei-B46610
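
    One plausible reading of "translates from ethtool flow options to mii flow
    options" is a direct bit mapping, sketched below. This is not the driver
    code: the function name is hypothetical, and the SKETCH_ constants are
    local copies whose values are assumed to match the kernel's ethtool
    (ADVERTISED_Pause, ADVERTISED_Asym_Pause) and MII (ADVERTISE_PAUSE_CAP,
    ADVERTISE_PAUSE_ASYM) definitions.

    ```c
    #include <stdint.h>

    /* Assumed values, mirroring the kernel headers as best understood. */
    #define SKETCH_ADVERTISED_PAUSE      (1u << 13)
    #define SKETCH_ADVERTISED_ASYM_PAUSE (1u << 14)
    #define SKETCH_ADVERTISE_PAUSE_CAP   0x0400u
    #define SKETCH_ADVERTISE_PAUSE_ASYM  0x0800u

    /* Translate ethtool pause advertising bits into MII advertising bits. */
    static uint32_t ethtool_pause_to_mii(uint32_t ethtool_adv)
    {
        uint32_t mii_adv = 0;

        if (ethtool_adv & SKETCH_ADVERTISED_PAUSE)
            mii_adv |= SKETCH_ADVERTISE_PAUSE_CAP;
        if (ethtool_adv & SKETCH_ADVERTISED_ASYM_PAUSE)
            mii_adv |= SKETCH_ADVERTISE_PAUSE_ASYM;

        return mii_adv;
    }
    ```

    The result is what would be compared against the link partner's
    advertisement when resolving the negotiated flow control mode; all other
    advertising bits are ignored by this helper.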
     
  • The phy device supports 802.3x flow control, but the specific flags are not set
    in the phy initialisation code. Flow control flags need to be added to the
    supported capabilities of the phydev by the driver.

    This is needed in order for ethtool to work ('ethtool -A' code checks for these
    flags)

    Signed-off-by: Pavaluca Matei
    Signed-off-by: David S. Miller

    Pavaluca Matei-B46610
     

29 Oct, 2014

7 commits

  • ERROR: "lockdep_ovsl_is_held" [net/openvswitch/vport-gre.ko] undefined!

    Reported-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Amir Vadai says:

    ====================
    Mellanox ethernet driver update Oct-27-2014

    This patchset introduces some small bug fixes, support in get/set of
    vlan offload and get/set/capabilities of the link.

    First 7 patches by Saeed, add support in setting/getting link speed and getting
    cable capabilities.
    Next 2 patches also by Saeed, enable the user to turn rx/tx vlan offloading on
    and off.
    Jenni fixed a bug in error flow during device initialization.
    Ido and Jack fixed some code duplication and errors discovered by static checker.
    The last patch, by me, is a fix to make ethtool report the actual rings used by
    indirection QP.

    Patches were applied and tested against commit 61ed53d ("Merge tag 'ntb-3.18'
    of git://github.com/jonmason/ntb")
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Hardware requires the number of rings in indirection table to be a power
    of 2. When setting number of channels to a non power of 2 number,
    indirection table is using only the closest power of 2 rings.
    Report this number in 'ethtool -x' and not the total number of rx rings.

    Signed-off-by: Amir Vadai
    Signed-off-by: Eugenia Emantayev
    Signed-off-by: David S. Miller

    Amir Vadai
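
    The ring count reported here can be illustrated with a small helper. The
    sketch assumes that "closest power of 2" means the largest power of 2 that
    does not exceed the configured ring count, since the table cannot reference
    more rings than exist; the function name is hypothetical.

    ```c
    /* Largest power of two that does not exceed n; returns 0 for n == 0. */
    static unsigned int rounddown_pow2(unsigned int n)
    {
        unsigned int p = 1;

        if (n == 0)
            return 0;
        /* Doubling only while p <= n / 2 keeps p * 2 <= n, so no overflow. */
        while (p <= n / 2)
            p *= 2;
        return p;
    }
    ```

    Under that assumption, a device configured with 6 rx rings would report 4
    rings in 'ethtool -x', while a device with 8 rings would report all 8.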
     
  • Upon failures, destroy_netdev is called, and spinlocks/works must be
    initialized before calling it. Otherwise kernel panic may occur.

    Signed-off-by: Eugenia Emantayev
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Eugenia Emantayev
     
  • This is instead of calling the actual implementation of
    napi_synchronize, for better encapsulation.

    Signed-off-by: Ido Shamay
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Ido Shamay
     
  • clang flagged the following. All are actually cosmetic cleanups, not really bugs:

    drivers/net/ethernet/mellanox/mlx4/en_main.c:233:3: warning: Value stored to 'err' is never read
    err = -ENOMEM;
    ^ ~~~~~~~
    drivers/net/ethernet/mellanox/mlx4/en_main.c:293:3: warning: Value stored to 'err' is never read
    err = -ENOMEM;

    drivers/net/ethernet/mellanox/mlx4/en_netdev.c:648:16: warning: Assigned value is garbage or undefined
    entry->reg_id = reg_id;
    ^ ~~~~~~
    drivers/net/ethernet/mellanox/mlx4/en_netdev.c:659:2: warning: Function call argument is an uninitialized value
    mlx4_en_uc_steer_release(priv, priv->dev->dev_addr, *qpn, reg_id);
    (NOTE: reg_id is only used in the device-managed flow steering path, in which it is always initialized.
    This is not a bug. Cleanup here is therefore cosmetic only).

    drivers/net/ethernet/mellanox/mlx4/en_rx.c:122:3: warning: Value stored to 'frag_info' is never read
    frag_info = &priv->frag_info[i];
    ^ ~~~~~~~~~~~~~~~~~~~

    Signed-off-by: Jack Morgenstein

    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Jack Morgenstein
     
  • Move mlx4_en_reset_config to en_netdev.c as it now serves more general purpose.
    Add support for turning OFF/ON the rx/tx vlan offload.

    Signed-off-by: Saeed Mahameed
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Saeed Mahameed