11 May, 2012

19 commits

  • Signed-off-by: Marek Lindner
    Acked-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli

    Marek Lindner
     
  • Signed-off-by: Marek Lindner
    Acked-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli

    Marek Lindner
     
  • The B.A.T.M.A.N. IV OGM receive function was still hard-coded although
    it is a routing protocol specific function. This patch takes advantage
    of the dynamic packet handler registration to remove the hard-coded
    function calls.

    Signed-off-by: Marek Lindner
    Acked-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli

    Marek Lindner
     
  • The packet handler array replaces the growing switch statement, thus
    dealing with incoming packets in a more efficient way. It also adds
    the possibility to register packet handlers on the fly.

    Signed-off-by: Marek Lindner
    Acked-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli

    Marek Lindner
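
The idea of replacing a switch statement with a handler array can be sketched in plain C as follows. This is a minimal userspace illustration, not the batman-adv code: the names handler_register, handler_unregister, pkt_dispatch and ogm_recv are hypothetical, and the sketch assumes the packet type is the first byte of the frame.

```c
#include <stddef.h>
#include <stdint.h>

#define PKT_TYPE_MAX 256   /* one slot per possible 8-bit packet type */
#define PKT_DROP     (-1)

typedef int (*pkt_handler_t)(const uint8_t *data, size_t len);

/* Hypothetical handler table, indexed by packet type. */
static pkt_handler_t handlers[PKT_TYPE_MAX];

/* Register a handler at run time; fails if the slot is already taken. */
int handler_register(uint8_t type, pkt_handler_t fn)
{
    if (fn == NULL || handlers[type] != NULL)
        return -1;
    handlers[type] = fn;
    return 0;
}

/* Remove a previously registered handler. */
void handler_unregister(uint8_t type)
{
    handlers[type] = NULL;
}

/* Dispatch by indexing the table with the type byte instead of a switch.
 * Assumption: the packet type is stored in the first byte. */
int pkt_dispatch(const uint8_t *data, size_t len)
{
    if (len == 0 || handlers[data[0]] == NULL)
        return PKT_DROP;
    return handlers[data[0]](data, len);
}

/* Stand-in for a protocol specific receive routine. */
int ogm_recv(const uint8_t *data, size_t len)
{
    (void)data;
    return (int)len;
}
```

With this layout a routing protocol can install its own receive routine when it is loaded, and the generic receive path needs no knowledge of it.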
     
  • Signed-off-by: Marek Lindner
    Acked-by: Simon Wunderlich
    Signed-off-by: Antonio Quartulli

    Marek Lindner
     
  • In is_type_dhcprequest(), while parsing a DHCP message, if the entry we found in
    the option list is neither padding nor the dhcp-type, we have to ignore it and
    skip as many bytes as its length + 1. The "+ 1" byte accounts for the subtype
    field itself, which has to be skipped too.

    Reported-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Antonio Quartulli
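
The option walk described above can be sketched as a small userspace function. This is an illustration only, not the body of is_type_dhcprequest(): the function name dhcp_opts_is_request and the enum names are hypothetical, and the option codes (0 = pad, 53 = message type, 255 = end, value 3 = DHCPREQUEST) are taken from RFC 2132. The sketch steps from the option code byte, so it advances by 2 + length; how that splits into "length + 1" depends on where the cursor already sits in the real function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DHCP_OPT_PAD      = 0,    /* single byte, carries no length field */
    DHCP_OPT_MSG_TYPE = 53,   /* RFC 2132 "DHCP Message Type" option */
    DHCP_OPT_END      = 255,
    DHCP_REQUEST      = 3,
};

/* opts points at the first option (just after the magic cookie).
 * Returns true only if a message-type option with value DHCPREQUEST
 * is found while walking the option list. */
bool dhcp_opts_is_request(const uint8_t *opts, size_t len)
{
    size_t i = 0;

    while (i < len && opts[i] != DHCP_OPT_END) {
        if (opts[i] == DHCP_OPT_PAD) {
            i++;                    /* padding is a lone byte */
            continue;
        }
        if (i + 1 >= len)
            return false;           /* truncated: no length byte */
        uint8_t optlen = opts[i + 1];
        if (i + 2 + (size_t)optlen > len)
            return false;           /* truncated: data runs past the buffer */
        if (opts[i] == DHCP_OPT_MSG_TYPE)
            return optlen == 1 && opts[i + 2] == DHCP_REQUEST;
        /* Any other option: skip code byte, length byte and data. */
        i += 2 + (size_t)optlen;
    }
    return false;
}
```

Skipping the full option matters because option data may contain the byte 53; a walk that under-skips could misread payload bytes as a message-type option.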
     
  • According to the RFC4944 (Transmission of IPv6 Packets over
    IEEE 802.15.4 Networks), chapter 7:

    The IPv6 link-local address [RFC4291] for an IEEE 802.15.4 interface
    is formed by appending the Interface Identifier, as defined above, to
    the prefix FE80::/64.

     10 bits            54 bits                  64 bits
    +----------+-----------------------+----------------------------+
    |1111111010|         (zeros)       |    Interface Identifier    |
    +----------+-----------------------+----------------------------+

    This patch adds IPv6 address generation support for the 6lowpan
    interfaces.

    Signed-off-by: Alexander Smirnov
    Signed-off-by: David S. Miller

    alex.bluesman.smirnov@gmail.com
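
The address layout quoted from the RFC can be sketched as follows. This is a minimal userspace illustration, not the 6lowpan patch itself: the function name lowpan_link_local is hypothetical, and the 64-bit Interface Identifier is taken as an input, since its derivation from the link-layer address is outside this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Build an IPv6 link-local address by appending a 64-bit Interface
 * Identifier to the prefix fe80::/64.
 * Layout: 10 bits 1111111010, 54 zero bits, 64 bits of identifier. */
void lowpan_link_local(uint8_t addr[16], const uint8_t iid[8])
{
    memset(addr, 0, 16);        /* the 54 zero bits (and the rest) */
    addr[0] = 0xfe;             /* 1111 1110 */
    addr[1] = 0x80;             /* 10 followed by six zero bits */
    memcpy(addr + 8, iid, 8);   /* low 64 bits: Interface Identifier */
}
```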
     
  • An implementation of CoDel AQM, from Kathleen Nichols and Van Jacobson.

    http://queue.acm.org/detail.cfm?id=2209336

    This AQM's main input is no longer queue size in bytes or packets, but the
    delay packets experience in the (FIFO) queue.

    As we don't have infinite memory, we still can drop packets in enqueue()
    in case of massive load, but the point of CoDel is to drop packets in
    dequeue(), using a control law based on two simple parameters :

    target : target sojourn time (default 5ms)
    interval : width of moving time window (default 100ms)

    Based on initial work from Dave Taht.

    Refactored to help future codel inclusion as a plugin for other linux
    qdisc (FQ_CODEL, ...), like RED.

    include/net/codel.h contains the codel algorithm, as close as possible to
    Kathleen's reference.

    net/sched/sch_codel.c contains the linux qdisc specific glue.

    Separate structures permit a memory efficient implementation of fq_codel
    (to be sent as a separate work) : Each flow has its own struct
    codel_vars.

    Timestamps are taken at enqueue() time with 1024 ns precision, allowing
    a range of 2199 seconds in queue and support for 100Gb links. iproute2
    uses usec as base unit.

    Selected packets are dropped, unless ECN is enabled and packets can get
    ECN mark instead.

    Tested from 2Mb to 10Gb speeds with no particular problems, on ixgbe and
    tg3 drivers (BQL enabled).

    Usage: tc qdisc ... codel [ limit PACKETS ] [ target TIME ]
    [ interval TIME ] [ ecn ]

    qdisc codel 10: parent 1:1 limit 2000p target 3.0ms interval 60.0ms ecn
    Sent 13347099587 bytes 8815805 pkt (dropped 0, overlimits 0 requeues 0)
    rate 202365Kbit 16708pps backlog 113550b 75p requeues 0
    count 116 lastcount 98 ldelay 4.3ms dropping drop_next 816us
    maxpacket 1514 ecn_mark 84399 drop_overlimit 0

    CoDel must be seen as a base module, and should be used keeping in mind
    there is still a FIFO queue. So a typical setup will probably need a
    hierarchy of several qdiscs and packet classifiers to be able to meet
    whatever constraints a user might have.

    One possible example would be to use fq_codel, which combines Fair
    Queueing and CoDel, in replacement of sfq / sfq_red.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Dave Taht
    Cc: Kathleen Nichols
    Cc: Van Jacobson
    Cc: Tom Herbert
    Cc: Matt Mathis
    Cc: Yuchung Cheng
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
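
The timestamp arithmetic mentioned in the message (1024 ns precision, 2199 seconds of range) can be checked with a short sketch. It assumes a 32-bit timestamp counting units of 1024 ns and a signed 32-bit difference for ordering, which is one reading consistent with the 2199 second figure: 2^31 units of 1024 ns are 2^41 ns, about 2199 s. The names ticks_t, ticks_from_ns, ticks_after, sojourn and max_range_ns are hypothetical and not taken from include/net/codel.h.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ticks_t;   /* assumed: time in units of 1024 ns */
#define TICK_SHIFT 10       /* 2^10 ns = 1024 ns per unit */

/* Convert nanoseconds to 1024 ns units (truncating). */
ticks_t ticks_from_ns(uint64_t ns)
{
    return (ticks_t)(ns >> TICK_SHIFT);
}

/* Wrap-safe "a is later than b"; valid while the two values are
 * less than 2^31 units apart. */
bool ticks_after(ticks_t a, ticks_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Sojourn time of a packet in 1024 ns units; unsigned subtraction
 * absorbs a wrap of the 32-bit counter. */
ticks_t sojourn(ticks_t enqueue_time, ticks_t now)
{
    return now - enqueue_time;
}

/* Largest span the signed comparison can order: 2^31 units = 2^41 ns. */
uint64_t max_range_ns(void)
{
    return (uint64_t)1 << (31 + TICK_SHIFT);
}
```

A dequeue() path would compare the sojourn time of each packet against the target (5 ms by default, per the message) expressed in the same units.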
     
  • Class bytes/packets stats can be misleading because they are updated in
    enqueue() while the packet might be dropped later.

    We already fixed all qdiscs but sch_atm.

    This patch makes the final cleanup.

    class rate estimators can now match qdisc ones.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Use the new bool function ether_addr_equal_64bits to add
    some clarity and reduce the likelihood for misuse of
    compare_ether_addr_64bits for sorting.

    Done via cocci script:

    $ cat compare_ether_addr_64bits.cocci
    @@
    expression a,b;
    @@
    - !compare_ether_addr_64bits(a, b)
    + ether_addr_equal_64bits(a, b)

    @@
    expression a,b;
    @@
    - compare_ether_addr_64bits(a, b)
    + !ether_addr_equal_64bits(a, b)

    @@
    expression a,b;
    @@
    - !ether_addr_equal_64bits(a, b) == 0
    + ether_addr_equal_64bits(a, b)

    @@
    expression a,b;
    @@
    - !ether_addr_equal_64bits(a, b) != 0
    + !ether_addr_equal_64bits(a, b)

    @@
    expression a,b;
    @@
    - ether_addr_equal_64bits(a, b) == 0
    + !ether_addr_equal_64bits(a, b)

    @@
    expression a,b;
    @@
    - ether_addr_equal_64bits(a, b) != 0
    + ether_addr_equal_64bits(a, b)

    @@
    expression a,b;
    @@
    - !!ether_addr_equal_64bits(a, b)
    + ether_addr_equal_64bits(a, b)

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Add an optimized boolean function to check if
    2 ethernet addresses are the same.

    This is to avoid any confusion about compare_ether_addr_64bits
    returning an unsigned value, which cannot be used for sorting
    the way a memcmp result can.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
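
The difference between a compare-style helper and a boolean equality helper can be sketched in userspace C. This is an illustration under assumptions, not the kernel implementation: addr_compare and addr_equal are hypothetical names standing in for the two contracts, and memcpy is used to avoid alignment assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Compare-style contract: zero if equal, nonzero if different.
 * The nonzero value carries no ordering, so unlike a memcmp result
 * it cannot drive a sort. */
unsigned addr_compare(const uint8_t a[ETH_ALEN], const uint8_t b[ETH_ALEN])
{
    uint16_t x[3], y[3];

    memcpy(x, a, ETH_ALEN);
    memcpy(y, b, ETH_ALEN);
    return ((x[0] ^ y[0]) | (x[1] ^ y[1]) | (x[2] ^ y[2])) != 0;
}

/* Boolean contract: true if the two addresses are the same. */
bool addr_equal(const uint8_t a[ETH_ALEN], const uint8_t b[ETH_ALEN])
{
    return !addr_compare(a, b);
}
```

Because addr_compare(a, c) and addr_compare(c, a) return the same value, a caller cannot tell which address is "smaller"; the bool return type of the equality form makes that limitation explicit.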
     
  • Use the new bool function ether_addr_equal to add
    some clarity and reduce the likelihood for misuse
    of compare_ether_addr for sorting.

    Done via cocci script:

    $ cat compare_ether_addr.cocci
    @@
    expression a,b;
    @@
    - !compare_ether_addr(a, b)
    + ether_addr_equal(a, b)

    @@
    expression a,b;
    @@
    - compare_ether_addr(a, b)
    + !ether_addr_equal(a, b)

    @@
    expression a,b;
    @@
    - !ether_addr_equal(a, b) == 0
    + ether_addr_equal(a, b)

    @@
    expression a,b;
    @@
    - !ether_addr_equal(a, b) != 0
    + !ether_addr_equal(a, b)

    @@
    expression a,b;
    @@
    - ether_addr_equal(a, b) == 0
    + !ether_addr_equal(a, b)

    @@
    expression a,b;
    @@
    - ether_addr_equal(a, b) != 0
    + ether_addr_equal(a, b)

    @@
    expression a,b;
    @@
    - !!ether_addr_equal(a, b)
    + ether_addr_equal(a, b)

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Calling pci_disable_sriov() while VFs are assigned to VMs causes
    kernel panic. This patch uses PCI_DEV_FLAGS_ASSIGNED bit state of the
    VF's pci_dev to avoid this. Also, the unconditional function reset cmd
    issued on a PF probe can delete the VF configuration for the
    previously enabled VFs. A scratchpad register is now used to issue a
    function reset only when needed (i.e., in a crash dump scenario.)

    Signed-off-by: Sathya Perla
    Signed-off-by: David S. Miller

    Sathya Perla
     
  • If enabled, L2TP data packets have sequence numbers which a receiver
    can use to drop out of sequence frames or try to reorder them. The
    first frame has sequence number 0, but the L2TP code currently expects
    it to be 1. This results in the first data frame being handled as out
    of sequence.

    This one-line patch fixes the problem.

    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    James Chapman
     
  • When L2TP data packet reordering is enabled, packets are held in a
    queue while waiting for out-of-sequence packets. If a packet gets
    lost, packets will be held until the reorder timeout expires, when we
    are supposed to then advance to the sequence number of the next packet
    but we don't currently do so. As a result, the data channel is stuck
    because we are waiting for a packet that will never arrive - all
    packets age out and none are passed.

    The fix is to add a flag to the session context, which is set when the
    reorder timeout expires and tells the receive code to reset the next
    expected sequence number to that of the next packet in the queue.

    Tested in a production L2TP network with Starent and Nortel L2TP gear.

    Signed-off-by: James Chapman
    Signed-off-by: David S. Miller

    James Chapman
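
The flag-based recovery described above, together with the first-frame sequence number from the previous commit, can be sketched as a small state machine. This is a simplified userspace model, not the L2TP code: the struct and function names are hypothetical, the reorder queue is reduced to the sequence number at its head, and sequence number wraparound is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

struct l2tp_rx {
    uint32_t nr;          /* next expected sequence number; starts at 0 */
    bool reorder_skip;    /* set when the reorder timeout expires */
};

/* Called when the oldest queued packet has waited past the timeout. */
void rx_reorder_timeout(struct l2tp_rx *s)
{
    s->reorder_skip = true;
}

/* Decide whether the packet at the head of the reorder queue may be
 * passed up. head_seq is that packet's sequence number. */
bool rx_try_deliver(struct l2tp_rx *s, uint32_t head_seq)
{
    if (s->reorder_skip) {
        /* Give up on the missing packet: resync to what is queued. */
        s->nr = head_seq;
        s->reorder_skip = false;
    }
    if (head_seq != s->nr)
        return false;     /* still waiting for an earlier packet */
    s->nr++;
    return true;
}
```

Without the resync step, a single lost packet leaves nr pointing at a sequence number that never arrives, which is the stuck data channel the message describes.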
     
  • As proposed by Eric, make the tcp_input.o thinner.

    add/remove: 1/1 grow/shrink: 1/4 up/down: 868/-1329 (-461)
    function                                     old     new   delta
    tcp_try_rmem_schedule                          -     864    +864
    tcp_ack                                     4811    4815      +4
    tcp_validate_incoming                        817     815      -2
    tcp_collapse                                 860     858      -2
    tcp_send_rcvq                                555     353    -202
    tcp_data_queue                              3435    3033    -402
    tcp_prune_queue                              721       -    -721

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • As noted by Eric, no checks are performed on the data size we're
    putting in the read queue during repair. Thus, validate the given
    data size with the common rmem management routine.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • It actually works on the input queue and will use its read mem
    routines, thus it's better to have it in the tcp_input.c file.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • David S. Miller
     

10 May, 2012

21 commits

  • Update version number to better match the version of the out of tree
    driver with similar functionality.

    Signed-off-by: Don Skidmore
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Don Skidmore
     
  • When the hwmon code was initially added it was with the assumption that a
    sysfs patch would be also coming soon. Since that isn't the case some
    clean up needs to be done. This patch does that.

    Signed-off-by: Don Skidmore
    Tested-by: Stephen Ko
    Signed-off-by: Jeff Kirsher

    Don Skidmore
     
  • Kernel software timestamping requires that the driver calls skb_tx_timestamp
    just before passing the skb to the MAC, in order to provide the best software
    timestamps. This patch adds this call for that support.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • This patch adds support for the ethtool get_ts_info operation, which enables
    access of available timestamp/timesync support for that device. It can query
    which ptp clock device is associated with the particular port.

    Signed-off-by: Jacob Keller
    Tested-by: Stephen Ko
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • The current value of the udelay timeout for ixgbe_disable_rx_buff is too
    short. This causes the security path to not be properly disabled during
    the section that is meant to have it turned off. The end result is a race
    condition that causes RX issues.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • This patch enables the PPS system in the PHC framework, by enabling
    the clock-out feature on the X540 device. This causes SDP0 to be set as
    a 1Hz clock. It also configures the timesync interrupt cause in order to
    report each pulse to the PPS via the PHC framework, which can be used
    for general system clock synchronization. (This allows a stable method
    for tuning the general system time via the on-board SYSTIM register
    based clock.)

    Signed-off-by: Jacob E Keller
    Tested-by: Stephen Ko
    Signed-off-by: Jeff Kirsher

    Jacob E Keller
     
  • This patch enables hardware timestamping for use with PTP software by
    extracting a ns counter from an arbitrary fixed point cycles counter.
    The hardware generates SYSTIME registers using the DMA tick which
    changes based on the current link speed. These SYSTIME registers are
    converted to ns using the cyclecounter and timecounter structures
    provided by the kernel. Using the SO_TIMESTAMPING api, software can
    enable and access timestamps for PTP packets.

    The SO_TIMESTAMPING API has space for 3 different kinds of timestamps,
    SYS, RAW, and SOF. SYS hardware timestamps are hardware ns values that
    are then scaled to the software clock. RAW hardware timestamps are the
    direct raw value of the ns counter. SOF software timestamps are the
    software timestamp calculated as close as possible to the software
    transmit, but are not offloaded to the hardware. This patch only
    supports the RAW hardware timestamps due to inefficiency of the SYS
    design.

    This patch also enables the PHC subsystem features for atomically
    adjusting the cycle register, and adjusting the clock frequency in
    parts per billion. This frequency adjustment works by slightly
    adjusting the value added to the cycle registers each DMA tick. This
    causes the hardware registers to overflow rapidly (approximately once
    every 34 seconds, when at 10gig link). To solve this, the timecounter
    structure is used, along with a timer set for every 25 seconds. This
    allows for detecting register overflow and converting the cycle
    counter registers into ns values needed for providing useful
    timestamps to the network stack.

    Only the basic required clock functions are supported at this time,
    although the hardware supports some ancillary features and these could
    easily be enabled in the future.

    Note that use of this hardware timestamping requires modifying daemon
    software to use the SO_TIMESTAMPING API for timestamps, and the
    ptp_clock PHC framework for accessing the clock. The timestamps have
    no relation to the system time at all, so software must use the posix
    clock generated by the PHC framework instead.

    Signed-off-by: Jacob E Keller
    Tested-by: Stephen Ko
    Signed-off-by: Jeff Kirsher

    Jacob Keller
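
The role of the periodic timer can be illustrated with a userspace model of the cycle-counter to nanosecond conversion. This is a sketch of the general technique under assumptions, not the kernel's cyclecounter/timecounter code or the ixgbe driver: the struct and function names are hypothetical, and the mult/shift values in any usage are made up for illustration.

```c
#include <stdint.h>

struct cyc_counter {
    uint64_t mask;     /* valid bits of the hardware register */
    uint32_t mult;     /* ns = (cycles * mult) >> shift */
    uint32_t shift;
};

struct time_counter {
    struct cyc_counter cc;
    uint64_t last_cycles;   /* register value at the previous read */
    uint64_t nsec;          /* nanoseconds accumulated so far */
};

/* Fold the cycles elapsed since the previous read into the ns total.
 * Masking the difference makes one register wrap invisible, so the
 * result is correct as long as this is called at least once per wrap
 * period (hence a timer shorter than the overflow interval). */
uint64_t time_counter_read(struct time_counter *tc, uint64_t cycles_now)
{
    uint64_t delta = (cycles_now - tc->last_cycles) & tc->cc.mask;

    tc->last_cycles = cycles_now;
    tc->nsec += (delta * tc->cc.mult) >> tc->cc.shift;
    return tc->nsec;
}
```

With the figures from the message, a register that overflows roughly every 34 seconds and a timer firing every 25 seconds guarantee at most one wrap between reads.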
     
  • If the VF sends a MACVLAN request with index of zero then it is not
    actually trying to add a filter. Check the index value and only
    indicate that operation is not allowed when the VF is actually trying
    to add a filter.

    Signed-off-by: Greg Rose
    Tested-by: Sibai Li
    Signed-off-by: Jeff Kirsher

    Greg Rose
     
  • The drop enable bit can be used to improve the performance of the adapter
    in the case of multiple queues being present. This performance gain is due
    to the fact that some slower CPUs can cause the FIFO to backfill preventing
    faster CPUs from receiving additional work. By setting the drop enable bit
    we prevent this and instead just drop the packets that would have been
    bound for the slower CPU.

    Signed-off-by: Alexander Duyck
    Tested-by: Ross Brattain
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
     
  • This change cleans up the logic in the priority based flow control
    configuration routines. Both the 82599 and 82598 based routines perform
    similar functions however they are both arranged completely differently.
    This patch goes over both of them to clean up the code.

    In addition I am dropping the ixgbe_fc_pfc flow control mode and instead
    just replacing it with checks for if priority flow control is enabled.
    This allows us to maintain some of the link flow control information which
    allows for an easier transition between link and priority flow control.

    Signed-off-by: Alexander Duyck
    Tested-by: Ross Brattain
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
     
  • Previously we would get a mailbox error and still process the message.
    Instead we should exit on error.

    In addition we should also be flushing the ACK of the message so that we
    can guarantee that the other end is aware we have received the message
    while we are processing it.

    Signed-off-by: Alexander Duyck
    Tested-by: Sibai Li
    Signed-off-by: Jeff Kirsher

    Alexander Duyck
     
  • Currently igb dumps the registers related to the TX/RX queues (e.g. RDT, RDH,
    TDT, TDH), but it assumes there are only 4 RX/TX queues, whereas the 82576
    has 16. This patch modifies igb to dump the remaining registers if the
    device is an 82576.

    Signed-off-by: Koki Sanagi
    Acked-by: Carolyn Wyborny
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher

    Koki Sanagi
     
  • Jeff Kirsher
     
  • o The Linux stack estimates MSS from skb->len or skb_shinfo(skb)->gso_size.
    With LRO, skb->len is the aggregate length of several packets, so an MSS
    derived from skb->len would be incorrect. An incorrect receive MSS estimate
    leads to delayed ACKs in some traffic patterns (those that send two or three
    packets, wait for an ACK, and only then send the remaining packets), which
    reduces performance. Hence we need to set gso_size to the MSS obtained from
    firmware.

    o This was fixed recently in firmware, so the MSS is obtained based on
    capability: the driver sets gso_size only if the fw is capable of sending
    the MSS.

    Signed-off-by: Rajesh Borundia
    Signed-off-by: David S. Miller

    Rajesh Borundia
     
  • The driver queries DIMM information from firmware and sets the
    "presence" field of the structure accordingly.
    A "presence" value of 0xff denotes an invalid flag, and a value
    of 0x0 denotes that DIMM memory is not present.

    Signed-off-by: Sucheta Chakraborty
    Signed-off-by: Rajesh Borundia
    Signed-off-by: David S. Miller

    Sucheta Chakraborty
     
  • o 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F and 0xFF are the allowed capture masks.

    Signed-off-by: Manish chopra
    Signed-off-by: Rajesh Borundia
    Signed-off-by: David S. Miller

    Manish chopra
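
The seven allowed values share a pattern: each is a run of contiguous low bits, from two bits (0x3) up to eight bits (0xFF). A check for that pattern can be sketched as below. The function name capture_mask_valid is hypothetical, and this is one possible way to express the rule, not necessarily how the driver validates it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept only 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F and 0xFF.
 * A mask of contiguous low bits satisfies (mask & (mask + 1)) == 0;
 * the range check excludes 0x0, 0x1 and anything above 0xFF. */
bool capture_mask_valid(uint32_t mask)
{
    return mask >= 0x3 && mask <= 0xFF && (mask & (mask + 1)) == 0;
}
```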
     
  • Disable fw dump by default at start up.

    Signed-off-by: Sritej Velaga
    Signed-off-by: Rajesh Borundia
    Signed-off-by: David S. Miller

    Sritej Velaga
     
  • David S. Miller
     
  • Signed-off-by: Ben Hutchings

    Ben Hutchings
     
  • Currently allows for SFP+ eeprom to be returned using the ethtool API.
    This can be extended in the future to handle different eeprom formats
    and sizes.

    Signed-off-by: Stuart Hodgson
    [bwh: Drop redundant validation, comment, whitespace]
    Signed-off-by: Ben Hutchings

    Stuart Hodgson
     
  • ETHTOOL_GMODULEINFO returns a new struct ethtool_modinfo that reports the
    type and size of a plug-in module eeprom (such as SFP+) for parsing
    by a userland program.

    ETHTOOL_GMODULEEEPROM returns the raw eeprom information
    using the existing ethtool_eeprom structure.

    Signed-off-by: Stuart Hodgson
    Signed-off-by: Ben Hutchings

    Stuart Hodgson