29 Aug, 2009

40 commits

  • Remove the copy of the MD5 authentication key from tcp_check_req().
    This key has already been copied by tcp_v4_syn_recv_sock() or
    tcp_v6_syn_recv_sock().

    Signed-off-by: John Dykstra
    Signed-off-by: David S. Miller

    John Dykstra
     
  • Maintain a per-qdisc bitmap for pfifo_fast giving availability
    of skbs for each band. This allows faster lookup for a skb when
    there are no high priority skbs. Also, it helps in (rare) cases
    when there are no skbs on the list, where an immediate lookup is
    faster than iterating through the three bands.

    Signed-off-by: Krishna Kumar
    Signed-off-by: David S. Miller

    Krishna Kumar
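
A minimal userspace sketch of the per-qdisc bitmap idea described above, not the actual pfifo_fast code. The names `band_state`, `band_enqueue`, `band_dequeue`, and `bitmap_to_band` are hypothetical; the sketch assumes three bands, with bit i of the bitmap set while band i holds at least one packet.

```c
#include <stdint.h>

#define BANDS 3

/* For each 3-bit bitmap value: the lowest-numbered (highest-priority)
 * non-empty band, or -1 when every band is empty. */
static const int bitmap_to_band[1 << BANDS] = { -1, 0, 1, 0, 2, 0, 1, 0 };

struct band_state {
    uint32_t bitmap;            /* bit i set => band i is non-empty */
    unsigned int qlen[BANDS];   /* packets queued per band */
};

/* Queue one packet on the given band and mark the band as non-empty. */
static void band_enqueue(struct band_state *s, int band)
{
    s->qlen[band]++;
    s->bitmap |= 1u << band;
}

/* Take one packet from the highest-priority non-empty band.
 * Returns the band used, or -1 immediately if nothing is queued. */
static int band_dequeue(struct band_state *s)
{
    int band = bitmap_to_band[s->bitmap];

    if (band < 0)
        return -1;
    if (--s->qlen[band] == 0)
        s->bitmap &= ~(1u << band);
    return band;
}
```

The table lookup replaces a loop over the three bands, so the empty case returns after a single array access.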
     
  • When processing a received IPv6 Router Advertisement, the kernel
    creates or updates an IPv6 Neighbor Cache entry for the sender --
    but presently this does not occur if IPv6 forwarding is enabled
    (net.ipv6.conf.*.forwarding = 1), or if IPv6 Router Advertisements
    are not accepted (net.ipv6.conf.*.accept_ra = 0), because in these
    cases processing of the Router Advertisement has already halted.

    This patch allows the Neighbor Cache to be updated in these cases,
    while still avoiding any modification to routes or link parameters.

    This continues to satisfy RFC 4861, since any entry created in the
    Neighbor Cache as the result of a received Router Advertisement is
    still placed in the STALE state.

    Signed-off-by: David Ward
    Signed-off-by: David S. Miller

    David Ward
     
  • - Better small packet receive performance.
    - Better handling of Flow control on 5709.
    - Fixed iSCSI TMP ABORT TASK problem.
    - Added iSCSI TCP timestamp option.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • There is a race condition in the time-wait sockets code that can lead
    to premature termination of FIN_WAIT2 and, subsequently, to RST
    generation when the FIN,ACK from the peer finally arrives:

    Time TCP header
    0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0
    0.000008 http > 30755 [SYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=...
    0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture]
    0.136934 HTTP/1.0 200 OK [Packet size limited during capture]
    0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521...
    0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=...
    0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=...
    0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151...
    0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0

    Say twdr->slot = 1 and we are running inet_twdr_hangman and in this
    instance inet_twdr_do_twkill_work returns 1. At that point we will
    mark slot 1 and schedule inet_twdr_twkill_work. We will also make
    twdr->slot = 2.

    Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo)
    is called which will create a new FIN_WAIT2 time-wait socket and will
    place it in the last to be reached slot, i.e. twdr->slot = 1.

    At this point say inet_twdr_twkill_work will run which will start
    destroying the time-wait sockets in slot 1, including the just added
    TCP_FIN_WAIT2 one.

    To avoid this issue we increment the slot only if all entries in the
    slot have been purged.

    This change may delay slot cleanup by one time-wait death row
    period, but only if the worker thread did not have time to purge
    the current slot before the next period (6 seconds with default sysctl
    settings). However, on such a busy system we would probably see
    delays even without this change...

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
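
The rule "increment the slot only if all entries in the slot have been purged" can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: `tw_death_row`, `tw_purge`, `tw_tick`, `TW_SLOTS`, and `TW_QUOTA` are hypothetical names and values, and the deferred worker is reduced to a comment.

```c
#define TW_SLOTS 8   /* hypothetical slot count; must be a power of two */
#define TW_QUOTA 3   /* hypothetical limit on sockets purged per tick */

struct tw_death_row {
    int slot;              /* slot the timer is currently reaping */
    int count[TW_SLOTS];   /* time-wait sockets parked in each slot */
};

/* Purge up to TW_QUOTA sockets from a slot.
 * Returns nonzero if some sockets are left over. */
static int tw_purge(struct tw_death_row *dr, int slot)
{
    int n = dr->count[slot] < TW_QUOTA ? dr->count[slot] : TW_QUOTA;

    dr->count[slot] -= n;
    return dr->count[slot] != 0;
}

/* Timer tick: advance to the next slot only once the current one is empty,
 * so a newly added socket can never land in a slot that is still being
 * purged. */
static void tw_tick(struct tw_death_row *dr)
{
    if (tw_purge(dr, dr->slot))
        return;   /* leftovers: a real implementation would defer to a worker */
    dr->slot = (dr->slot + 1) & (TW_SLOTS - 1);
}
```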
     
  • Here is a rework and cleanup of the resize function.

    It fixes some bugs. We were using ->parent where we should have
    used node_parent(). We also used ->parent, which is not assigned
    by inflate, inside the inflate loop.

    Thresholds are now set to powers of 2 to fit the halve
    and double strategy.

    max_resize is renamed to max_work, which better indicates
    its function.

    Reaching max_work is not an error, so the warning is removed.
    max_work only limits the amount of work done per resize
    (it limits CPU usage, outstanding memory, etc.).

    The cleanup makes it relatively easy to add fixed-size
    root nodes if we would like to decrease the memory pressure
    on routers with large routing tables and dynamic routing.
    If we'll need that...

    It has been tested with 280k routes.

    Work done together with Robert Olsson.

    Signed-off-by: Jens Låås
    Signed-off-by: Robert Olsson
    Signed-off-by: David S. Miller

    Jens Låås
     
  • If the tunnel parameters have frag_off set to IP_DF, pmtudisc on the IPv4 link
    will be performed by deriving the mtu from the IPv4 link and setting the
    DF flag of the encapsulating IPv4 header. If fragmentation is needed on the
    way, the IPv4 pmtu gets adjusted, the IPv6 packet will eventually be resent
    using the new, lower mtu, and everyone is happy.

    If the frag_off parameter is unset, the mtu for the tunnel will be derived
    from the tunnel device or the IPv6 pmtu, which might be higher than the IPv4
    pmtu. In that case we must allow fragmentation of the IPv4 packet, because
    the IPv6 mtu wouldn't 'learn' from the adjusted IPv4 pmtu, resulting in
    frequent icmp_frag_needed messages and packet loss on the IPv6 layer.

    This patch allows fragmentation when the tunnel was created with the
    nopmtudisc parameter, as in ipip/gre tunnels.

    Signed-off-by: Sascha Hlusiak
    Signed-off-by: David S. Miller

    Sascha Hlusiak
     
  • While doing some forwarding benchmarks, I noticed
    ip_rt_send_redirect() is rather expensive, even if send_redirects is
    false for the device.

    The fix is to avoid two atomic ops: we don't really need to take a
    reference on in_dev.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Introduce a keepalive_probes(tp) helper and use it, in the style of
    keepalive_time_when(tp) and keepalive_intvl_when(tp).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
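
A minimal sketch of what such a helper might look like, assuming it falls back to a system-wide default when the socket has no value of its own. The struct `sock_opts` and the variable `default_keepalive_probes` are hypothetical stand-ins, not the kernel's types.

```c
/* Hypothetical stand-in for the system-wide default number of probes. */
static int default_keepalive_probes = 9;

/* Hypothetical stand-in for the per-socket state. */
struct sock_opts {
    int keepalive_probes;   /* 0 means "not set on this socket" */
};

/* Per-socket value if set, otherwise the system-wide default. */
static inline int keepalive_probes(const struct sock_opts *tp)
{
    return tp->keepalive_probes ? tp->keepalive_probes
                                : default_keepalive_probes;
}
```

Putting the fallback in one helper keeps every caller consistent about which value wins.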
     
  • This will allow the 10G iSCSI code to reuse the function.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • This will allow the 10G iSCSI code to reuse the function.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • It looks like the device proc entry is unusable after a rename,
    because it has no ->read_proc or ->proc_fops.

    Also, create_proc_entry() is deprecated.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • Increase module version, and cleanup module info.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Simpler to have one place that spins and accounts for delays,
    this will also make the last packet be detected faster for more
    repeatable timing.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • This changes how the pktgen thread spins/waits between
    packets if delay is configured. It uses a high res timer to
    wait for time to arrive.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • The kernel ktime_t is a nice generic infrastructure for managing
    high resolution times, as is done in pktgen.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • If no delay is configured, there is no need to update next_tx after
    each packet sent. This allows pktgen to send faster, especially
    on systems with slower clock sources.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Handle standard (and non-standard) return values in a switch.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
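
As an illustration of handling transmit return values in a single switch, here is a hedged userspace sketch. The enum `xmit_status`, its values, the struct `xmit_stats`, and the function `handle_xmit_result` are all hypothetical; the grouping of cases is an assumption and is not taken from the pktgen source.

```c
/* Hypothetical status codes, not the kernel's definitions. */
enum xmit_status { XMIT_OK = 0, XMIT_BUSY = 1, XMIT_LOCKED = -1 };

struct xmit_stats {
    unsigned long sent;      /* packets accepted by the driver */
    unsigned long retries;   /* packets that must be offered again */
    unsigned long errors;    /* unexpected return values */
};

/* Account for one transmit attempt.
 * Returns 1 if the caller should retry the same packet, 0 otherwise. */
static int handle_xmit_result(struct xmit_stats *st, int ret)
{
    switch (ret) {
    case XMIT_OK:
        st->sent++;
        return 0;
    case XMIT_LOCKED:
    case XMIT_BUSY:
        st->retries++;
        return 1;
    default:                 /* non-standard value: count it and move on */
        st->errors++;
        return 0;
    }
}
```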
     
  • netdev_alloc_skb is NUMA node aware.
    Also, don't exhaust the atomic emergency pool; pktgen should
    not cause OOM behaviour.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • The if statement to test for "should a new packet be used"
    can be simplified.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Do some reorganization of transmit logic path:
    * move transmit queue full idle to separate routine
    * add a cpu_relax()
    * eliminate some of the unneeded goto's
    * if queue is still stopped, go back to main thread loop.
    * don't give up transmitting if quantum is exhausted (be greedy)

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • All the callers were freeing skb after stopping device.
    Remove unneeded forward decl.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Don't force inlining where not needed. GCC does a better job
    of deciding whether to inline local functions.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • A couple of minor functions can be written more compactly.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • TX completions were running in a workqueue queued by the ISR. This
    patch moves the processing of TX completions to an existing RSS NAPI
    context.
    Now each irq vector runs NAPI for one RSS ring and one or more TX
    completion rings.

    Signed-off-by: Ron Mercer
    Signed-off-by: David S. Miller

    Ron Mercer
     
  • Currently we downshift to MSI/Legacy if we don't get enough vectors for
    cpu_count RSS rings plus cpu_count TX completion rings. This patch
    allows running MSIX with the vector count that the platform provides.

    Signed-off-by: Ron Mercer
    Signed-off-by: David S. Miller

    Ron Mercer
     
  • Currently we have three types of RX rings.

    1) Default ring - services rx_ring for broadcast/multicast, handles
    firmware events, and errors.

    2) TX completion ring - handles only outbound completions.

    3) RSS ring - handles only inbound completions.

    This patch gets rid of the default ring type and moves its functionality
    into the first RSS ring. This makes better use of MSIX vectors since
    they are a limited resource on some platforms.

    Signed-off-by: Ron Mercer
    Signed-off-by: David S. Miller

    Ron Mercer
     
  • David S. Miller
     
  • bonding: Have bond_check_dev_link examine netif_running

    Some network devices do not call netif_carrier_off when they
    are set administratively down. Have the bonding link check function
    also inspect the netif_running state. Ignore netif_running if the
    bond_check_dev_link function is called with "reporting" set, as in that
    case it's inspecting the capabilities of the non-netif_carrier device
    driver.

    Signed-off-by: Petri Gynther
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Petri Gynther
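
The decision described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the bonding driver's code: `fake_dev`, `check_dev_link`, `LINK_UP`, and `LINK_DOWN` are hypothetical, and only the carrier-based path is modelled.

```c
#include <stdbool.h>

#define LINK_UP   1
#define LINK_DOWN 0

/* Hypothetical stand-in for the slave device state. */
struct fake_dev {
    bool running;      /* device is administratively up */
    bool carrier_ok;   /* driver reports carrier */
};

/* When not merely probing capabilities ("reporting"), a device that is
 * administratively down counts as link-down regardless of carrier. */
static int check_dev_link(const struct fake_dev *dev, bool reporting)
{
    if (!reporting && !dev->running)
        return LINK_DOWN;
    return dev->carrier_ok ? LINK_UP : LINK_DOWN;
}
```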
     
  • max_bonds is of type int and cannot be greater than INT_MAX.

    Signed-off-by: Nicolas de Pesloüan
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Nicolas de Pesloüan
     
  • Bonding can use compare_ether_addr() in bond_release.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Stephen Hemminger
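
For readers unfamiliar with this kind of helper, a userspace approximation of a 6-byte MAC address comparison is shown below. The name `ether_addr_differs` and the constant `MAC_LEN` are hypothetical; the sketch assumes the convention that zero means the addresses are equal.

```c
#include <string.h>

#define MAC_LEN 6   /* bytes in an Ethernet hardware address */

/* Returns 0 when the two addresses are equal and 1 when they differ. */
static int ether_addr_differs(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, MAC_LEN) != 0;
}
```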
     
  • Propagate the vlan_features of the slave devices to the bonding
    master device, using the same logic as for regular features.

    Tested by Or Gerlitz, who also removed
    the debug logic from the original test patch.

    Signed-off-by: Or Gerlitz
    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Jay Vosburgh
     
  • Most places in debugfs.c are missing a NULL check on the return value of
    the get_zeroed_page() API call. Add the required NULL checks at the appropriate places.

    Signed-off-by: Kiran Divekar
    Signed-off-by: John W. Linville

    Kiran Divekar
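
The pattern being added is the usual "check the allocation before using it". A userspace analogue is sketched below, with calloc() standing in for get_zeroed_page(); `format_value` and `PAGE_BYTES` are hypothetical names chosen for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define PAGE_BYTES 4096   /* hypothetical page size for the sketch */

/* Format a value into a freshly zeroed buffer.
 * Returns NULL if the allocation fails; the caller frees the buffer. */
static char *format_value(unsigned int value)
{
    char *buf = calloc(1, PAGE_BYTES);

    if (!buf)   /* the check that was missing: never write through NULL */
        return NULL;
    snprintf(buf, PAGE_BYTES, "0x%08x\n", value);
    return buf;
}
```

In a kernel debugfs handler the failure path would typically return -ENOMEM to the caller instead of NULL.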
     
  • Now that cfg80211 functions are added and wext converted to use wext-compat
    functions, remove wext structures and disabled code.

    Signed-off-by: Jussi Kivilinna
    Signed-off-by: John W. Linville

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: John W. Linville

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: John W. Linville

    Jussi Kivilinna
     
  • Signed-off-by: Jussi Kivilinna
    Signed-off-by: John W. Linville

    Jussi Kivilinna