10 Dec, 2014

1 commit

  • When deploying FQ pacing, one thing we noticed is that CUBIC Hystart
    triggers too soon.

    Having SNMP counters that give an idea of how often the various
    Hystart methods trigger is useful prior to any modifications.

    This patch adds SNMP counters tracking how many times the "ack
    train" or "Delay" based Hystart detection triggers, and the
    cumulative sum of cwnd at the time Hystart decided to end SS
    (Slow Start).

    myhost:~# nstat -a | grep Hystart
    TcpExtTCPHystartTrainDetect 9 0.0
    TcpExtTCPHystartTrainCwnd 20650 0.0
    TcpExtTCPHystartDelayDetect 10 0.0
    TcpExtTCPHystartDelayCwnd 360 0.0

    ->
    Train detection was triggered 9 times, and average cwnd was
    20650/9=2294.
    Delay detection was triggered 10 times and average cwnd was
    360/10=36.
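
    A minimal sketch of how such a paired event/sum counter gets bumped,
    modeled on the NET_INC_STATS_BH()/NET_ADD_STATS_BH() helpers of this
    era (the real call sites live in the CUBIC Hystart code):

    /* On an "ack train" trigger: count the event, and accumulate the
     * current cwnd so userspace can derive the average at detection. */
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHYSTARTTRAINDETECT);
    NET_ADD_STATS_BH(sock_net(sk), LINUX_MIB_TCPHYSTARTTRAINCWND,
                     tp->snd_cwnd);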

    Signed-off-by: Eric Dumazet
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Nov, 2014

1 commit

  • As NIC multicast filtering isn't perfect, and some platforms are
    quite content to spew broadcasts, we should not trigger an event
    for skb:kfree_skb when we do not have a match for such an incoming
    datagram. We do, though, want to avoid sweeping the matter under the
    rug entirely, so increment a suitable statistic.
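
    A minimal sketch of that path, assuming the UDP_MIB_IGNOREDMULTI
    counter this patch introduces; 'matched' stands in for the real
    delivery-count check, and consume_skb() is what keeps the
    skb:kfree_skb tracepoint from firing:

    if (!matched) {
            /* no local socket wanted this broadcast/multicast datagram:
             * account for it, then free without the drop tracepoint */
            UDP_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
                             proto == IPPROTO_UDPLITE);
            consume_skb(skb);
    }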

    This incorporates feedback from David L. Stevens, Karl Neiss and Eric
    Dumazet.

    V3 - use bool per David Miller

    Signed-off-by: Rick Jones
    Signed-off-by: David S. Miller

    Rick Jones
     

28 Jul, 2014

1 commit

  • The 'nqueues' counter is protected by the lru list lock; once that's
    removed, this would need to be converted to an atomic counter. Given
    it isn't used for anything except reporting to userspace via /proc,
    just remove it.

    We still report the memory currently used by fragment
    reassembly queues.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

08 May, 2014

1 commit

  • commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%)
    reduced snmp array size to 1, so technically it doesn't have to be
    an array any more. What's more, after the following commit:

    commit 933393f58fef9963eac61db8093689544e29a600
    Date: Thu Dec 22 11:58:51 2011 -0600

    percpu: Remove irqsafe_cpu_xxx variants

    We simply say that regular this_cpu use must be safe regardless of
    preemption and interrupt state. That has no material change for x86
    and s390 implementations of this_cpu operations. However, arches that
    do not provide their own implementation for this_cpu operations will
    now get code generated that disables interrupts instead of preemption.

    Probably no arch wants to have SNMP_ARRAY_SZ == 2; at least after
    almost 3 years, no one has complained.

    So, just convert the array to a single pointer and remove snmp_mib_init()
    and snmp_mib_free() as well.
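
    In effect, the declaration macro loses its array dimension; a sketch
    of the before/after shape, assuming the include/net/snmp.h macros of
    this era:

    /* before: one per-cpu pointer per SNMP_ARRAY_SZ slot */
    #define DEFINE_SNMP_STAT(type, name) \
            __typeof__(type) __percpu *name[SNMP_ARRAY_SZ]

    /* after: a single per-cpu pointer, since this_cpu ops must now be
     * safe regardless of preemption and interrupt state */
    #define DEFINE_SNMP_STAT(type, name) \
            __typeof__(type) __percpu *name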

    Cc: Christoph Lameter
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

04 Mar, 2014

1 commit

  • Add the following snmp stats:

    TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
    the remote does not accept it or the attempts timed out.

    TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
    retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

    TCPOrigDataSent: number of outgoing packets with original data (excluding
    retransmission but including data-in-SYN). This counter is different from
    TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
    more useful to track the TCP retransmission rate.

    Change TCPFastOpenActive to track only successful Fast Opens to be symmetric to
    TCPFastOpenPassive.
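
    A sketch of how such counters are exposed, following the usual
    LINUX_MIB pattern (enum entries in include/uapi/linux/snmp.h paired
    with name strings in net/ipv4/proc.c):

    /* excerpt from the snmp4_net_list[]-style name table */
    SNMP_MIB_ITEM("TCPFastOpenActiveFail", LINUX_MIB_TCPFASTOPENACTIVEFAIL),
    SNMP_MIB_ITEM("TCPSynRetrans", LINUX_MIB_TCPSYNRETRANS),
    SNMP_MIB_ITEM("TCPOrigDataSent", LINUX_MIB_TCPORIGDATASENT),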

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Signed-off-by: Nandita Dukkipati
    Signed-off-by: Lawrence Brakmo
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

27 Feb, 2014

1 commit

  • Three counters are added (see the sketch after this list):
    - one to track when we went from a non-zero to a zero window
    - one to track the reverse
    - one counter incremented when we want to announce a zero window,
    but can't, because doing so would shrink the current window.
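
    A sketch of the transition checks in tcp_select_window(), assuming
    the counter names this patch introduces (TCPToZeroWindowAdv,
    TCPFromZeroWindowAdv, TCPWantZeroWindowAdv):

    if (new_win == 0) {
            if (old_win)
                    NET_INC_STATS(sock_net(sk),
                                  LINUX_MIB_TCPTOZEROWINDOWADV);
    } else if (old_win == 0) {
            NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFROMZEROWINDOWADV);
    }
    /* the "want zero window but would shrink it" case is counted
     * separately as TCPWantZeroWindowAdv */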

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Florian Westphal
     

07 Dec, 2013

1 commit

  • With the introduction of TCP Small Queues, TSO auto sizing, and TCP
    pacing, we can implement Automatic Corking in the kernel, to help
    applications doing small write()/sendmsg() to TCP sockets.

    The idea is to change tcp_push() to check if the current skb payload
    is under the skb optimal size (a multiple of MSS bytes).

    If under 'size_goal', and at least one packet is still in Qdisc or
    NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
    will be delayed up to TX completion time.

    This delay might allow the application to coalesce more bytes
    in the skb in following write()/sendmsg()/sendfile() system calls.

    The exact duration of the delay depends on the dynamics of the
    system, and might be zero if no packet for this flow is actually held
    in the Qdisc or NIC TX ring.

    Using FQ/pacing is a way to increase the probability of
    autocorking being triggered.

    Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
    this feature, defaulting to 1 (enabled).

    Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
    This counter is incremented every time we detect that an skb was
    underused and its flush was deferred.
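
    The push-side predicate boils down to: defer only if the skb is under
    size_goal, autocorking is enabled, the skb is not alone in the write
    queue, and some previously sent bytes are not yet TX-completed. A
    sketch modeled on the tcp_should_autocork() helper of this era:

    static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
                                    int size_goal)
    {
            return skb->len < size_goal &&    /* skb still underused */
                   sysctl_tcp_autocorking &&  /* feature enabled */
                   skb != tcp_write_queue_head(sk) &&
                   atomic_read(&sk->sk_wmem_alloc) > skb->truesize;
    }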

    Tested:

    Interesting effects when using line buffered commands under ssh.

    Excellent performance results in terms of cpu usage and total throughput.

    lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    9410.39

    Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

    35209.439626 task-clock # 2.901 CPUs utilized
    2,294 context-switches # 0.065 K/sec
    101 CPU-migrations # 0.003 K/sec
    4,079 page-faults # 0.116 K/sec
    97,923,241,298 cycles # 2.781 GHz [83.31%]
    51,832,908,236 stalled-cycles-frontend # 52.93% frontend cycles idle [83.30%]
    25,697,986,603 stalled-cycles-backend # 26.24% backend cycles idle [66.70%]
    102,225,978,536 instructions # 1.04 insns per cycle
    # 0.51 stalled cycles per insn [83.38%]
    18,657,696,819 branches # 529.906 M/sec [83.29%]
    91,679,646 branch-misses # 0.49% of all branches [83.40%]

    12.136204899 seconds time elapsed

    lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
    lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
    6624.89

    Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
    40045.864494 task-clock # 3.301 CPUs utilized
    171 context-switches # 0.004 K/sec
    53 CPU-migrations # 0.001 K/sec
    4,080 page-faults # 0.102 K/sec
    111,340,458,645 cycles # 2.780 GHz [83.34%]
    61,778,039,277 stalled-cycles-frontend # 55.49% frontend cycles idle [83.31%]
    29,295,522,759 stalled-cycles-backend # 26.31% backend cycles idle [66.67%]
    108,654,349,355 instructions # 0.98 insns per cycle
    # 0.57 stalled cycles per insn [83.34%]
    19,552,170,748 branches # 488.244 M/sec [83.34%]
    157,875,417 branch-misses # 0.81% of all branches [83.34%]

    12.130267788 seconds time elapsed

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Aug, 2013

1 commit

  • Rename mib counter from "low latency" to "busy poll"

    v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer);
    Eric Dumazet suggested that the current location is better.

    So v2 just renames the counter to fit the new naming convention.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

09 Aug, 2013

1 commit

  • With GRO/LRO processing, there is a problem because the
    Ip[6]InReceives SNMP counters do not count the number of frames, but
    the number of aggregated segments.

    It's probably too late to change this now.

    This patch adds four new counters, tracking number of frames, regardless
    of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.

    Ip[6]InNoECTPkts : Number of packets received with NOECT
    Ip[6]InECT1Pkts : Number of packets received with ECT(1)
    Ip[6]InECT0Pkts : Number of packets received with ECT(0)
    Ip[6]InCEPkts : Number of packets received with Congestion Experienced

    lph37:~# nstat | egrep "Pkts|InReceive"
    IpInReceives 1634137 0.0
    Ip6InReceives 3714107 0.0
    Ip6InNoECTPkts 19205 0.0
    Ip6InECT0Pkts 52651828 0.0
    IpExtInNoECTPkts 33630 0.0
    IpExtInECT0Pkts 15581379 0.0
    IpExtInCEPkts 6 0.0
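
    A sketch of the per-frame classification, using the INET_ECN_*
    constants from include/net/inet_ecn.h; the ECN field is the low two
    bits of the IPv4 TOS byte (IPv6 uses its Traffic Class similarly),
    so the real code can index the four counters directly:

    switch (iph->tos & INET_ECN_MASK) {
    case INET_ECN_NOT_ECT: /* bump the InNoECTPkts counter */ break;
    case INET_ECN_ECT_1:   /* bump the InECT1Pkts counter  */ break;
    case INET_ECN_ECT_0:   /* bump the InECT0Pkts counter  */ break;
    case INET_ECN_CE:      /* bump the InCEPkts counter    */ break;
    }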

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Jun, 2013

1 commit

  • Adds an ndo_ll_poll method and the code that supports it.
    This method can be used by low latency applications to busy-poll
    Ethernet device queues directly from the socket code.
    sysctl_net_ll_poll controls how many microseconds to poll.
    Default is zero (disabled).
    Individual protocol support will be added by subsequent patches.
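
    A sketch of the new hook as it lands in struct net_device_ops (the
    method was renamed later in mainline, so treat the shape as
    era-specific):

    struct net_device_ops {
            /* ... existing ops ... */
            int (*ndo_ll_poll)(struct napi_struct *dev);
    };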

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Eliezer Tamir
    Acked-by: Eric Dumazet
    Tested-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

30 Apr, 2013

1 commit

  • Add MIB counters for checksum errors in IP layer,
    and TCP/UDP/ICMP layers, to help diagnose problems.

    $ nstat -a | grep Csum
    IcmpInCsumErrors 72 0.0
    TcpInCsumErrors 382 0.0
    UdpInCsumErrors 463221 0.0
    Icmp6InCsumErrors 75 0.0
    Udp6InCsumErrors 173442 0.0
    IpExtInCsumErrors 10884 0.0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

19 Apr, 2013

1 commit

  • Host queues (Qdisc + NIC) can hold packets so long that TCP can
    eventually retransmit a packet before the first transmit even left
    the host.

    It's not clear right now if we could avoid this in the first place:

    - We could arm the RTO timer not at the time we enqueue packets, but
    at the time we TX complete them (tcp_wfree())

    - Cancel the sending of the new copy of the packet if the prior one
    is still in queue.

    This patch adds instrumentation so that we can at least see how
    often this problem happens.

    The TCPSpuriousRtxHostQueues SNMP counter is incremented every time
    we detect that the fast clone is not yet freed in tcp_transmit_skb().
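
    A sketch of the detection, relying on the fact that TCP skbs are
    fclone-allocated, so the companion clone sits right after the
    original in memory:

    if (clone_it) {
            const struct sk_buff *fclone = skb + 1;

            /* if the clone made for the prior transmit is still alive,
             * that transmit never left the host queues */
            if (unlikely(skb->fclone == SKB_FCLONE_ORIG &&
                         fclone->fclone == SKB_FCLONE_CLONE))
                    NET_INC_STATS_BH(sock_net(sk),
                                     LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
    }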

    Signed-off-by: Eric Dumazet
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Mar, 2013

2 commits

  • This is the second of the TLP patch series; it augments the basic TLP
    algorithm with a loss detection scheme.

    This patch implements a mechanism for loss detection when a Tail
    loss probe retransmission plugs a hole thereby masking packet loss
    from the sender. The loss detection algorithm relies on counting
    TLP dupacks as outlined in Sec. 3 of:
    http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01

    The basic idea is: Sender keeps track of TLP "episode" upon
    retransmission of a TLP packet. An episode ends when the sender receives
    an ACK above the SND.NXT (tracked by tlp_high_seq) at the time of the
    episode. We want to make sure that before the episode ends the sender
    receives a "TLP dupack", indicating that the TLP retransmission was
    unnecessary, so there was no loss/hole that needed plugging. If the
    sender gets no TLP dupack before the end of the episode, then it reduces
    ssthresh and the congestion window, because the TLP packet arriving at
    the receiver probably plugged a hole.
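
    A simplified sketch of the episode bookkeeping on the ACK path,
    modeled on tcp_process_tlp_ack(); the congestion-response details
    are elided:

    if (before(ack, tp->tlp_high_seq))
            return;                   /* episode still open */

    if (flag & FLAG_DSACKING_ACK) {
            tp->tlp_high_seq = 0;     /* TLP dupack: probe was spurious */
    } else if (after(ack, tp->tlp_high_seq)) {
            /* episode ended without a TLP dupack: the probe plugged a
             * hole, so reduce ssthresh and cwnd here */
            tp->tlp_high_seq = 0;
    }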

    Signed-off-by: Nandita Dukkipati
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Nandita Dukkipati
     
  • This patch series implement the Tail loss probe (TLP) algorithm described
    in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
    first patch implements the basic algorithm.

    TLP's goal is to reduce tail latency of short transactions. It
    achieves this by converting retransmission timeouts (RTOs) occurring
    due to tail losses (losses at the end of transactions) into fast
    recovery.
    TLP transmits one packet in two round-trips when a connection is in
    Open state and isn't receiving any ACKs. The transmitted packet, aka
    loss probe, can be either new or a retransmission. When there is tail
    loss, the ACK from a loss probe triggers FACK/early-retransmit based
    fast recovery, thus avoiding a costly RTO. In the absence of loss,
    there is no change in the connection state.

    PTO stands for probe timeout. It is a timer event indicating
    that an ACK is overdue; it triggers a loss probe packet. The PTO value
    is set to max(2*SRTT, 10ms) and is adjusted to account for the
    delayed ACK timer when there is only one outstanding packet.

    TLP Algorithm

    On transmission of new data in Open state:
    -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
    -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
    -> PTO = min(PTO, RTO)

    Conditions for scheduling PTO:
    -> Connection is in Open state.
    -> Connection is either cwnd limited or no new data to send.
    -> Number of probes per tail loss episode is limited to one.
    -> Connection is SACK enabled.

    When PTO fires:
    new_segment_exists:
    -> transmit new segment.
    -> packets_out++. cwnd remains same.

    no_new_packet:
    -> retransmit the last segment.
    Its ACK triggers FACK or early retransmit based recovery.

    ACK path:
    -> rearm RTO at start of ACK processing.
    -> reschedule PTO if need be.
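
    A sketch of the PTO computation implied by the rules above (times in
    ms; tlp_pto_ms is a hypothetical helper name for illustration). For
    example, with SRTT = 40ms and several packets in flight, PTO =
    min(max(80, 10), RTO) = 80ms if the RTO is larger:

    static u32 tlp_pto_ms(u32 srtt, u32 rto, u32 packets_out)
    {
            u32 pto;

            if (packets_out > 1)
                    pto = max(2 * srtt, 10U);
            else    /* lone packet: allow for the delayed ACK timer */
                    pto = max(2 * srtt, (3 * srtt) / 2 + 200);
            return min(pto, rto);
    }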

    In addition, the patch includes a small variation to the Early Retransmit
    (ER) algorithm, such that ER and TLP together can in principle recover any
    N-degree of tail loss through fast recovery. TLP is controlled by the same
    sysctl as ER, tcp_early_retrans:
    tcp_early_retrans==0; disables TLP and ER.
    ==1; enables RFC5827 ER.
    ==2; delayed ER.
    ==3; TLP and delayed ER. [DEFAULT]
    ==4; TLP only.

    The TLP patch series has been extensively tested on Google Web
    servers. It is most effective for short Web transactions, where it
    reduced RTOs by 15% and improved HTTP response time (average by 6%,
    99th percentile by 10%).
    The transmitted probes account for
    Acked-by: Neal Cardwell
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Nandita Dukkipati
     

19 Feb, 2013

2 commits

  • proc_net_remove is only used to remove proc entries that live under
    /proc/net; it's not a general function for removing proc entries of
    a netns. If we want to remove proc entries under /proc/net/stat/, we
    still need to call remove_proc_entry.

    This patch uses remove_proc_entry to replace proc_net_remove, so
    proc_net_remove itself can be removed after this patch.
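
    A sketch of the mechanical conversion, using /proc/net/igmp as an
    arbitrary example entry:

    /* before */
    proc_net_remove(net, "igmp");
    /* after */
    remove_proc_entry("igmp", net->proc_net);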

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Right now, some modules such as bonding use proc_create to create
    proc entries under /proc/net/, while other modules such as ipv4 use
    proc_net_fops_create.

    It looks a little chaotic. This patch changes all uses of
    proc_net_fops_create to proc_create, so proc_net_fops_create can be
    removed after this patch.
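
    A sketch of the conversion, using the sockstat entry from
    net/ipv4/proc.c as an example:

    /* before */
    proc_net_fops_create(net, "sockstat", S_IRUGO, &sockstat_seq_fops);
    /* after */
    proc_create("sockstat", S_IRUGO, net->proc_net, &sockstat_seq_fops);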

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

01 Sep, 2012

1 commit

  • This patch adds all the necessary data structure and support
    functions to implement TFO server side. It also documents a number
    of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
    extension MIBs.

    In addition, it includes the following:

    1. a new TCP_FASTOPEN socket option that an application must set to
    supply the max backlog allowed, in order to enable TFO on its
    listener (see the sketch after this list).

    2. A number of key data structures:
    "fastopen_rsk" in tcp_sock - for a big socket to access its
    request_sock for retransmission and ack processing purpose. It is
    non-NULL iff 3WHS not completed.

    "fastopenq" in request_sock_queue - points to a per Fast Open
    listener data structure "fastopen_queue" to keep track of qlen (# of
    outstanding Fast Open requests) and max_qlen, among other things.

    "listener" in tcp_request_sock - to point to the original listener
    for book-keeping purpose, i.e., to maintain qlen against max_qlen
    as part of defense against IP spoofing attack.

    3. various data structure and functions, many in tcp_fastopen.c, to
    support server side Fast Open cookie operations, including
    /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.
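
    A sketch of the application side: enabling TFO on a listener via the
    new socket option, where the value caps the number of pending TFO
    requests (the TFO backlog):

    int qlen = 16;  /* max outstanding Fast Open requests */
    setsockopt(listen_fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));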

    Signed-off-by: H.K. Jerry Chu
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Jerry Chu
     

20 Jul, 2012

1 commit

  • This patch implements sending SYN-data in tcp_connect(). The data is
    from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

    The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
    type of the SYN. If the cookie is not cached (len == 0), the host sends
    a data-less SYN with the Fast Open cookie request option to solicit a
    cookie from the remote. If the cookie is available (len > 0), the host
    sends a SYN-data with the Fast Open cookie option. If the cookie length
    is negative, the SYN will not include any Fast Open option (for
    fallback operations).

    To deal with middleboxes that may drop SYNs with data or with
    experimental TCP options, the SYN-data is only sent once. SYN
    retransmits do not include data or Fast Open options. The connection
    will fall back to a regular TCP handshake.
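
    For context, a sketch of the client-side call that ends up on this
    SYN-data path once the MSG_FASTOPEN flag lands in the later patch
    (sendto() performs the implicit connect):

    ssize_t n = sendto(fd, buf, len, MSG_FASTOPEN,
                       (struct sockaddr *)&daddr, sizeof(daddr));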

    Signed-off-by: Yuchung Cheng
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

17 Jul, 2012

3 commits

  • Implement the RFC 5961 mitigation against Blind
    Reset attacks using the SYN bit.

    Section 4.2 of RFC 5961 advises sending a Challenge ACK and dropping
    the incoming packet, instead of resetting the session.

    Add a new SNMP counter to count the number of challenge ACKs sent
    in response to SYN packets.
    (netstat -s | grep TCPSYNChallenge)
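
    A sketch of the check as it would sit in tcp_validate_incoming()
    (helper names follow the mainline code of this era):

    if (th->syn) {
            /* RFC 5961 4.2: challenge ACK instead of a reset */
            tcp_send_challenge_ack(sk);
            goto discard;
    }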

    Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
    because of a SYN flag.

    Signed-off-by: Eric Dumazet
    Cc: Kiran Kumar Kella
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Implement the RFC 5961 mitigation against Blind
    Reset attacks using the RST bit.

    The idea is to validate the incoming RST sequence as an exact match
    of RCV.NXT, instead of the previously accepted in-window test
    (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND).

    If the sequence is in the window but not an exact match, send a
    "challenge ACK", so that the other party can resend an RST with the
    appropriate sequence.

    Add a new sysctl, tcp_challenge_ack_limit, to limit the number of
    challenge ACKs sent per second.

    Add a new SNMP counter to count the number of challenge ACKs sent.
    (netstat -s | grep TCPChallengeACK)
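
    A sketch of the resulting RST handling, modeled on
    tcp_validate_incoming() of this era:

    if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt)
            tcp_reset(sk);               /* exact RCV.NXT match */
    else
            tcp_send_challenge_ack(sk);  /* in-window but inexact */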

    Signed-off-by: Eric Dumazet
    Cc: Kiran Kumar Kella
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Add three SNMP TCP counters, to better track TCP behavior
    at global stage (netstat -s), when packets are received
    Out Of Order (OFO)

    TCPOFOQueue : Number of packets queued in the OFO queue

    TCPOFODrop : Number of packets meant to be queued in the OFO queue
    but dropped because the socket rcvbuf limit was hit.

    TCPOFOMerge : Number of packets in the OFO queue that were merged
    with other packets.
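
    A sketch of the drop path behind TCPOFODrop; 'rcvbuf_limit_hit'
    stands in for the real rcvbuf/rmem check in tcp_data_queue_ofo():

    if (rcvbuf_limit_hit) {
            NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPOFODROP);
            __kfree_skb(skb);
            return;
    }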

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Mar, 2012

1 commit

  • With increasing receive window sizes, but the speed of light not
    improving that much, the out of order queue can contain a huge number
    of skbs, waiting to be moved to the receive_queue when missing packets
    can fill the holes.

    Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
    sk_buff)) to store regular (MTU <= 1500) frames.
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Cc: H.K. Jerry Chu
    Cc: Tom Herbert
    Cc: Ilpo Järvinen
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Jan, 2012

1 commit

  • Correctly implement a loss detection heuristic: New sequences (above
    high_seq) sent during the fast recovery are deemed lost when higher
    sequences are SACKed.

    Current code does not catch these losses, because tcp_mark_head_lost()
    does not check packets beyond high_seq. The fix is straightforward:
    check packets up to the highest SACKed packet. In addition, all the
    FLAG_DATA_LOST logic is ineffective and redundant and can be removed.
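
    A sketch of the changed bound in tcp_mark_head_lost(): keep walking
    the write queue up to the highest SACKed sequence instead of
    stopping at high_seq (accounting details elided):

    tcp_for_write_queue_from(skb, sk) {
            if (!after(tcp_highest_sack_seq(tp), TCP_SKB_CB(skb)->seq))
                    break;
            /* ... mark this skb lost ... */
    }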

    Update the loss heuristic comments. The algorithm above is documented
    as heuristic B, but it is redundant too because heuristic A already
    covers B.

    Note that this change only marks some forward-retransmitted packets LOST.
    It does NOT forbid TCP performing further CWR on new losses. A potential
    follow-up patch under preparation is to perform another CWR on "new"
    losses such as
    1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
    2) retransmission is lost.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Yuchung Cheng
     

13 Dec, 2011

1 commit

  • This patch replaces all uses of the struct sock fields
    memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem
    with accessor macros. Those macros can either receive a socket
    argument or a mem_cgroup argument, depending on the context they
    live in.

    Since we're only doing a macro wrapping here, no performance impact
    at all is expected in the case where cgroups are disabled.
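
    A simplified sketch of one such accessor (the real helper also
    consults the mem_cgroup when one is attached):

    static inline long sk_memory_allocated(const struct sock *sk)
    {
            return atomic_long_read(sk->sk_prot->memory_allocated);
    }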

    Signed-off-by: Glauber Costa
    Reviewed-by: Hiroyouki Kamezawa
    CC: David S. Miller
    CC: Eric W. Biederman
    CC: Eric Dumazet
    Signed-off-by: David S. Miller

    Glauber Costa
     

10 Nov, 2011

1 commit

  • Reading /proc/net/snmp on a machine with a lot of cpus is very
    expensive (can be ~88000 us).

    This is because the ICMPMSG MIB uses 4096 bytes per cpu, and folding
    values for all possible cpus can read 16 Mbytes of memory.

    ICMP messages are not considered a fast path on a typical server, and
    only a few cpus handle them anyway. We can afford an atomic operation
    instead of using percpu data.

    This saves 4096 bytes per cpu and per network namespace.
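
    A sketch of the resulting MIB layout, a flat array of atomic_long_t
    instead of per-cpu storage:

    struct icmpmsg_mib {
            atomic_long_t mibs[ICMPMSG_MIB_MAX];
    };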

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Sep, 2011

1 commit

  • "Possible SYN flooding on port xxxx " messages can fill logs on servers.

    Change logic to log the message only once per listener, and add two new
    SNMP counters to track :

    TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client

    TCPReqQFullDrop : number of times a SYN request was dropped because
    syncookies were not enabled.
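
    A sketch of the warn-once logic; 'synflood_warned' is the
    per-listener flag this kind of change relies on:

    if (!lopt->synflood_warned) {
            lopt->synflood_warned = 1;
            pr_info("Possible SYN flooding on port %d. %s. "
                    "Check SNMP counters.\n",
                    ntohs(tcp_hdr(skb)->dest), msg);
    }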

    Based on a prior patch from Tom Herbert, and suggestions from David.

    Signed-off-by: Eric Dumazet
    CC: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

11 Nov, 2010

1 commit

  • Robin Holt tried to boot a 16TB machine and found some limits were
    reached: sysctl_tcp_mem[2], sysctl_udp_mem[2]

    We can switch the infrastructure to use long instead of int, now that
    atomic_long_t primitives are available for free.

    Signed-off-by: Eric Dumazet
    Reported-by: Robin Holt
    Reviewed-by: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Jul, 2010

1 commit

  • /proc/net/snmp and /proc/net/netstat expose SNMP counters.

    Width of these counters is either 32 or 64 bits, depending on the size
    of "unsigned long" in kernel.

    This means a user program parsing these files must already be
    prepared to deal with 64bit values, regardless of the user program
    being 32 or 64 bit.

    This patch introduces 64bit snmp values for IPSTAT mib, where some
    counters can wrap pretty fast if they are 32bit wide.

    # netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848
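
    A sketch of the 64bit mib layout, paired with a u64_stats_sync so
    32bit readers can still fold a consistent snapshot:

    struct ipstats_mib {
            u64 mibs[IPSTATS_MIB_MAX];
            struct u64_stats_sync syncp;
    };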

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

03 Jun, 2010

1 commit

  • Christoph Lameter mentioned that packets could be dropped in the
    input path because of rp_filter settings, without any SNMP counter
    being incremented. System administrators can have a hard time
    tracking down the problem.

    This patch introduces a new counter, LINUX_MIB_IPRPFILTER, incremented
    each time we drop a packet because the Reverse Path Filter triggered.

    (We receive an IPv4 datagram on a given interface, and find that the
    route to send an answer would use another interface.)

    netstat -s | grep IPReversePathFilter
    IPReversePathFilter: 21714
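
    A sketch of the accounting on the input path: a route lookup failing
    with -EXDEV means the reverse path test rejected the packet (modeled
    on ip_rcv_finish()):

    if (unlikely(err)) {
            if (err == -EXDEV)
                    NET_INC_STATS_BH(dev_net(skb->dev),
                                     LINUX_MIB_IPRPFILTER);
            goto drop;
    }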

    Reported-by: Christoph Lameter
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Mar, 2010

1 commit

  • It's currently hard to diagnose when ACK frames are dropped because an
    application set TCP_DEFER_ACCEPT on its listening socket.

    See http://bugzilla.kernel.org/show_bug.cgi?id=15507

    This patch adds an SNMP value, named TCPDeferAcceptDrop

    netstat -s | grep TCPDeferAcceptDrop
    TCPDeferAcceptDrop: 0

    This counter is incremented every time we drop a pure ACK frame
    received by a socket in SYN_RECV state because its SYNACK retrans
    count is lower than the defer_accept value.
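
    For context, a sketch of the application side that arms this
    behavior (the value is how long, in seconds, to wait for data before
    completing the accept):

    int secs = 5;
    setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
               &secs, sizeof(secs));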

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Mar, 2010

1 commit

  • Commit 6b03a53a (tcp: use limited socket backlog) added the
    possibility of dropping frames when the backlog queue is full.

    Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the
    possibility of dropping frames when the TTL is under a given limit.

    This patch adds new SNMP MIB entries, named TCPBacklogDrop and
    TCPMinTTLDrop, published in /proc/net/netstat on the TcpExt: line.

    netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop"
    TCPBacklogDrop: 0
    TCPMinTTLDrop: 0
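
    A sketch of the backlog drop path in tcp_v4_rcv(), modeled on the
    code after commit 6b03a53a:

    /* socket owned by user: queue to backlog, or drop if over limit */
    else if (unlikely(sk_add_backlog(sk, skb))) {
            bh_unlock_sock(sk);
            NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
            goto discard_and_relse;
    }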

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Feb, 2010

1 commit

  • Add __percpu sparse annotations to net.

    These annotations are to make sparse consider percpu variables to be
    in a different address space and warn if accessed without going
    through percpu accessors. This patch doesn't affect normal builds.

    The macro and type tricks around snmp stats make things a bit
    interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
    as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
    snmp_mib_*() users which used to cast the argument to (void **) are
    updated to cast it to (void __percpu **).
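
    A sketch of the annotated declaration macro after this patch
    (assuming the two-slot include/net/snmp.h layout of this era):

    #define DEFINE_SNMP_STAT(type, name) \
            __typeof__(type) __percpu *name[2]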

    Signed-off-by: Tejun Heo
    Acked-by: David S. Miller
    Cc: Patrick McHardy
    Cc: Arnaldo Carvalho de Melo
    Cc: Vlad Yasevich
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Tejun Heo
     

27 Apr, 2009

1 commit

  • The IP MIB (RFC 4293) defines stats for InOctets, OutOctets,
    InMcastOctets and OutMcastOctets:
    http://tools.ietf.org/html/rfc4293
    But it seems we don't track those in any way that's easy to separate
    from other protocols. This patch adds those missing counters to the
    stats file. Tested successfully by me.

    With help from Eric Dumazet.
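
    A sketch of where the new octet counters get bumped on input,
    assuming an IP_ADD_STATS_BH()-style helper (the patch also covers
    the multicast and output variants):

    IP_ADD_STATS_BH(dev_net(skb->dev), IPSTATS_MIB_INOCTETS, skb->len);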

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman