03 Dec, 2006

40 commits

  • Only change upper-layer checksum from 0 to 0xFFFF for UDP (as RFC 768
    states), not for others as RFC 4443 doesn't require it.
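
    A minimal user-space sketch of the rule described above, not the kernel
    code itself: after folding the one's-complement sum, only UDP maps a
    computed checksum of 0 to 0xFFFF. The names csum_fold32() and
    finalize_upper_csum() and the is_udp parameter are hypothetical.

    ```c
    #include <stdint.h>

    /* Fold a 32-bit one's-complement accumulator to 16 bits and invert it. */
    static uint16_t csum_fold32(uint32_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xFFFFu) + (sum >> 16);
        return (uint16_t)~sum;
    }

    /*
     * Hypothetical helper: finalize an upper-layer checksum.
     * For UDP, a computed value of 0 is transmitted as 0xFFFF (RFC 768);
     * for other protocols the value is left untouched.
     */
    static uint16_t finalize_upper_csum(uint32_t sum, int is_udp)
    {
        uint16_t csum = csum_fold32(sum);

        if (is_udp && csum == 0)
            csum = 0xFFFF;
        return csum;
    }
    ```

    With an accumulated sum of 0xFFFF the folded checksum is 0, so the UDP
    path would send 0xFFFF while the non-UDP path would keep 0.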

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
     
  • Noticed by Al Viro:
    (frh->tos & ~IPV6_FLOWINFO_MASK))
    where IPV6_FLOWINFO_MASK is htonl(0xfffffff) and frh->tos
    is u8, which makes no sense here...

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Account for the netlink message header size directly in nlmsg_new()
    instead of relying on the caller to calculate it correctly.

    Replaces error handling of message construction functions when
    constructing notifications with bug traps since a failure implies
    a bug in calculating the size of the skb.
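
    The idea can be sketched in user space as follows. This is an
    illustration under stated assumptions, not the kernel API: all
    sketch_* names are hypothetical, the header is taken to be 16 bytes
    with 4-byte alignment, and assert() stands in for a kernel bug trap.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define SKETCH_ALIGNTO 4u
    #define SKETCH_ALIGN(len) \
        (((len) + SKETCH_ALIGNTO - 1u) & ~(size_t)(SKETCH_ALIGNTO - 1u))
    #define SKETCH_HDRLEN ((size_t)16)   /* assumed message header size */

    /* The allocator adds the header itself; callers pass only the payload. */
    static size_t sketch_msg_alloc_size(size_t payload)
    {
        return SKETCH_ALIGN(SKETCH_HDRLEN + payload);
    }

    /* Returns 0 if header plus payload fit into capacity, -1 otherwise. */
    static int sketch_msg_fill(size_t capacity, size_t payload)
    {
        return (SKETCH_HDRLEN + payload <= capacity) ? 0 : -1;
    }

    /* Notification path: a fill failure can only mean a sizing bug. */
    static void sketch_notify(size_t payload)
    {
        size_t capacity = sketch_msg_alloc_size(payload);
        int err = sketch_msg_fill(capacity, payload);

        assert(err == 0);   /* stands in for a bug trap */
        (void)err;
    }
    ```

    Because the size is derived in one place, a caller cannot forget the
    header, which is why a failed fill is treated as a bug rather than as a
    runtime error.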

    Signed-off-by: Thomas Graf
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • This removes two redundancies:

    1) The test (skb->protocol == htons(ETH_P_IPV6)) in tcp_v6_init_sequence()
    is always true, because
    * tcp_v6_conn_request() is the only function calling this one
    * tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to
    tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()]

    2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is
    never used.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This patch does the following:
    a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2]
    b) provides necessary socket options and documentation as to how to use them
    c) basic support and infrastructure for the Minimum Checksum Coverage feature
    [RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user
    interface

    In addition, it

    (1) fixes two bugs in the DCCPv4 checksum computation:
    * pseudo-header used checksum_len instead of skb->len
    * incorrect checksum coverage calculation based on dccph_x
    (2) removes dccp_v4_verify_checksum() since it duplicates code of the
    checksum computation; code calling this function is updated accordingly.
    (3) now uses skb_checksum(), which is safer than csum_partial() if the
    sk_buff is a non-linear buffer (has pages attached to it).
    (4) fixes an outstanding TODO item:
    * If P.CsCov is too large for the packet size, drop packet and return.

    The code has been tested with applications; the latest version of tcpdump now
    comes with support for partial DCCP checksums.
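
    A rough sketch of the coverage rule from RFC 4340, sec. 9.2, including
    the drop condition from item (4). It is not the code in the patch; the
    function name and parameters are hypothetical. CsCov = 0 covers the
    whole packet, while CsCov = 1..15 covers the header and options plus
    the first (CsCov - 1) * 4 bytes of application data.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helper. header_len is the DCCP header plus options in
     * bytes, packet_len the full DCCP packet length. On success stores the
     * number of covered bytes in *cov and returns 0; returns -1 if the
     * packet should be dropped because CsCov exceeds the packet size.
     */
    static int sketch_dccp_csum_coverage(uint8_t cscov, size_t header_len,
                                         size_t packet_len, size_t *cov)
    {
        size_t covered;

        if (header_len > packet_len)
            return -1;
        if ((cscov & 0x0F) == 0) {          /* 0: whole packet is covered */
            *cov = packet_len;
            return 0;
        }
        covered = header_len + ((size_t)(cscov & 0x0F) - 1) * 4;
        if (covered > packet_len)           /* CsCov too large: drop */
            return -1;
        *cov = covered;
        return 0;
    }
    ```

    For a 16-byte header in a 100-byte packet, CsCov = 3 would cover 24
    bytes, whereas CsCov = 15 on a 40-byte packet would be rejected.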

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • For IP MIB (RFC4293).

    Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • Otherwise, we will see a lot of casts...

    Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • Signed-off-by: YOSHIFUJI Hideaki

    YOSHIFUJI Hideaki
     
  • Sorts out the comments for processing steps 2,3 in section 8.5 of RFC 4340.
    All comments have been updated against this document, and the reference to step
    2 has been made consistent throughout the files.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • This is a code simplification to remove duplicated code
    by concentrating and abstracting shared code.

    Detailed Changes:

    Gerrit Renker
     
  • This patch fixes data being spewed into the logs continually. As the
    code stood, if there was a large queue and long delays, timeo would go
    down to zero and never get reset.

    This fixes it by resetting timeo. Put constant into header as well.

    Signed-off-by: Ian McDonald
    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian McDonald
     
  • Fixes a typo in Kconfig, patch is by Ian McDonald and is re-sent from
    http://www.mail-archive.com/dccp@vger.kernel.org/msg00579.html

    Signed-off-by: Ian McDonald
    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Ian McDonald
     
  • This does the same for ipv6.c as the preceding one does for ipv4.c: Only the
    inet_connection_sock_af_ops forward declarations remain, since at least
    dccp_ipv6_mapped has a circular dependency to dccp_v6_request_recv_sock.

    No code change, merely re-ordering.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • This relates to Arnaldo's announcement in
    http://www.mail-archive.com/dccp@vger.kernel.org/msg00604.html

    Originally this had been part of the Oops fix and is a revised variant of
    http://www.mail-archive.com/dccp@vger.kernel.org/msg00598.html

    No code change, merely reshuffling, with the particular objective of
    having all request_sock_ops close(r) together for more clarity.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • This patch removes two functions, the send_ack functions of request_sock,
    which are not called/used by the DCCP code. It is correct that these
    functions are not called; below is a justification of why calling these
    functions (on a passive socket in the LISTEN/RESPOND state) would mean
    a DCCP protocol violation.

    A) Background: using request_sock in TCP:

    Gerrit Renker
     
  • Gerrit Renker noticed dccp_tw_deschedule and submitted a patch with a FIXME,
    but as he suggests in the same patch, the best thing is to just ditch this
    declaration. While doing that, I also noticed that tcp_tw_count is not
    defined anywhere either, so ditch it too.

    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • This is a code simplification and was singled out from the
    DCCPv6 Oops patch on
    http://www.mail-archive.com/dccp@vger.kernel.org/msg00600.html

    It mainly makes the code consistent between ipv{4,6}.c for the functions
    dccp_v4_rcv
    dccp_v6_rcv
    and removes the do_time_wait label to simplify code somewhat.

    Committer note: fixed up a compile problem, trivial.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • This is a code simplification:
    it combines three often recurring operations into one inline function,

    * allocate `len' bytes header space in skb
    * fill these `len' bytes with zeroes
    * cast the start of this header space as dccp_hdr
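
    The three steps above can be sketched in user space as follows. This
    only mirrors the pattern; struct sketch_buf, struct sketch_hdr and
    sketch_zeroed_hdr() are hypothetical stand-ins, not the sk_buff API or
    the helper added by the patch.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical buffer with headroom in front of the current data. */
    struct sketch_buf {
        unsigned char storage[128];
        unsigned char *data;        /* start of the current packet data */
    };

    /* Hypothetical byte-only header, so any address is suitably aligned. */
    struct sketch_hdr {
        unsigned char type;
        unsigned char flags;
        unsigned char rest[10];
    };

    static void sketch_buf_init(struct sketch_buf *b, size_t headroom)
    {
        b->data = b->storage + headroom;
    }

    /* Push len bytes of header space, zero them, return them as a header. */
    static struct sketch_hdr *sketch_zeroed_hdr(struct sketch_buf *b, size_t len)
    {
        if (len < sizeof(struct sketch_hdr) ||
            (size_t)(b->data - b->storage) < len)
            return NULL;                      /* not enough headroom */
        b->data -= len;                       /* 1. allocate header space */
        memset(b->data, 0, len);              /* 2. fill it with zeroes */
        return (struct sketch_hdr *)b->data;  /* 3. cast to the header type */
    }
    ```

    A caller would then set only the fields it needs, relying on the rest
    of the header space already being zero.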

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • This refers to the possible memory leak pointed out in
    http://www.mail-archive.com/dccp@vger.kernel.org/msg00574.html,
    fixed by David Miller in
    http://www.mail-archive.com/netdev@vger.kernel.org/msg24881.html

    and adds a FIXME to point out where code is missing.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • This is a re-send from
    http://www.mail-archive.com/dccp@vger.kernel.org/msg00553.html

    It is the same patch as before, but I have built in Arnaldo's suggestions
    pointed out in that posting.

    Signed-off-by: Gerrit Renker
    Signed-off-by: Arnaldo Carvalho de Melo

    Gerrit Renker
     
  • The data itself is already charged to the SKB, doing
    the skb_set_owner_w() just generates a lot of noise and
    extra atomics we don't really need.

    Lmbench improvements on lat_tcp are minimal:

    before:
    TCP latency using localhost: 23.2701 microseconds
    TCP latency using localhost: 23.1994 microseconds
    TCP latency using localhost: 23.2257 microseconds

    after:
    TCP latency using localhost: 22.8380 microseconds
    TCP latency using localhost: 22.9465 microseconds
    TCP latency using localhost: 22.8462 microseconds

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Rearrange TCP entries in alpha order.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • If the user has permission to load modules, then attempt to autoload
    the TCP congestion module.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Allow normal users to only choose among a restricted set of congestion
    control choices. The default is reno and whatever has been configured
    as default. But the policy can be changed by administrator at any time.

    For example, to allow any choice:
    cp /proc/sys/net/ipv4/tcp_available_congestion_control \
    /proc/sys/net/ipv4/tcp_allowed_congestion_control

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Create /proc/sys/net/ipv4/tcp_available_congestion_control
    that reflects currently available TCP choices.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • An alternate solution would be to make the digest a pointer, allocate
    it in sctp_endpoint_init() and free it in sctp_endpoint_destroy().

    I guess I should have originally done it this way...

    CC [M] net/sctp/sm_make_chunk.o
    net/sctp/sm_make_chunk.c: In function 'sctp_unpack_cookie':
    net/sctp/sm_make_chunk.c:1358: warning: initialization discards qualifiers from pointer target type

    The reason is that sctp_unpack_cookie() takes a const struct
    sctp_endpoint and modifies the digest in it (digest being embedded in
    the struct, not a pointer). Make digest a pointer to fix this
    warning.

    Signed-off-by: Vlad Yasevich
    Acked-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     
  • We currently allocate a fixed-size hash table (TCP_SYNQ_HSIZE=512 slots)
    for each LISTEN socket, regardless of various parameters (the listen
    backlog, for example).

    On x86_64, this means order-1 allocations (might fail), even for 'small'
    sockets, expecting few connections. On the contrary, a huge server wanting a
    backlog of 50000 is slowed down a bit because of this fixed limit.

    This patch makes the sizing of the listen hash table a dynamic parameter,
    depending on:
    - net.core.somaxconn tunable (default is 128)
    - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
    - backlog value given by user application (2nd parameter of listen())

    For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
    kmalloc().

    We still limit memory allocation with the two existing tunables (somaxconn &
    tcp_max_syn_backlog). So for standard setups, this patch actually reduces RAM
    usage.
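
    One way the three inputs could combine is sketched below. This is an
    illustration, not the code in the patch: the sketch_* names are
    hypothetical, and both the floor of 8 entries and the rounding up to a
    power of two are assumptions that the description above does not state.

    ```c
    #include <limits.h>
    #include <stddef.h>

    /* Round n up to the next power of two (saturating at 2^(bits-1)). */
    static unsigned int sketch_roundup_pow2(unsigned int n)
    {
        unsigned int p = 1;

        while (p < n && p <= UINT_MAX / 2)
            p <<= 1;
        return p;
    }

    /* Number of hash slots for one listening socket. */
    static unsigned int sketch_listen_table_entries(unsigned int backlog,
                                                    unsigned int somaxconn,
                                                    unsigned int max_syn_backlog)
    {
        unsigned int n = backlog;

        if (n > somaxconn)          /* cap by net.core.somaxconn */
            n = somaxconn;
        if (n > max_syn_backlog)    /* cap by net.ipv4.tcp_max_syn_backlog */
            n = max_syn_backlog;
        if (n < 8)                  /* assumed minimum table size */
            n = 8;
        return sketch_roundup_pow2(n);
    }

    /* Large tables (bigger than one page) would go to vmalloc(). */
    static int sketch_needs_vmalloc(unsigned int entries, size_t slot_size,
                                    size_t page_size)
    {
        return (size_t)entries * slot_size > page_size;
    }
    ```

    Under these assumptions a listen(fd, 5) socket would get 8 slots
    instead of 512, while a backlog of 50000 is still capped by the two
    tunables.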

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Based on patch by Patrick McHardy.

    Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc
    without requiring CONFIG_NET_SCHED.

    The d80211 stack needs a generic fifo qdisc for WME. At present it
    uses net/d80211/fifo_qdisc.c which is functionally equivalent to
    sch_fifo.c. This patch will allow the d80211 stack to remove
    net/d80211/fifo_qdisc.c and use sch_fifo.c instead.

    Signed-off-by: David Kimdon
    Signed-off-by: David S. Miller

    David Kimdon
     
  • Introduces a new flag FIB_RULE_INVERT causing rules to apply
    if the specified selector doesn't match.
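
    The effect of the flag can be sketched as follows. The structures, the
    mark-only selector and the value of SKETCH_RULE_INVERT are hypothetical
    and chosen for illustration; they are not the kernel definitions.

    ```c
    #include <stdint.h>

    #define SKETCH_RULE_INVERT 0x1u   /* hypothetical flag value */

    struct sketch_rule {
        uint32_t flags;
        uint32_t mark;                /* selector: packet mark to compare */
    };

    struct sketch_flow {
        uint32_t mark;
    };

    /* Plain selector test, ignoring the invert flag. */
    static int sketch_selector_match(const struct sketch_rule *r,
                                     const struct sketch_flow *f)
    {
        return r->mark == f->mark;
    }

    /* With the invert flag set, the rule applies when the selector fails. */
    static int sketch_rule_applies(const struct sketch_rule *r,
                                   const struct sketch_flow *f)
    {
        int match = sketch_selector_match(r, f);

        return (r->flags & SKETCH_RULE_INVERT) ? !match : match;
    }
    ```

    An inverted rule with mark 7 would therefore apply to every flow whose
    mark is not 7.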

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Move the attribute policy for the non-specific attributes into
    net/fib_rules.h and include it in the respective protocols.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Move the mark selector, currently implemented per protocol, into
    the protocol-independent part.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • For the sake of consistency.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Now that all protocols have been made aware of the mark
    field it can be moved out of the union, thus simplifying
    its usage.

    The config options in the IPv4/IPv6/DECnet subsystems
    to enable or disable mark-based routing only
    obfuscate the code with ifdefs, the cost for the
    additional comparison in the flow key is insignificant,
    and most distributions have all these options enabled
    by default anyway. Therefore it makes sense to remove
    the config options and enable mark based routing by
    default.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • nfmark is being used in various subsystems and has become
    the de facto mark field for all kinds of packets. Therefore
    it makes sense to rename it to `mark' and remove the
    dependency on CONFIG_NETFILTER.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • When dn_neigh.c was converted from kmalloc to kzalloc in commit
    0da974f4f303a6842516b764507e3c0a03f41e5a, it was missed that
    dn_neigh_seq_open was actually clearing the allocation twice.

    Signed-off-by: Ralf Baechle
    Signed-off-by: David S. Miller

    Ralf Baechle
     
  • Six callsites, huge.

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Andrew Morton
     
  • =============================================
    [ INFO: possible recursive locking detected ]
    2.6.18-1.2726.fc6 #1

    Peter Zijlstra