26 Mar, 2008

3 commits


24 Mar, 2008

9 commits


23 Mar, 2008

12 commits

  • The variable cb is initialized but never used otherwise.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @@
    type T;
    identifier i;
    constant C;
    @@

    (
    extern T i;
    |
    - T i;
      <+... when != i
    - i = C;
      ...+>
    )
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • The variable hlen is initialized but never used otherwise.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @@
    type T;
    identifier i;
    constant C;
    @@

    (
    extern T i;
    |
    - T i;
      <+... when != i
    - i = C;
      ...+>
    )
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • This gets rid of a warning caused by the test in rcu_assign_pointer.
    I tried to fix rcu_assign_pointer, but that devolved into a long set
    of discussions about doing it right that came to no real solution.
    Since the test in rcu_assign_pointer for constant NULL would never
    succeed in fib_trie, just open code instead.

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
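
To make "open code" concrete, here is a minimal userspace sketch of publishing a pointer with an explicit barrier followed by a store. It is an analogy only: the C11 release fence stands in for the kernel's write barrier, and every name in it (node_sketch, published_node, publish_node, read_key) is hypothetical rather than taken from fib_trie.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type; not the fib_trie structures from the kernel. */
struct node_sketch {
    int key;
};

/* Pointer that readers may load concurrently with a writer. */
static _Atomic(struct node_sketch *) published_node;

/*
 * Open-coded publish: initialise the node, issue a release fence so the
 * initialisation is ordered before the pointer store, then store the
 * pointer. There is no test for a constant NULL value here.
 */
static void publish_node(struct node_sketch *n, int key)
{
    n->key = key;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&published_node, n, memory_order_relaxed);
}

/* Reader side: returns the key, or -1 if nothing has been published. */
static int read_key(void)
{
    struct node_sketch *n =
        atomic_load_explicit(&published_node, memory_order_acquire);

    return n ? n->key : -1;
}
```

Calling read_key() before any publish_node() yields -1; after publish_node(&n, 7) it yields 7.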
     
  • The route table parameters are set based on system memory and sysctl
    values that almost never change. Also the genid only changes every
    10 minutes.

    RTprint is defined but never used.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Just remove it.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Sorry for the patch sequence confusion :| but I found out too late
    that a similar thing can easily be done for raw sockets.

    Expand the proto.h union with the raw_hashinfo member and use it in
    raw_prot and rawv6_prot. This allows dropping the protocol-specific
    versions of the hash and unhash callbacks.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • After this we have only udp_lib_get_port to get the port and two
    stubs for ipv4 and ipv6. No difference in udp and udplite except
    for initialized h.udp_hash member.

    I tried to find a graceful way to drop the only difference between
    udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison
    routine), but adding one more callback on the struct proto didn't
    appear such :( Maybe later.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
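
The shape described above, one shared port routine plus thin per-family stubs that differ only in the address comparison, can be sketched as follows. This is a simplified illustration, not the kernel code: the struct, the function names, and the choice to pass the comparison routine as a function argument are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified socket; not the kernel's struct sock. */
struct sock_sketch {
    unsigned short port;
    unsigned int rcv_saddr;  /* stand-in for a bound address; 0 means "any" */
};

typedef bool (*saddr_equal_fn)(const struct sock_sketch *a,
                               const struct sock_sketch *b);

/* Shared routine: the address comparison is the only pluggable part. */
static int lib_get_port_sketch(struct sock_sketch *sk, unsigned short port,
                               struct sock_sketch *const *bound, size_t n,
                               saddr_equal_fn saddr_equal)
{
    for (size_t i = 0; i < n; i++) {
        if (bound[i]->port == port && saddr_equal(sk, bound[i]))
            return -1;  /* port already taken for a conflicting address */
    }
    sk->port = port;
    return 0;
}

/* Family-specific comparison: a wildcard address conflicts with anything. */
static bool v4_saddr_equal_sketch(const struct sock_sketch *a,
                                  const struct sock_sketch *b)
{
    return a->rcv_saddr == 0 || b->rcv_saddr == 0 ||
           a->rcv_saddr == b->rcv_saddr;
}

/* Thin stub: forwards to the shared routine with its own comparison. */
static int v4_get_port_sketch(struct sock_sketch *sk, unsigned short port,
                              struct sock_sketch *const *bound, size_t n)
{
    return lib_get_port_sketch(sk, port, bound, n, v4_saddr_equal_sketch);
}
```

A second family would add only its own comparison function and stub; the loop itself stays shared.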
     
  • Inspired by commit ab1e0a13 ([SOCK] proto: Add hashinfo member to
    struct proto) from Arnaldo, I did a similar thing for the UDP/-Lite
    IPv4 and -v6 protocols.

    The result is not that exciting, but it removes some levels of
    indirection in udpxxx_get_port and saves some space in code and text.

    The first step is to union existing hashinfo and new udp_hash on the
    struct proto and give a name to this union, since future initialization
    of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc
    warning about inability to initialize anonymous member this way.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
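
Why naming the union matters can be seen in a small sketch: once the union has a name, each protocol can pick its member with a nested designated initializer. All types and variables below are hypothetical stand-ins (suffixed _sketch), and the table size of 128 is arbitrary for the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types. */
struct hashinfo_sketch { int placeholder; };
struct hlist_head_sketch { void *first; };

struct proto_sketch {
    const char *name;
    union {
        struct hashinfo_sketch *hashinfo;
        struct hlist_head_sketch *udp_hash;
    } h;  /* named, so an initializer can write .h.udp_hash */
};

static struct hashinfo_sketch tcp_hashinfo_sketch;
static struct hlist_head_sketch udp_hash_sketch[128];

/* Each protocol initializes the union member it actually uses. */
static struct proto_sketch tcp_prot_sketch = {
    .name = "TCP",
    .h.hashinfo = &tcp_hashinfo_sketch,
};

static struct proto_sketch udp_prot_sketch = {
    .name = "UDP",
    .h.udp_hash = udp_hash_sketch,
};
```

Code that previously reached the table through an extra pointer can then read it directly from the proto structure, for example udp_prot_sketch.h.udp_hash.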
     
  • This makes the code a bit more uniform and straightforward.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • ip_options->is_data is only assigned and never checked. The structure is
    not part of the kernel interface to userspace, so it is safe to remove
    this field.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • There is only one way to reach ip_options_compile with opt != NULL:

    ip_options_get_finish
    opt->is_data = 1;
    ip_options_compile(opt, NULL)

    So, checking for is_data inside the opt != NULL branch is not needed.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • While testing the virtio-net driver on KVM with TSO I noticed
    that TSO performance with a 1500 MTU is significantly worse
    compared to the performance of non-TSO with a 16436 MTU. The
    packet dump shows that most of the packets sent are smaller
    than a page.

    Looking at the code this actually is quite obvious as it always
    stops extending the packet if it's the first packet yet to be
    sent and if it's larger than the MSS. Since each extension is
    bound by the page size, this means that (given a 1500 MTU) we're
    very unlikely to construct packets greater than a page, provided
    that the receiver and the path are fast enough so that packets can
    always be sent immediately.

    The fix is also quite obvious. The push calls inside the loop
    are just an optimisation so that we don't end up doing all the
    sending at the end of the loop. Therefore there is no specific
    reason why the pushing has to happen at MSS boundaries. For TSO,
    the most natural extension of this optimisation is to do the
    pushing once the skb exceeds the TSO size goal.

    This is what the patch does and testing with KVM shows that the
    TSO performance with a 1500 MTU easily surpasses that of a 16436
    MTU and indeed the packet sizes sent are generally larger than
    16436.

    I don't see any obvious downsides for slower peers or connections,
    but it would be prudent to test this extensively to ensure that
    those cases don't regress.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
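
The effect of the push threshold can be illustrated with a toy model of the send loop. It assumes the fast-path case described above, where every packet can be sent as soon as it is pushed, and it is not the kernel code: the function name and the byte counts in the usage note (4096-byte chunks, a 1448-byte MSS, a 65536-byte size goal) are illustrative assumptions.

```c
#include <stddef.h>

/*
 * Count how many packets a send loop emits when it appends data in chunks
 * of at most `chunk` bytes and pushes the current packet as soon as its
 * length reaches `push_threshold`. Any remainder is pushed after the loop.
 */
static size_t count_pushed_packets(size_t total, size_t chunk,
                                   size_t push_threshold)
{
    size_t packets = 0;
    size_t skb_len = 0;

    while (total > 0) {
        size_t n = total < chunk ? total : chunk;

        skb_len += n;
        total -= n;
        if (skb_len >= push_threshold) {
            packets++;      /* push: this packet stops growing */
            skb_len = 0;
        }
    }
    if (skb_len > 0)
        packets++;          /* final push at the end of the loop */
    return packets;
}
```

With 1 MiB of data and page-sized chunks, a threshold at the MSS pushes after every chunk, so count_pushed_packets(1048576, 4096, 1448) gives 256 page-sized packets, while a threshold at the size goal, count_pushed_packets(1048576, 4096, 65536), gives 16 packets of 64 KiB each.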
     

22 Mar, 2008

7 commits

  • Change TCP_DEFER_ACCEPT implementation so that it transitions a
    connection to ESTABLISHED after handshake is complete instead of
    leaving it in SYN-RECV until some data arrives. Place connection in
    accept queue when first data packet arrives from slow path.

    Benefits:
    - established connection is now reset if it never makes it
    to the accept queue

    - diagnostic state of established matches with the packet traces
    showing completed handshake

    - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
    enforced with reasonable accuracy instead of rounding up to next
    exponential back-off of syn-ack retry.

    Signed-off-by: Patrick McManus
    Signed-off-by: David S. Miller

    Patrick McManus
     
  • A socket in LISTEN that had completed its 3-way handshake, but had not
    notified userspace because of SO_DEFER_ACCEPT, would retransmit the
    already acked syn-ack during the time it was waiting for the first
    data byte from the peer.

    Signed-off-by: Patrick McManus
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Patrick McManus
     
  • The timeout associated with SO_DEFER_ACCEPT wasn't being honored if it
    was less than the timeout allowed by the maximum syn-recv queue size
    algorithm. Fix by using the SO_DEFER_ACCEPT value if the ack has
    arrived.

    Signed-off-by: Patrick McManus
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Patrick McManus
     
  • This is a narrow pedantry :) but the dlci_ioctl_hook check and call
    should not be separated by the mutex lock.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
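
A minimal userspace sketch of the pattern argued for above: the NULL check on the hook and the call through it both happen while the same mutex is held, so the hook cannot be cleared in between. It uses POSIX threads rather than kernel primitives, and all names (hook_mutex_sketch, ioctl_hook_sketch, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t hook_mutex_sketch = PTHREAD_MUTEX_INITIALIZER;
static int (*ioctl_hook_sketch)(unsigned int cmd);

/* Install or clear the hook under the mutex. */
static void set_hook_sketch(int (*hook)(unsigned int cmd))
{
    pthread_mutex_lock(&hook_mutex_sketch);
    ioctl_hook_sketch = hook;
    pthread_mutex_unlock(&hook_mutex_sketch);
}

/* Check and call inside one critical section. */
static int call_hook_sketch(unsigned int cmd)
{
    int err = -1;  /* stand-in for a "no handler installed" error code */

    pthread_mutex_lock(&hook_mutex_sketch);
    if (ioctl_hook_sketch)
        err = ioctl_hook_sketch(cmd);
    pthread_mutex_unlock(&hook_mutex_sketch);
    return err;
}

/* Example hook used only to exercise the sketch. */
static int echo_hook_sketch(unsigned int cmd)
{
    return (int)cmd;
}
```

Checking the pointer before taking the lock and calling it afterwards would leave a window in which another thread could clear the hook.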
     
  • Commits f40c81 ([NETNS][IPV4] tcp - make proc handle the network
    namespaces) and a91275 ([NETNS][IPV6] udp - make proc handle the
    network namespace) both introduced bad checks for whether sockets and
    tw buckets belong to the proper net namespace.

    I.e. when checking whether a socket belongs to the given net and
    family, constructions like this were used:

    do {
            sk = sk_next(sk);
    } while (sk && sk->sk_net != net && sk->sk_family != family);

    This is wrong, since as soon as sk->sk_net matches the net the socket
    is immediately returned, even if it belongs to another family.

    As a result, the four /proc/net/(udp|tcp)[6] entries show wrong info.
    The udp6 entry even oopses when dereferencing the inet6_sk(sk) pointer:

    static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket)
    {
    ...
    struct ipv6_pinfo *np = inet6_sk(sp);
    ...

    dest = &np->daddr; /* will be NULL for AF_INET sockets */
    ...
    seq_printf(...
    dest->s6_addr32[0], dest->s6_addr32[1],
    dest->s6_addr32[2], dest->s6_addr32[3],
    ...

    Fix it by converting && to ||.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Daniel Lezcano
    Signed-off-by: David S. Miller

    Pavel Emelyanov
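
The difference between the two loop conditions can be reproduced on a tiny linked list. The sketch below uses a hypothetical sk_sketch struct with plain integer net and family fields; the fixed variant keeps skipping while either field mismatches, which is one reading of "converting && to ||" with the mismatch tests grouped in parentheses.

```c
#include <stddef.h>

/* Hypothetical, simplified socket list node. */
struct sk_sketch {
    int net;
    int family;
    struct sk_sketch *next;
};

/* Buggy: stops as soon as EITHER the net or the family matches. */
static struct sk_sketch *next_match_buggy(struct sk_sketch *sk,
                                          int net, int family)
{
    do {
        sk = sk->next;
    } while (sk && sk->net != net && sk->family != family);
    return sk;
}

/* Fixed: keeps skipping while the net OR the family does not match. */
static struct sk_sketch *next_match_fixed(struct sk_sketch *sk,
                                          int net, int family)
{
    do {
        sk = sk->next;
    } while (sk && (sk->net != net || sk->family != family));
    return sk;
}
```

For a list holding a family-2 socket followed by a family-10 socket, both in net 1, a search for (net 1, family 10) returns the family-2 socket from the buggy loop and the family-10 socket from the fixed one.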
     
  • Make socket filters work for netlink unicast and notifications.
    This is useful for applications like Zebra that get overrun with
    messages that are then ignored.

    Note: netlink messages are in host byte order, but packet filter
    state machine operations are done in network byte order.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
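
The byte-order note has a practical consequence for anyone writing such a filter: a 16-bit field stored in host order has to be compared against a constant converted with htons(), because the filter's halfword load interprets the bytes as big-endian. The sketch below emulates that load in plain C instead of attaching a real filter; the helper names are hypothetical, and offset 4 corresponds to the 16-bit type field that follows the 32-bit length in a netlink message header.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TYPE_OFFSET_SKETCH 4  /* 16-bit type field after a 32-bit length */

/* Store the type the way a host-order message would hold it. */
static void put_type_host_order(unsigned char *msg, uint16_t type)
{
    memcpy(msg + TYPE_OFFSET_SKETCH, &type, sizeof type);
}

/* Emulates a filter halfword load: two bytes read as big-endian. */
static uint16_t load_half_sketch(const unsigned char *msg, size_t off)
{
    return (uint16_t)((msg[off] << 8) | msg[off + 1]);
}

/* The comparison constant must be converted with htons() to match. */
static int filter_matches_type(const unsigned char *msg, uint16_t wanted)
{
    return load_half_sketch(msg, TYPE_OFFSET_SKETCH) == htons(wanted);
}
```

On a little-endian host, comparing the loaded value against the unconverted constant would silently fail to match; with htons() the comparison works on either endianness.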
     
  • Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag.
    Offending line in ip_defrag is here:

    net = skb->dev->nd_net

    where dev is NULL. Bisected the problem down to commit
    ac18e7509e7df327e30d6e073a787d922eaf211d ([NETNS][FRAGS]: Make the
    inet_frag_queue lookup work in namespaces).

    The patch below (idea from Patrick McHardy) fixes the problem for me.

    Signed-off-by: Phil Oester
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Phil Oester
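
The patch itself is not reproduced in this log, so the following is only a generic illustration of guarding a dereference like the one quoted above, not the actual fix. The structs are hypothetical stand-ins, and the fallback_dev field in particular is invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct net_sketch { int id; };
struct dev_sketch { struct net_sketch *nd_net; };
struct skb_sketch {
    struct dev_sketch *dev;           /* may be NULL on some paths */
    struct dev_sketch *fallback_dev;  /* invented alternative source */
};

/*
 * Resolve the namespace without assuming skb->dev is set: prefer dev,
 * fall back to the alternative device, and return NULL if neither exists.
 */
static struct net_sketch *skb_net_sketch(const struct skb_sketch *skb)
{
    const struct dev_sketch *dev = skb->dev ? skb->dev : skb->fallback_dev;

    return dev ? dev->nd_net : NULL;
}
```

The point of the sketch is only that the lookup checks for a NULL device before following the pointer chain.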
     

21 Mar, 2008

9 commits