11 Nov, 2009

1 commit


09 Nov, 2009

2 commits

  • Extends udp_table to contain a secondary hash table.

    The socket anchor for this second hash is free, because UDP
    doesn't use skc_bind_node: we define a union to hold
    both skc_bind_node and a new hlist_nulls_node udp_portaddr_node.

    udp_lib_get_port() inserts sockets into the second hash chain
    (additional cost of one atomic op).

    udp_lib_unhash() deletes the socket from the second hash chain
    (additional cost of one atomic op).

    Note: no spinlock lockdep annotation is needed, because the
    lock for the secondary hash chain is always taken after the
    lock for the primary hash chain.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Adds a counter in udp_hslot to keep an accurate count
    of sockets present in chain.

    This will let the upcoming UDP lookup algorithm choose
    the shortest chain once the secondary hash is added.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
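A rough userspace sketch of the two 09 Nov 2009 commits above, not the kernel code: the union shows why the secondary-hash anchor costs no extra memory, the insert path takes the primary chain lock before the secondary one (the fixed order the lockdep note relies on), and the per-slot counter lets a lookup pick the shorter chain. All demo_* names and the C11 atomic_flag spinlock are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified stand-ins for hlist_node and hlist_nulls_node:
 * both are a next pointer plus a pprev pointer. */
struct demo_hlist_node { struct demo_hlist_node *next, **pprev; };
struct demo_nulls_node { struct demo_nulls_node *next, **pprev; };

/* One anchor field serves both purposes, since UDP never uses the bind node. */
struct demo_sock {
	union {
		struct demo_hlist_node bind_node;     /* unused by UDP */
		struct demo_nulls_node portaddr_node; /* secondary-hash anchor */
	} anchor;
	unsigned short port;
	unsigned int addr;
};

/* A hash slot: a spinlock plus the element counter from the second commit
 * (the chain head itself is omitted from this sketch). */
struct demo_hslot {
	atomic_flag lock;
	int count;
};

static void slot_init(struct demo_hslot *s)
{
	atomic_flag_clear(&s->lock);
	s->count = 0;
}

static void slot_lock(struct demo_hslot *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		; /* spin */
}

static void slot_unlock(struct demo_hslot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Insert into both chains. Every inserter takes the primary lock first and
 * the secondary lock second, so two inserters cannot deadlock each other. */
static void demo_hash(struct demo_hslot *primary, struct demo_hslot *secondary)
{
	assert(primary != secondary); /* the slots live in two different tables */
	slot_lock(primary);
	primary->count++;
	slot_lock(secondary);
	secondary->count++;
	slot_unlock(secondary);
	slot_unlock(primary);
}

/* Lookup side: scan whichever candidate chain is currently shorter. */
static struct demo_hslot *shortest_chain(struct demo_hslot *a, struct demo_hslot *b)
{
	return (b->count < a->count) ? b : a;
}
```

On ties the sketch keeps the first (primary) chain; how the real lookup breaks ties is not stated in the commit messages.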
     

08 Oct, 2009

1 commit

  • UDP_HTABLE_SIZE was initially defined as 128, which is a bit small for
    several setups.

    4000 active UDP sockets -> about 32 sockets per chain on average. An
    incoming frame has to look up all sockets in its chain to find the best
    match, so long chains hurt latency.

    Instead of a fixed-size hash table that can't be perfect for every
    need, let the UDP stack choose its table size at boot time, like the TCP
    and IP route hash tables, using the alloc_large_system_hash() helper.

    Add an optional boot parameter, uhash_entries=x, so that an admin can
    force a size between 256 and 65536 if needed, like thash_entries and
    rhash_entries.

    dmesg logs two new lines:
    [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
    [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

    Maximal size on 64-bit arches would be 65536 slots, i.e. 1 MByte with
    non-debugging spinlocks.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
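A small sketch of the sizing rule described in the 08 Oct 2009 commit, assuming the admin-supplied uhash_entries value is clamped to the 256..65536 range and rounded up to a power of two so that a mask can select a slot. The helper names and the rounding direction are assumptions, not the logic of the kernel's alloc_large_system_hash(). It also checks the chain-length arithmetic: 4000 sockets over 128 slots is 31.25 per chain, roughly the 32 quoted above.

```c
#include <assert.h>

/* Bounds stated in the commit message for the uhash_entries= boot parameter. */
#define DEMO_UHASH_MIN 256u
#define DEMO_UHASH_MAX 65536u

/* Round v up to the next power of two (v must be non-zero and <= 2^31). */
static unsigned int demo_roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	assert(v != 0 && v <= 0x80000000u);
	while (p < v)
		p <<= 1;
	return p;
}

/* Hypothetical sizing helper: clamp the requested size into the allowed
 * range, then round to a power of two so "hash & (size - 1)" picks a slot. */
static unsigned int demo_udp_table_size(unsigned int requested)
{
	if (requested < DEMO_UHASH_MIN)
		requested = DEMO_UHASH_MIN;
	if (requested > DEMO_UHASH_MAX)
		requested = DEMO_UHASH_MAX;
	return demo_roundup_pow2(requested);
}

/* Average number of sockets per chain if they spread evenly. */
static double demo_avg_chain(unsigned int sockets, unsigned int slots)
{
	return (double)sockets / slots;
}
```

With the 512-entry tables from the dmesg lines above, the same 4000 sockets would average about 7.8 per chain.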
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
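The idea of checking the sign once and passing an unsigned length onward can be sketched as follows. Both functions are hypothetical stand-ins for a syscall entry point and a per-protocol handler; they do not reproduce the kernel's signatures.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Per-protocol handler (hypothetical): optlen is unsigned, so it cannot be
 * negative and only the minimum-size check is needed here. */
static int demo_proto_setsockopt(const void *optval, unsigned int optlen, int *out)
{
	assert(out != NULL);
	if (optlen < sizeof(int))
		return -EINVAL;
	memcpy(out, optval, sizeof(int));
	return 0;
}

/* Single entry point (hypothetical): the one place that still receives a
 * signed length rejects negative values before passing it on as unsigned. */
static int demo_sys_setsockopt(const void *optval, int optlen, int *out)
{
	if (optlen < 0)
		return -EINVAL;
	return demo_proto_setsockopt(optval, (unsigned int)optlen, out);
}
```

With a signed optlen, a handler that only tests `optlen < sizeof(int)` is subtly wrong: the usual arithmetic conversions turn a negative int into a huge unsigned value, so the check passes. Making the parameter unsigned int leaves a single sign check at the entry point.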
     

13 Jul, 2009

1 commit


11 Apr, 2009

1 commit

  • Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6
    (ipv6: Fix conflict resolutions during ipv6 binding)
    introduced a regression where time-wait sockets were
    not treated correctly. This resulted in the following:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
    IP: ipv4_rcv_saddr_equal+0x61/0x70
    ...
    Call Trace:
    ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
    inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
    inet_csk_get_port+0x1ee/0x400
    inet6_bind+0x1cf/0x3a0 [ipv6]
    ? sockfd_lookup_light+0x3c/0xd0
    sys_bind+0x89/0x100
    ? trace_hardirqs_on_thunk+0x3a/0x3c
    system_call_fastpath+0x16/0x1b

    Tested-by: Brian Haley
    Tested-by: Ed Tomlinson
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

25 Mar, 2009

1 commit


17 Nov, 2008

1 commit

  • This is a straightforward patch, using hlist_nulls infrastructure.

    RCUification was already done on UDP two weeks ago.

    Using hlist_nulls permits us to avoid some memory barriers, both
    at lookup time and at delete time.

    The patch is large because it adds new macros to include/net/sock.h.
    These macros will be used by TCP and DCCP in the next patch.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
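A simplified, single-threaded userspace illustration of the hlist_nulls idea this commit relies on: a chain ends in an odd-valued marker that encodes its chain number instead of NULL, so a lockless reader that was carried onto another chain by a concurrent move can notice and rescan. The demo_* helpers follow the spirit of the kernel's is_a_nulls()/get_nulls_value() but are written from scratch; real RCU readers also need rcu_dereference() and related primitives, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_nulls_node {
	struct demo_nulls_node *next;
	int key;
};

/* Real nodes are at least 2-byte aligned, so an odd value is a marker. */
static struct demo_nulls_node *demo_make_nulls(uintptr_t chain)
{
	assert(chain <= (UINTPTR_MAX >> 1)); /* value must fit above the tag bit */
	return (struct demo_nulls_node *)(1u | (chain << 1));
}

static int demo_is_a_nulls(const struct demo_nulls_node *p)
{
	return ((uintptr_t)p & 1u) != 0;
}

static uintptr_t demo_nulls_value(const struct demo_nulls_node *p)
{
	return (uintptr_t)p >> 1;
}

/* One pass over a chain. Returns the matching node or NULL. If the walk
 * ended in another chain's marker, a node we followed was moved meanwhile,
 * so *restart is set and the caller must rescan. */
static struct demo_nulls_node *demo_scan(struct demo_nulls_node *head,
					 uintptr_t slot, int key, int *restart)
{
	struct demo_nulls_node *p;

	*restart = 0;
	for (p = head; !demo_is_a_nulls(p); p = p->next)
		if (p->key == key)
			return p;
	if (demo_nulls_value(p) != slot)
		*restart = 1;
	return NULL;
}

static struct demo_nulls_node *demo_lookup(struct demo_nulls_node *const *heads,
					   uintptr_t slot, int key)
{
	struct demo_nulls_node *found;
	int restart;

	do {
		found = demo_scan(heads[slot], slot, key, &restart);
	} while (restart);
	return found;
}
```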
     

29 Oct, 2008

1 commit

  • UDP sockets are hashed in a 128-slot hash table.

    This hash table is protected by *one* rwlock.

    This rwlock is read-locked each time an incoming UDP message is handled.

    This rwlock is write-locked each time a socket must be inserted in the
    hash table (bind time) or deleted from it (close time).

    This is not scalable on SMP machines:

    1) Even in read mode, lock() and unlock() are atomic operations and
    must dirty a contended cache line, shared by all cpus.

    2) A writer might be starved if many readers are 'in flight'. This can
    happen on a machine with a NIC receiving many UDP messages. A user
    process can be delayed for a long time at socket creation/dismantle time.

    This patch prepares the RCU migration by introducing 'struct udp_table'
    and 'struct udp_hslot', and by using one spinlock per chain to reduce
    contention on the central rwlock.

    Introducing one spinlock per chain reduces latencies for port
    randomization on heavily loaded UDP servers. It also speeds up
    bindings to specific ports.

    udp_lib_unhash() was uninlined, as it was becoming too big.

    Some cleanups were done to ease review of the following patch
    (RCUification of UDP unicast lookups).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
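A hypothetical userspace mirror of the 'struct udp_table' / 'struct udp_hslot' split: each chain gets its own lock (a C11 atomic_flag spinlock here), so a bind on one port no longer contends with traffic on other chains. The port-masking hash and all demo_* names are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Every chain carries its own lock instead of sharing one rwlock. */
struct demo_hslot {
	atomic_flag lock;
	int nr_sockets;        /* stands in for the chain itself */
};

struct demo_udp_table {
	struct demo_hslot *hash;
	unsigned int mask;     /* number of slots - 1; slots is a power of two */
};

static int demo_table_init(struct demo_udp_table *t, unsigned int slots)
{
	assert(slots != 0 && (slots & (slots - 1)) == 0);
	t->hash = malloc(slots * sizeof(*t->hash));
	if (!t->hash)
		return -1;
	t->mask = slots - 1;
	for (unsigned int i = 0; i < slots; i++) {
		atomic_flag_clear(&t->hash[i].lock);
		t->hash[i].nr_sockets = 0;
	}
	return 0;
}

/* Hypothetical hash: the low bits of the port pick the slot. */
static struct demo_hslot *demo_slot(struct demo_udp_table *t, unsigned short port)
{
	return &t->hash[port & t->mask];
}

/* bind() path: only the chain for this port is locked, so work on other
 * chains proceeds without touching this slot's lock. */
static void demo_bind(struct demo_udp_table *t, unsigned short port)
{
	struct demo_hslot *slot = demo_slot(t, port);

	while (atomic_flag_test_and_set_explicit(&slot->lock, memory_order_acquire))
		; /* spin until this chain is free */
	slot->nr_sockets++;
	atomic_flag_clear_explicit(&slot->lock, memory_order_release);
}
```

With 128 slots, ports 53 and 181 share a chain (both mask to 53) while port 54 uses its own, so only binds to colliding ports serialize on the same lock.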
     

08 Oct, 2008

2 commits


01 Oct, 2008

1 commit


18 Jul, 2008

2 commits


06 Jul, 2008

4 commits


13 Jun, 2008

1 commit


05 Jun, 2008

1 commit

  • IPv6 UDP sockets with IPv4-mapped addresses actually use udp_sendmsg to
    send the data. In this case ip_flush_pending_frames should be called
    instead of ip6_flush_pending_frames.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: YOSHIFUJI Hideaki

    Denis V. Lunev
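A hedged sketch of the fix's logic: remember which address family queued the corked data and dispatch the flush accordingly. The demo flush functions stand in for ip_flush_pending_frames() and ip6_flush_pending_frames(), and the 'pending' field holding an address family is an assumption of this sketch. IN6_IS_ADDR_V4MAPPED() is the standard <netinet/in.h> macro.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Which flush routine ran last (for demonstration only). */
enum demo_flush { DEMO_FLUSH_NONE, DEMO_FLUSH_V4, DEMO_FLUSH_V6 };

/* Hypothetical socket state: 'pending' is the address family of the
 * corked data, or 0 when nothing is pending. */
struct demo_udp6_sock {
	int pending;
	enum demo_flush last_flush;
};

/* Stand-ins for ip_flush_pending_frames() and ip6_flush_pending_frames(). */
static void demo_ip_flush(struct demo_udp6_sock *up)  { up->last_flush = DEMO_FLUSH_V4; }
static void demo_ip6_flush(struct demo_udp6_sock *up) { up->last_flush = DEMO_FLUSH_V6; }

/* Data corked through the IPv4 path must be flushed by the IPv4 routine. */
static void demo_udp_v6_flush(struct demo_udp6_sock *up)
{
	if (up->pending == AF_INET)
		demo_ip_flush(up);
	else if (up->pending == AF_INET6)
		demo_ip6_flush(up);
	up->pending = 0;
}

/* The send side would choose the path from the destination address. */
static int demo_family_for(const struct in6_addr *dst)
{
	return IN6_IS_ADDR_V4MAPPED(dst) ? AF_INET : AF_INET6;
}
```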
     

01 Apr, 2008

1 commit


29 Mar, 2008

4 commits


23 Mar, 2008

1 commit

  • After this, we have only udp_lib_get_port to get the port, plus two
    stubs for IPv4 and IPv6. There is no difference between UDP and
    UDP-Lite except for the initialized h.udp_hash member.

    I tried to find a graceful way to drop the only difference between
    udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison
    routine), but adding one more callback to struct proto didn't seem
    graceful :( Maybe later.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
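The shape of this refactor (one generic port allocator plus thin per-family stubs that differ only in the rcv_saddr comparison callback) can be sketched in userspace. The demo_* types, the simplified wildcard rules, and the return convention (nonzero on conflict) are assumptions; the real comparison routines also handle cases such as IPV6_V6ONLY and address reuse, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal socket holding both address families. */
struct demo_sock {
	unsigned short port;
	unsigned int v4_addr;        /* 0 means wildcard (INADDR_ANY) */
	unsigned char v6_addr[16];   /* all zero means wildcard (in6addr_any) */
};

typedef int (*demo_saddr_comp)(const struct demo_sock *a, const struct demo_sock *b);

/* Generic part shared by IPv4 and IPv6: the port is taken if any socket
 * already bound to it has a conflicting local address. */
static int demo_lib_get_port(struct demo_sock *sk, unsigned short snum,
			     struct demo_sock *const *bound, size_t n,
			     demo_saddr_comp saddr_comp)
{
	for (size_t i = 0; i < n; i++)
		if (bound[i]->port == snum && saddr_comp(sk, bound[i]))
			return 1;  /* conflict: port in use */
	sk->port = snum;
	return 0;
}

/* IPv4: conflict if either side is a wildcard or the addresses match. */
static int demo_v4_saddr_equal(const struct demo_sock *a, const struct demo_sock *b)
{
	return !a->v4_addr || !b->v4_addr || a->v4_addr == b->v4_addr;
}

static int demo_v6_is_any(const unsigned char *addr)
{
	static const unsigned char zero[16];
	return memcmp(addr, zero, sizeof(zero)) == 0;
}

/* IPv6: same rule on 128-bit addresses. */
static int demo_v6_saddr_equal(const struct demo_sock *a, const struct demo_sock *b)
{
	return demo_v6_is_any(a->v6_addr) || demo_v6_is_any(b->v6_addr) ||
	       memcmp(a->v6_addr, b->v6_addr, 16) == 0;
}

/* Thin per-family stubs: only the comparison routine differs. */
static int demo_v4_get_port(struct demo_sock *sk, unsigned short snum,
			    struct demo_sock *const *bound, size_t n)
{
	return demo_lib_get_port(sk, snum, bound, n, demo_v4_saddr_equal);
}

static int demo_v6_get_port(struct demo_sock *sk, unsigned short snum,
			    struct demo_sock *const *bound, size_t n)
{
	return demo_lib_get_port(sk, snum, bound, n, demo_v6_saddr_equal);
}
```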
     

21 Mar, 2008

2 commits


29 Jan, 2008

3 commits

  • 1) Cleanups (all functions are prefixed by sock_prot_inuse)

    sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,1)
    sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
    sock_prot_inuse() -> sock_prot_inuse_get()

    New functions:

    sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.

    2) If CONFIG_PROC_FS=n, we can zap the 'inuse' member from "struct proto",
    since nobody wants to read the inuse value.

    This saves 1372 bytes on i386/SMP and some cpu cycles.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Signed-off-by: Takahiro Yasui
    Signed-off-by: Hideo Aoki
    Signed-off-by: David S. Miller

    Hideo Aoki
     
  • The previous move of the UDP inDatagrams counter caused the
    counting of encapsulated packets, SUNRPC data (as opposed to call)
    packets and RXRPC packets to go missing.

    This patch restores all of these.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
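For the sock_prot_inuse cleanup in the first 29 Jan 2008 commit above, a minimal sketch of compiling the counter away when /proc is disabled might look like this. DEMO_CONFIG_PROC_FS stands in for CONFIG_PROC_FS, and a plain int replaces the kernel's per-CPU counter; both are simplifications.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel config switch; remove this line
 * to see the counter compiled away. */
#define DEMO_CONFIG_PROC_FS 1

struct demo_proto {
	const char *name;
#ifdef DEMO_CONFIG_PROC_FS
	int inuse;   /* only /proc readers need this count */
#endif
};

#ifdef DEMO_CONFIG_PROC_FS
static void demo_prot_inuse_add(struct demo_proto *prot, int val) { prot->inuse += val; }
static int demo_prot_inuse_get(const struct demo_proto *prot) { return prot->inuse; }
#else
/* Without /proc nobody reads the value, so updates become no-ops. */
static void demo_prot_inuse_add(struct demo_proto *prot, int val) { (void)prot; (void)val; }
static int demo_prot_inuse_get(const struct demo_proto *prot) { (void)prot; return 0; }
#endif
```

Callers use the same add(prot, 1) / add(prot, -1) pair in both configurations, so no call site needs its own #ifdef.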
     

08 Jun, 2007

1 commit

  • This reverts changesets:

    6aaf47fa48d3c44280810b1b470261d340e4ed87
    b7b5f487ab39bc10ed0694af35651a03d9cb97ff
    de34ed91c4ffa4727964a832c46e624dd1495cf5
    fc038410b4b1643766f8033f4940bcdb1dace633

    There are still some correctness issues recently
    discovered which do not have a known fix that doesn't
    involve doing a full hash table scan on port bind.

    So revert for now.

    Signed-off-by: David S. Miller

    David S. Miller
     

11 May, 2007

1 commit


26 Apr, 2007

3 commits

  • When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
    treat it as such in the stack.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • For the places where we need a pointer to the transport header, it is
    still legal to touch skb->h.raw directly if just adding to,
    subtracting from or setting it to another layer header.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This patch eliminates some duplicate code for the verification of
    receive checksums between UDP-Lite and UDP. It does this by
    introducing __skb_checksum_complete_head, which is identical to
    __skb_checksum_complete apart from the fact that it takes
    a length parameter rather than always covering all skb->len bytes.

    As a result UDP-Lite will be able to use hardware checksum offload
    for packets which do not use partial coverage checksums. It also
    means that UDP-Lite loopback no longer does unnecessary checksum
    verification.

    If any NICs start supporting UDP-Lite, this would also start working
    automatically.

    This patch removes the assumption that msg_flags has MSG_TRUNC clear
    upon entry to recvmsg.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
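Two ideas from the 26 Apr 2007 commits above, sketched in userspace: a checksum that covers only the first len bytes (the extra length parameter that UDP-Lite's partial coverage needs), and treating a looped-back CHECKSUM_PARTIAL packet like CHECKSUM_UNNECESSARY. The checksum is the RFC 1071 Internet checksum; the demo enum mirrors the kernel's ip_summed state names but is defined locally, and none of this is the kernel's __skb_checksum_complete_head().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum over the first 'len' bytes only. Returns the folded,
 * complemented 16-bit result; 0 means a stored checksum verified.
 * The 32-bit accumulator is ample for packet-sized buffers. */
static uint16_t demo_csum_head(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)data[i] << 8 | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;   /* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Local mirror of the skb->ip_summed state names. */
enum demo_ip_summed { DEMO_CHECKSUM_NONE, DEMO_CHECKSUM_UNNECESSARY,
		      DEMO_CHECKSUM_COMPLETE, DEMO_CHECKSUM_PARTIAL };

/* A looped-back CHECKSUM_PARTIAL packet never left the host, so, like
 * CHECKSUM_UNNECESSARY, it needs no receive-side verification. */
static int demo_can_skip_verify(enum demo_ip_summed s)
{
	return s == DEMO_CHECKSUM_UNNECESSARY || s == DEMO_CHECKSUM_PARTIAL;
}
```

Bytes past the coverage length can change without affecting the result, which is exactly what partial coverage permits.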
     

04 Dec, 2006

1 commit


03 Dec, 2006

2 commits

  • This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The
    justification is that UDP(-Lite) is a transport-layer protocol and therefore
    the socket option code (at least in theory) should be AF-independent.

    Furthermore, there is the following code duplication:
    * do_udp{,v6}_getsockopt is 100% identical between v4 and v6
    * do_udp{,v6}_setsockopt is identical up to the following difference
    --v4, in contrast to v6, additionally allows the experimental encapsulation
    types UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE
    --the remainder is identical between v4 and v6
    I believe that this difference is of little relevance.

    The advantage is not duplicating almost completely identical code.

    The patch further simplifies the interface of udp{,v6}_push_pending_frames,
    since for the second argument (struct udp_sock *up) it always holds that
    up = udp_sk(sk), where sk is the first function argument.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro
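The consolidation in the first 03 Dec 2006 commit above can be sketched as one shared handler that receives the family-specific push routine as a callback taking only the socket, in line with the simplified push_pending_frames interface. Option numbers, struct fields, and function names are hypothetical, and only UDP_CORK-style uncorking is modeled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical option number and socket state for this sketch. */
enum { DEMO_UDP_CORK = 1 };

struct demo_udp_sock {
	int corkflag;
	int pushed_by;   /* 4 or 6: which family's push routine ran last */
};

typedef int (*demo_push_fn)(struct demo_udp_sock *up);

static int demo_v4_push(struct demo_udp_sock *up) { up->pushed_by = 4; return 0; }
static int demo_v6_push(struct demo_udp_sock *up) { up->pushed_by = 6; return 0; }

/* Shared by v4 and v6: the only family-specific piece is the push routine. */
static int demo_udp_lib_setsockopt(struct demo_udp_sock *up, int optname, int val,
				   demo_push_fn push_pending_frames)
{
	switch (optname) {
	case DEMO_UDP_CORK:
		if (val) {
			up->corkflag = 1;
		} else {
			up->corkflag = 0;
			push_pending_frames(up);  /* uncorking sends queued data */
		}
		return 0;
	default:
		return -ENOPROTOOPT;
	}
}

/* Thin per-family wrappers. */
static int demo_udp_setsockopt(struct demo_udp_sock *up, int optname, int val)
{
	return demo_udp_lib_setsockopt(up, optname, val, demo_v4_push);
}

static int demo_udpv6_setsockopt(struct demo_udp_sock *up, int optname, int val)
{
	return demo_udp_lib_setsockopt(up, optname, val, demo_v6_push);
}
```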