10 Dec, 2014

1 commit


09 Dec, 2014

1 commit

  • The compute_score functions are a bit difficult to read.

    Neaten them a bit to reduce object sizes and make them a
    bit more intelligible.

    Return early to avoid indentation and avoid unnecessary
    initializations.

    (allyesconfig, but w/ -O2 and no profiling)

    $ size net/ipv[46]/udp.o.*
    text data bss dec hex filename
    28680 1184 25 29889 74c1 net/ipv4/udp.o.new
    28756 1184 25 29965 750d net/ipv4/udp.o.old
    17600 1010 2 18612 48b4 net/ipv6/udp.o.new
    17632 1010 2 18644 48d4 net/ipv6/udp.o.old

    Signed-off-by: Joe Perches
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Joe Perches
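
The early-return restructuring described in the compute_score commit above can be sketched roughly as follows. This is a simplified illustration only: struct demo_sock, its fields, and the score weights are hypothetical and are not the kernel's struct sock or its actual scoring.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a socket (not struct sock). */
struct demo_sock {
    uint16_t local_port;
    uint32_t bound_addr;  /* 0 means "any local address" */
    uint32_t peer_addr;   /* 0 means "not connected" */
};

/*
 * Returns -1 on any mismatch, otherwise a score that grows with each
 * exact match. Each failed test returns immediately, so the matching
 * path stays at one indentation level and score is only initialized
 * once the port is known to match.
 */
static int demo_compute_score(const struct demo_sock *sk,
                              uint32_t saddr, uint32_t daddr,
                              uint16_t dport)
{
    int score;

    if (sk->local_port != dport)
        return -1;

    score = 1;
    if (sk->bound_addr) {
        if (sk->bound_addr != daddr)
            return -1;
        score += 4;
    }
    if (sk->peer_addr) {
        if (sk->peer_addr != saddr)
            return -1;
        score += 4;
    }
    return score;
}
```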
     

24 Nov, 2014

1 commit


13 Nov, 2014

1 commit

  • Standardize function pointer uses.

    Convert calling style from:
    (*foo)(args...);
    to:
    foo(args...);

    Other miscellanea:

    o Add braces around loops with single ifs on multiple lines
    o Realign arguments around these functions
    o Invert logic in if to return immediately.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
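
As an illustration of the calling-style change in the commit above, here is a small standalone sketch. The names demo_handler_t, demo_double and demo_apply are hypothetical and do not come from the patch; both call forms are equivalent in C, and the commit only standardizes on the shorter one.

```c
#include <stddef.h>

/* Hypothetical callback type, for illustration only. */
typedef int (*demo_handler_t)(int value);

static int demo_double(int value)
{
    return value * 2;
}

/* Applies handler to each item in place; returns the number processed. */
static size_t demo_apply(demo_handler_t handler, int *items, size_t count)
{
    size_t i;

    if (!handler)   /* inverted test: return immediately */
        return 0;

    for (i = 0; i < count; i++) {
        /* old style: items[i] = (*handler)(items[i]); */
        items[i] = handler(items[i]);
    }
    return count;
}
```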
     

12 Nov, 2014

2 commits

  • Use the more common dynamic_debug capable net_dbg_ratelimited
    and remove the LIMIT_NETDEBUG macro.

    All messages are still ratelimited.

    Some KERN_ uses are changed to KERN_DEBUG.

    This may have some negative impact on messages that were
    emitted at KERN_INFO that are not enabled at all unless
    DEBUG is defined or dynamic_debug is enabled. Even so,
    these messages are now _not_ emitted by default.

    This also eliminates the use of the net_msg_warn sysctl
    "/proc/sys/net/core/warnings". For backward compatibility,
    the sysctl is not removed, but it has no function. The extern
    declaration of net_msg_warn is removed from sock.h and made
    static in net/core/sysctl_net_core.c

    Miscellanea:

    o Update the sysctl documentation
    o Remove the embedded uses of pr_fmt
    o Coalesce format fragments
    o Realign arguments

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • An alternative to RPS/RFS is to use hardware support for multiple
    queues.

    Then split a set of millions of sockets into worker threads, each
    one using epoll() to manage events on its own socket pool.

    Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
    know after accept() or connect() on which queue/cpu a socket is managed.

    We normally use one cpu per RX queue (IRQ smp_affinity being properly
    set), so remembering on socket structure which cpu delivered last packet
    is enough to solve the problem.

    After accept(), connect(), or even passing a file descriptor between
    processes, applications can use:

    int cpu;
    socklen_t len = sizeof(cpu);

    getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);

    They can then use this information to put the socket into the right
    silo for optimal performance, as the whole networking stack should run
    on the appropriate cpu, without the need to send IPIs (RPS/RFS).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
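
A slightly fuller sketch of how an application might use SO_INCOMING_CPU to pick a worker, based on the getsockopt() call quoted in the commit above. The helper names and the modulo mapping are hypothetical. The fallback value 49 is the asm-generic definition and is only an assumption for older headers; some architectures use a different number.

```c
#include <sys/types.h>
#include <sys/socket.h>

#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49  /* asm-generic value; assumption for old headers */
#endif

/* Returns the cpu reported for fd, or -1 if it is unknown or unsupported. */
static int demo_incoming_cpu(int fd)
{
    int cpu = -1;
    socklen_t len = sizeof(cpu);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
        return -1;
    return cpu;
}

/*
 * Hypothetical policy: one worker ("silo") per cpu, wrapping around when
 * there are fewer workers than cpus. Unknown cpus go to worker 0.
 */
static unsigned int demo_pick_worker(int cpu, unsigned int nr_workers)
{
    if (cpu < 0 || nr_workers == 0)
        return 0;
    return (unsigned int)cpu % nr_workers;
}
```

A caller would typically run demo_pick_worker(demo_incoming_cpu(fd), nr_workers) after accept() and hand the descriptor to that worker's epoll set.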
     

08 Nov, 2014

1 commit

  • As NIC multicast filtering isn't perfect, and some platforms are
    quite content to spew broadcasts, we should not trigger an event
    for skb:kfree_skb when we do not have a match for such an incoming
    datagram. We do though want to avoid sweeping the matter under the
    rug entirely, so increment a suitable statistic.

    This incorporates feedback from David L. Stevens, Karl Neiss and Eric
    Dumazet.

    V3 - use bool per David Miller

    Signed-off-by: Rick Jones
    Signed-off-by: David S. Miller

    Rick Jones
     

06 Nov, 2014

1 commit

  • This encapsulates all of the skb_copy_datagram_iovec() callers
    with call argument signature "skb, offset, msghdr->msg_iov, length".

    When we move to iov_iters in the networking, the iov_iter object will
    sit in the msghdr.

    Having a helper like this means there will be fewer places to touch
    during that transformation.

    Based upon descriptions and patch from Al Viro.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Nov, 2014

2 commits


06 Sep, 2014

1 commit


02 Sep, 2014

1 commit

  • Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE
    conversion in UDP tunneling path.

    In the normal UDP path, we call skb_checksum_try_convert after locating
    the UDP socket. The check is that checksum conversion is enabled for
    the socket (new flag in UDP socket) and that checksum field is
    non-zero.

    In the UDP GRO path, we call skb_gro_checksum_try_convert after
    checksum is validated and checksum field is non-zero. Since this is
    already in GRO we assume that checksum conversion is always wanted.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

25 Aug, 2014

1 commit

  • Implement GRO for UDPv6. Add UDP checksum verification in gro_receive
    for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

24 Aug, 2014

1 commit


24 Jul, 2014

1 commit


17 Jul, 2014

3 commits


15 Jul, 2014

1 commit


12 Jul, 2014

1 commit


27 Jun, 2014

1 commit


14 Jun, 2014

1 commit

  • It's too easy to add thousands of UDP sockets on a particular bucket,
    and slow down an innocent multicast receiver.

    Early demux is supposed to be an optimization; we should avoid spending
    too much time in it.

    It is interesting to note __udp4_lib_demux_lookup() only tries to
    match first socket in the chain.

    10 is the threshold we already have in __udp4_lib_lookup() to switch
    to secondary hash.

    Fixes: 421b3885bf6d5 ("udp: ipv4: Add udp early demux")
    Signed-off-by: Eric Dumazet
    Reported-by: David Held
    Cc: Shawn Bohrer
    Signed-off-by: David S. Miller

    Eric Dumazet
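
The idea in the commit above, giving up on the optimization when a hash chain is too long, can be sketched as follows. The structures and the function are hypothetical simplifications, not the kernel's hash tables or lookup code; only the threshold of 10 comes from the commit message.

```c
#include <stddef.h>

#define DEMO_CHAIN_LIMIT 10  /* threshold quoted in the commit message */

/* Hypothetical socket stand-in: matches is nonzero if it wants the packet. */
struct demo_msock {
    int id;
    int matches;
};

/* Hypothetical hash bucket holding count sockets. */
struct demo_bucket {
    const struct demo_msock *socks;
    size_t count;
};

/*
 * Returns the first matching socket, or NULL if there is none or if the
 * bucket is too crowded for a cheap scan. A NULL result is not an error:
 * the regular (slower) lookup path is expected to handle the packet.
 */
static const struct demo_msock *
demo_early_lookup(const struct demo_bucket *b)
{
    size_t i;

    if (b->count > DEMO_CHAIN_LIMIT)
        return NULL;

    for (i = 0; i < b->count; i++) {
        if (b->socks[i].matches)
            return &b->socks[i];
    }
    return NULL;
}
```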
     

05 Jun, 2014

3 commits


24 May, 2014

2 commits

  • RFC 6935 permits zero checksums to be used in IPv6; however, this is
    recommended only for certain tunnel protocols. It does not make
    checksums completely optional like they are in IPv4.

    This patch restricts the use of IPv6 zero checksums that was previously
    introduced. no_check6_tx and no_check6_rx have been added to control
    the use of checksums in UDP6 RX and TX path. The normal
    sk_no_check_{rx,tx} settings are not used (this avoids ambiguity when
    dealing with a dual stack socket).

    A helper function has been added (udp_set_no_check6) which can be
    called by tunnel implementations to allow zero checksums (send them on
    the socket, and accept them as valid).

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Define separate fields in the sock structure for configuring disabling
    checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
    The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
    removed UDP_CSUM_* defines since they are no longer necessary.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

15 May, 2014

2 commits

  • Missing a colon on definition use is a bit odd so
    change the macro for the 32 bit case to declare an
    __attribute__((unused)) and __deprecated variable.

    The __deprecated attribute will cause gcc to emit
    an error if the variable is actually used.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • ip_local_port_range is already per netns, so should ip_local_reserved_ports
    be. And since it is none by default we don't actually need it when we don't
    enable CONFIG_SYSCTL.

    By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()

    Cc: "David S. Miller"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

09 May, 2014

1 commit


06 May, 2014

1 commit


20 Feb, 2014

1 commit

  • In case we decide in udp6_sendmsg to send the packet down the ipv4
    udp_sendmsg path because the destination is either of family AF_INET or
    the destination is an ipv4 mapped ipv6 address, we don't honor an
    ipv4 mapped ipv6 address that may be specified in IPV6_PKTINFO.

    We can simply check for this option in ip_cmsg_send because no calls to
    ipv6 module functions are needed to do so.

    Reported-by: Gert Doering
    Cc: Tore Anderson
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
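
The compile-time size check that the commit above attributes to DECLARE_SOCKADDR can be approximated in portable C11 as below. This is a sketch under assumptions, not the kernel macro: DEMO_DECLARE_SOCKADDR, DEMO_MSG_NAME_MAX and demo_fill_name are hypothetical names, and only the 128-byte limit is taken from the commit message.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define DEMO_MSG_NAME_MAX 128  /* size quoted for the msg_name buffer */

/*
 * Declares dst as a typed pointer to src, and fails the build if the
 * pointed-to structure could not fit in the reserved name buffer.
 */
#define DEMO_DECLARE_SOCKADDR(type, dst, src)                     \
    _Static_assert(sizeof(*(type)0) <= DEMO_MSG_NAME_MAX,         \
                   "sockaddr type larger than msg_name buffer");  \
    type dst = (type)(src)

/* Writes an AF_INET name with the given port into msg_name, if present. */
static void demo_fill_name(void *msg_name, unsigned short port)
{
    DEMO_DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg_name);

    if (!sin)
        return;
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
}
```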
     

15 Jan, 2014

1 commit


07 Jan, 2014

1 commit


03 Jan, 2014

1 commit

  • VM to VM GSO traffic is broken if it goes through VXLAN or GRE
    tunnel and the physical NIC on the host supports hardware VXLAN/GRE
    GSO offload (e.g. bnx2x and next-gen mlx4).

    Two issues -
    (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
    SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
    integrity check fails in udp4_ufo_fragment if inner protocol is
    TCP. Also gso_segs is calculated incorrectly using skb->len that
    includes tunnel header. Fix: robust check should only be applied
    to the inner packet.

    (VXLAN & GRE) Once GSO header integrity check passes, NULL segs
    is returned and the original skb is sent to hardware. However the
    tunnel header is already pulled. Fix: tunnel header needs to be
    restored so that hardware can perform GSO properly on the original
    packet.

    Signed-off-by: Wei-Chun Chao
    Signed-off-by: David S. Miller

    Wei-Chun Chao
     

20 Dec, 2013

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2013-12-19

    1) Use the user supplied policy index instead of a generated one
    if present. From Fan Du.

    2) Make xfrm migration namespace aware. From Fan Du.

    3) Make the xfrm state and policy locks namespace aware. From Fan Du.

    4) Remove ancient sleeping when the SA is in acquire state,
    we now queue packets to the policy instead. This replaces the
    sleeping code.

    5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
    possibility to sleep. The sleeping code is gone, so remove it.

    6) Check user specified spi for IPComp. The spi for IPcomp is only
    16 bit wide, so check for a valid value. From Fan Du.

    7) Export verify_userspi_info to check for valid user supplied spi ranges
    with pfkey and netlink. From Fan Du.

    8) RFC3173 states that if the total size of a compressed payload and the IPComp
    header is not smaller than the size of the original payload, the IP datagram
    must be sent in the original non-compressed form. These packets are dropped
    by the inbound policy check because they are not transformed. Document the need
    to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
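
The RFC 3173 rule quoted in item 8 of the pull request above reduces to a single size comparison. The sketch below uses hypothetical names; the 4-octet IPComp header length is taken from RFC 3173, not from the pull request text.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IPCOMP_HDR_LEN 4  /* IPComp header size in octets (RFC 3173) */

/*
 * Returns true only if compressed payload plus IPComp header is strictly
 * smaller than the original payload. Otherwise the datagram must be sent
 * in its original, non-compressed form.
 */
static bool demo_ipcomp_use_compressed(size_t orig_len, size_t comp_len)
{
    return comp_len + DEMO_IPCOMP_HDR_LEN < orig_len;
}
```

Packets for which this returns false arrive untransformed, which is why the inbound policy needs 'level use' to accept them.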
     

18 Dec, 2013

1 commit

  • Using sk_dst_lock from softirq context is not supported right now.

    Instead of adding BH protection everywhere,
    udp_sk_rx_dst_set() can instead use xchg(), as suggested
    by David.

    Reported-by: Fengguang Wu
    Fixes: 975022310233 ("udp: ipv4: must add synchronization in udp_sk_rx_dst_set()")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
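
The lock-free replacement described in the commit above can be illustrated in user space with C11 atomics standing in for the kernel's xchg(). All names here are hypothetical; the sketch only shows why an atomic exchange is enough: each caller gets back exactly one old pointer and releases it exactly once, so no reference is leaked even if several cpus race.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference-counted route entry. */
struct demo_dst {
    atomic_int refcnt;
};

/* Hypothetical socket holding a cached route pointer. */
struct demo_sk {
    _Atomic(struct demo_dst *) rx_dst;
};

static void demo_dst_release(struct demo_dst *dst)
{
    if (dst)
        atomic_fetch_sub(&dst->refcnt, 1);
}

/* Installs dst as the cached route and drops the previous one, if any. */
static void demo_rx_dst_set(struct demo_sk *sk, struct demo_dst *dst)
{
    struct demo_dst *old;

    atomic_fetch_add(&dst->refcnt, 1);        /* reference held by the socket */
    old = atomic_exchange(&sk->rx_dst, dst);  /* stands in for xchg() */
    demo_dst_release(old);
}
```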
     

12 Dec, 2013

2 commits

  • Unlike TCP, UDP input path does not hold the socket lock.

    Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise
    multiple cpus could leak a refcount.

    This patch also takes care of renewing a stale dst entry.
    (When the sk->sk_rx_dst would not be used by IP early demux)

    Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
    Signed-off-by: Eric Dumazet
    Cc: Shawn Bohrer
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • pskb_may_pull() can reallocate skb->head, so we need to move the
    initialization of iph and uh pointers after its call.

    Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
    Signed-off-by: Eric Dumazet
    Cc: Shawn Bohrer
    Signed-off-by: David S. Miller

    Eric Dumazet
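
The pointer-invalidation hazard fixed by the commit above can be modelled in user space. In this sketch demo_buf and demo_may_pull are hypothetical stand-ins for an skb and pskb_may_pull(); the stand-in always moves the data to a new allocation, which models the worst case the fix has to survive.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer: head points at len bytes of heap data. */
struct demo_buf {
    unsigned char *head;
    size_t len;
};

/*
 * Returns 1 if at least need bytes are available, 0 otherwise.
 * On success the data has been moved, so any pointer derived from
 * the old head is stale.
 */
static int demo_may_pull(struct demo_buf *b, size_t need)
{
    unsigned char *copy;

    if (need > b->len)
        return 0;
    copy = malloc(b->len);
    if (!copy)
        return 0;
    memcpy(copy, b->head, b->len);
    free(b->head);
    b->head = copy;
    return 1;
}

/* Reads the UDP destination port from a header at offset hdr_off. */
static int demo_read_dport(struct demo_buf *b, size_t hdr_off,
                           unsigned int *port)
{
    const unsigned char *uh;

    if (!demo_may_pull(b, hdr_off + 8))
        return 0;

    uh = b->head + hdr_off;  /* derive the pointer only after the pull */
    *port = ((unsigned int)uh[2] << 8) | uh[3];
    return 1;
}
```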