27 Jun, 2014

1 commit

  • [ Upstream commit 63c6f81cdde58c41da62a8d8a209592e42a0203e ]

    It's too easy to add thousands of UDP sockets to a particular bucket
    and slow down an innocent multicast receiver.

    Early demux is supposed to be an optimization; we should avoid spending
    too much time in it.

    It is interesting to note that __udp4_lib_demux_lookup() only tries to
    match the first socket in the chain.

    10 is the threshold we already have in __udp4_lib_lookup() to switch
    to secondary hash.

    Fixes: 421b3885bf6d5 ("udp: ipv4: Add udp early demux")
    Signed-off-by: Eric Dumazet
    Reported-by: David Held
    Cc: Shawn Bohrer
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
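The bail-out policy described above can be sketched in userland C. This is a minimal sketch that assumes hypothetical types (demo_sock, demo_bucket) in place of the kernel's struct sock and hash slot. The lookup is skipped when the bucket holds more than 10 sockets, and a socket is returned only when it is the single matching receiver, since the dst can only be cached in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a UDP hash bucket and its sockets. */
struct demo_sock {
    uint32_t rcv_addr;      /* joined multicast address */
    uint16_t local_port;
};

struct demo_bucket {
    size_t count;           /* number of sockets hashed to this slot */
    const struct demo_sock *socks;
};

#define DEMUX_MAX_CHAIN 10  /* threshold quoted in the commit message */

/* Return the unique matching socket, or NULL when early demux should be
 * skipped: bucket too long, no match, or more than one receiver. */
static const struct demo_sock *
mcast_demux_lookup(const struct demo_bucket *b, uint32_t daddr, uint16_t dport)
{
    const struct demo_sock *result = NULL;

    if (b->count > DEMUX_MAX_CHAIN)
        return NULL;        /* do not spend time walking a huge chain */

    for (size_t i = 0; i < b->count; i++) {
        const struct demo_sock *s = &b->socks[i];

        if (s->local_port != dport || s->rcv_addr != daddr)
            continue;
        if (result)
            return NULL;    /* several receivers: nothing can be cached */
        result = s;
    }
    return result;
}
```

Returning NULL simply means "use the normal receive path", so the sketch never changes delivery, only whether the shortcut is attempted.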
     

19 Jan, 2014

1 commit

  • This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
    handler msg_name and msg_namelen logic").

    DECLARE_SOCKADDR validates that the structure we use for writing the
    name information to is not larger than the buffer which is reserved
    for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
    consistently in sendmsg code paths.

    Signed-off-by: Steffen Hurrle
    Suggested-by: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Steffen Hurrle
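The size check that DECLARE_SOCKADDR performs can be imitated in userland with C11 `_Static_assert`. This is a minimal sketch only: DEMO_DECLARE_SOCKADDR and demo_fill_name are hypothetical names, not the kernel macro. It uses userland `struct sockaddr_storage` as the limit, which is 128 bytes on Linux, matching the 128-byte msg_name buffer mentioned above. A sockaddr type that does not fit fails to compile instead of overrunning the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical analogue of DECLARE_SOCKADDR: reject, at compile time,
 * any sockaddr type larger than the msg_name storage. */
#define DEMO_DECLARE_SOCKADDR(type, dst, src)                          \
    _Static_assert(sizeof(*(type)0) <= sizeof(struct sockaddr_storage), \
                   "sockaddr type too large for msg_name");            \
    type dst = (type)(src)

/* Write an IPv4 source address into msg_name and return the length to
 * report through msg_namelen. */
static socklen_t demo_fill_name(void *msg_name, uint32_t addr_host, uint16_t port_host)
{
    DEMO_DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg_name);

    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port_host);
    sin->sin_addr.s_addr = htonl(addr_host);
    return sizeof(*sin);
}
```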
     

15 Jan, 2014

1 commit


07 Jan, 2014

1 commit


03 Jan, 2014

1 commit

  • VM to VM GSO traffic is broken if it goes through VXLAN or GRE
    tunnel and the physical NIC on the host supports hardware VXLAN/GRE
    GSO offload (e.g. bnx2x and next-gen mlx4).

    Two issues -
    (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
    SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
    integrity check fails in udp4_ufo_fragment if inner protocol is
    TCP. Also gso_segs is calculated incorrectly using skb->len that
    includes tunnel header. Fix: robust check should only be applied
    to the inner packet.

    (VXLAN & GRE) Once GSO header integrity check passes, NULL segs
    is returned and the original skb is sent to hardware. However the
    tunnel header is already pulled. Fix: tunnel header needs to be
    restored so that hardware can perform GSO properly on the original
    packet.

    Signed-off-by: Wei-Chun Chao
    Signed-off-by: David S. Miller

    Wei-Chun Chao
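The gso_segs miscount is easiest to see with some arithmetic. The following is a simplified sketch that treats gso_segs as DIV_ROUND_UP(bytes, mss) using a hypothetical helper. The 50-byte overhead is an illustrative VXLAN-over-IPv4 header stack: outer IPv4 20, UDP 8, VXLAN 8 and inner Ethernet 14. Counting those bytes as payload can push the result over a segment boundary.

```c
#include <assert.h>

/* Hypothetical helper: segments needed for len payload bytes at a given
 * MSS, i.e. DIV_ROUND_UP(len, mss). */
static unsigned int demo_gso_segs(unsigned int len, unsigned int mss)
{
    assert(mss > 0);
    return (len + mss - 1) / mss;
}
```

With 2920 bytes of inner payload and an MSS of 1460, the correct count is 2. Adding the 50 tunnel bytes gives 2970 bytes, which rounds up to 3.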
     

20 Dec, 2013

1 commit

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2013-12-19

    1) Use the user supplied policy index instead of a generated one
    if present. From Fan Du.

    2) Make xfrm migration namespace aware. From Fan Du.

    3) Make the xfrm state and policy locks namespace aware. From Fan Du.

    4) Remove ancient sleeping when the SA is in acquire state,
    we now queue packets to the policy instead. This replaces the
    sleeping code.

    5) Remove FLOWI_FLAG_CAN_SLEEP. This was used to notify xfrm about the
    possibility to sleep. The sleeping code is gone, so remove it.

    6) Check the user-specified SPI for IPComp. The SPI for IPComp is only
    16 bits wide, so check for a valid value. From Fan Du.

    7) Export verify_userspi_info to check for valid user supplied spi ranges
    with pfkey and netlink. From Fan Du.

    8) RFC3173 states that if the total size of a compressed payload and the IPComp
    header is not smaller than the size of the original payload, the IP datagram
    must be sent in the original non-compressed form. These packets are dropped
    by the inbound policy check because they are not transformed. Document the need
    to set 'level use' for IPcomp to receive such packets anyway. From Fan Du.

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
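Items 6 and 8 both reduce to simple arithmetic checks. The sketch below uses hypothetical function names and is not the kernel's verify_userspi_info(). The first check requires an IPComp SPI (CPI) range to fit in 16 bits. The second applies the RFC 3173 rule: the compressed form is sent only if it, plus the 4-byte IPComp header (Next Header, Flags, 16-bit CPI), is strictly smaller than the original payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_IPCOMP_HDR_LEN 4u   /* RFC 3173: Next Header, Flags, 16-bit CPI */

/* Hypothetical range check: an IPComp CPI is only 16 bits wide. */
static bool demo_ipcomp_spi_range_ok(uint32_t min, uint32_t max)
{
    return min <= max && max <= 0xffffu;
}

/* RFC 3173: send compressed only if it is strictly smaller than the
 * original payload once the IPComp header is added. */
static bool demo_send_compressed(uint32_t orig_len, uint32_t compressed_len)
{
    return compressed_len + DEMO_IPCOMP_HDR_LEN < orig_len;
}
```

A 100-byte payload that compresses to 96 bytes would occupy exactly 100 bytes with the header. It is therefore sent uncompressed, which is the case where the inbound policy needs "level use".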
     

18 Dec, 2013

1 commit

  • Using sk_dst_lock from softirq context is not supported right now.

    Instead of adding BH protection everywhere,
    udp_sk_rx_dst_set() can instead use xchg(), as suggested
    by David.

    Reported-by: Fengguang Wu
    Fixes: 975022310233 ("udp: ipv4: must add synchronization in udp_sk_rx_dst_set()")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
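The xchg() approach can be sketched in userland C11. This sketch assumes a hypothetical refcounted demo_dst in place of struct dst_entry, and uses atomic_exchange() in place of the kernel's xchg(). Each caller takes a reference on the new entry before publishing it and releases exactly the entry it displaced. As a result, concurrent setters neither leak nor double-drop a reference, and no lock is required.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted route entry standing in for struct dst_entry. */
struct demo_dst {
    atomic_int refcnt;
};

struct demo_rx_sock {
    _Atomic(struct demo_dst *) rx_dst;   /* cached input route */
};

static void demo_dst_hold(struct demo_dst *d)
{
    atomic_fetch_add(&d->refcnt, 1);
}

static void demo_dst_release(struct demo_dst *d)
{
    if (d)
        atomic_fetch_sub(&d->refcnt, 1);
}

/* Take a reference on the new dst, publish it with one atomic exchange,
 * then drop the reference held by whatever was cached before. */
static void demo_sk_rx_dst_set(struct demo_rx_sock *sk, struct demo_dst *dst)
{
    struct demo_dst *old;

    demo_dst_hold(dst);
    old = atomic_exchange(&sk->rx_dst, dst);
    demo_dst_release(old);
}
```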
     

12 Dec, 2013

2 commits

  • Unlike TCP, UDP input path does not hold the socket lock.

    Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise
    multiple cpus could leak a refcount.

    This patch also takes care of renewing a stale dst entry.
    (When the sk->sk_rx_dst would not be used by IP early demux)

    Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
    Signed-off-by: Eric Dumazet
    Cc: Shawn Bohrer
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • pskb_may_pull() can reallocate skb->head, we need to move the
    initialization of iph and uh pointers after its call.

    Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
    Signed-off-by: Eric Dumazet
    Cc: Shawn Bohrer
    Signed-off-by: David S. Miller

    Eric Dumazet
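The pskb_may_pull() ordering rule can be illustrated with a buffer that may move when it grows. This is a minimal sketch in which demo_pkt and demo_may_pull() are hypothetical. The helper always copies into a fresh allocation when more linear bytes are needed, so a header pointer computed before the call would dangle. The correct pattern derives the pointer only after the pull succeeds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical linear packet buffer. */
struct demo_pkt {
    unsigned char *head;
    size_t len;
};

/* Ensure at least n linear bytes; may move the buffer. */
static int demo_may_pull(struct demo_pkt *p, size_t n)
{
    if (n <= p->len)
        return 1;
    unsigned char *nh = malloc(n);
    if (!nh)
        return 0;
    memcpy(nh, p->head, p->len);
    memset(nh + p->len, 0, n - p->len);  /* stand-in for pulled-in data */
    free(p->head);
    p->head = nh;                        /* old pointers into head are stale */
    p->len = n;
    return 1;
}

/* Correct ordering: pull first, then derive the header pointer. */
static const unsigned char *demo_udp_header(struct demo_pkt *p, size_t hdr_len)
{
    if (!demo_may_pull(p, hdr_len))
        return NULL;
    return p->head;
}
```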
     

11 Dec, 2013

1 commit

  • Dave Jones reported a use-after-free in the UDP stack:

    [ 5059.434216] =========================
    [ 5059.434314] [ BUG: held lock freed! ]
    [ 5059.434420] 3.13.0-rc3+ #9 Not tainted
    [ 5059.434520] -------------------------
    [ 5059.434620] named/863 is freeing memory ffff88005e960000-ffff88005e96061f, with a lock still held there!
    [ 5059.434815] (slock-AF_INET){+.-...}, at: [] udp_queue_rcv_skb+0xd1/0x4b0
    [ 5059.435012] 3 locks held by named/863:
    [ 5059.435086] #0: (rcu_read_lock){.+.+..}, at: [] __netif_receive_skb_core+0x11d/0x940
    [ 5059.435295] #1: (rcu_read_lock){.+.+..}, at: [] ip_local_deliver_finish+0x3e/0x410
    [ 5059.435500] #2: (slock-AF_INET){+.-...}, at: [] udp_queue_rcv_skb+0xd1/0x4b0
    [ 5059.435734]
    stack backtrace:
    [ 5059.435858] CPU: 0 PID: 863 Comm: named Not tainted 3.13.0-rc3+ #9 [loadavg: 0.21 0.06 0.06 1/115 1365]
    [ 5059.436052] Hardware name: /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
    [ 5059.436223] 0000000000000002 ffff88007e203ad8 ffffffff8153a372 ffff8800677130e0
    [ 5059.436390] ffff88007e203b10 ffffffff8108cafa ffff88005e960000 ffff88007b00cfc0
    [ 5059.436554] ffffea00017a5800 ffffffff8141c490 0000000000000246 ffff88007e203b48
    [ 5059.436718] Call Trace:
    [ 5059.436769] [] dump_stack+0x4d/0x66
    [ 5059.436904] [] debug_check_no_locks_freed+0x15a/0x160
    [ 5059.437037] [] ? __sk_free+0x110/0x230
    [ 5059.437147] [] kmem_cache_free+0x6a/0x150
    [ 5059.437260] [] __sk_free+0x110/0x230
    [ 5059.437364] [] sk_free+0x19/0x20
    [ 5059.437463] [] sock_edemux+0x25/0x40
    [ 5059.437567] [] sock_queue_rcv_skb+0x81/0x280
    [ 5059.437685] [] ? udp_queue_rcv_skb+0xd1/0x4b0
    [ 5059.437805] [] __udp_queue_rcv_skb+0x42/0x240
    [ 5059.437925] [] ? _raw_spin_lock+0x65/0x70
    [ 5059.438038] [] udp_queue_rcv_skb+0x26b/0x4b0
    [ 5059.438155] [] __udp4_lib_rcv+0x152/0xb00
    [ 5059.438269] [] udp_rcv+0x15/0x20
    [ 5059.438367] [] ip_local_deliver_finish+0x10f/0x410
    [ 5059.438492] [] ? ip_local_deliver_finish+0x3e/0x410
    [ 5059.438621] [] ip_local_deliver+0x43/0x80
    [ 5059.438733] [] ip_rcv_finish+0x140/0x5a0
    [ 5059.438843] [] ip_rcv+0x296/0x3f0
    [ 5059.438945] [] __netif_receive_skb_core+0x742/0x940
    [ 5059.439074] [] ? __netif_receive_skb_core+0x11d/0x940
    [ 5059.442231] [] ? trace_hardirqs_on+0xd/0x10
    [ 5059.442231] [] __netif_receive_skb+0x13/0x60
    [ 5059.442231] [] netif_receive_skb+0x1e/0x1f0
    [ 5059.442231] [] napi_gro_receive+0x70/0xa0
    [ 5059.442231] [] rtl8169_poll+0x166/0x700 [r8169]
    [ 5059.442231] [] net_rx_action+0x129/0x1e0
    [ 5059.442231] [] __do_softirq+0xed/0x240
    [ 5059.442231] [] irq_exit+0x125/0x140
    [ 5059.442231] [] do_IRQ+0x51/0xc0
    [ 5059.442231] [] common_interrupt+0x6f/0x6f

    We need to keep a reference on the socket, by using skb_steal_sock()
    at the right place.

    Note that another patch is needed to fix a race in
    udp_sk_rx_dst_set(), as we hold no lock protecting the dst.

    Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
    Reported-by: Dave Jones
    Signed-off-by: Eric Dumazet
    Cc: Shawn Bohrer
    Signed-off-by: David S. Miller

    Eric Dumazet
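Keeping the socket reference comes down to an ownership transfer. skb_steal_sock() clears skb->sk and skb->destructor and hands the socket to the caller. The sketch below models that pattern with hypothetical types, so the packet's destructor can no longer drop a reference the receive path still relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket and packet types. */
struct demo_sk { int refcnt; };

struct demo_skb {
    struct demo_sk *sk;                    /* reference owned by the packet */
    void (*destructor)(struct demo_skb *); /* drops that reference on free */
};

static void demo_sock_put(struct demo_sk *sk) { sk->refcnt--; }

static void demo_edemux_destructor(struct demo_skb *skb)
{
    demo_sock_put(skb->sk);
}

/* Modeled on skb_steal_sock(): move the reference out of the packet so
 * that the caller owns it. */
static struct demo_sk *demo_steal_sock(struct demo_skb *skb)
{
    struct demo_sk *sk = skb->sk;

    if (sk) {
        skb->destructor = NULL;
        skb->sk = NULL;
    }
    return sk;
}

static void demo_free_skb(struct demo_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
}
```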
     

06 Dec, 2013

1 commit


30 Nov, 2013

2 commits

  • In commit c9e9042994d3 ("ipv4: fix possible seqlock deadlock") I left
    other places where IP_INC_STATS_BH() was improperly used.

    udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from
    process context, not from softirq context.

    This was detected by lockdep seqlock support.

    Reported-by: jongman heo
    Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
    Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
    Signed-off-by: Eric Dumazet
    Cc: Hannes Frederic Sowa
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Commit 35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once)
    added an internal flag MSG_SENDPAGE_NOTLAST, similar to
    MSG_MORE.

    algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages()
    and need to see the new flag as identical to MSG_MORE.

    This fixes sendfile() on AF_ALG.

    v3: also fix udp

    Cc: Tom Herbert
    Cc: Eric Dumazet
    Cc: David S. Miller
    Cc: # 3.4.x + 3.2.x
    Reported-and-tested-by: Shawn Landden
    Original-patch: Richard Weinberger
    Signed-off-by: Shawn Landden
    Signed-off-by: David S. Miller

    Shawn Landden
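Seeing the new flag "as identical to MSG_MORE" amounts to a one-line flag normalization in the sendpage path. The sketch below uses hypothetical names. MSG_MORE is 0x8000 on Linux. MSG_SENDPAGE_NOTLAST is kernel-internal and not exported to userland, so its value here (0x20000) should be treated as illustrative.

```c
#include <assert.h>

#define DEMO_MSG_MORE              0x8000   /* MSG_MORE on Linux */
#define DEMO_MSG_SENDPAGE_NOTLAST  0x20000  /* illustrative kernel-internal value */

/* Treat "more pages follow" from the sendpage path exactly like MSG_MORE,
 * so the datagram stays corked until the last page arrives. */
static int demo_normalize_sendpage_flags(int flags)
{
    if (flags & DEMO_MSG_SENDPAGE_NOTLAST)
        flags |= DEMO_MSG_MORE;
    return flags;
}
```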
     

24 Nov, 2013

1 commit

  • Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage
    of uninitialized memory to user in recv syscalls") conditionally updated
    addr_len if the msg_name is written to. The recv_error and rxpmtu
    functions relied on the recvmsg functions to set up addr_len before.

    As this does not happen any more we have to pass addr_len to those
    functions as well and set it to the size of the corresponding sockaddr
    length.

    This broke traceroute and such.

    Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
    Reported-by: Brad Spengler
    Reported-by: Tom Labanowski
    Cc: mpb
    Cc: David S. Miller
    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

20 Nov, 2013

1 commit

  • Pull networking fixes from David Miller:
    "Mostly these are fixes for fallout due to merge window changes, as
    well as cures for problems that have been with us for a much longer
    period of time"

    1) Johannes Berg noticed two major deficiencies in our genetlink
    registration. Some genetlink protocols were passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

    2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one that the driver actually supports. From Ben Hutchings.

    3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

    4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet(). From Alexei Starovoitov.

    5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
    Makita.

    6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

    7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease. One particularly problematic area is 802.11 AMPDU in
    wireless. From Eric Dumazet.

    8) Fix crashes in tcp_fastopen_cache_get(); we can see NULL socket dsts
    here. Fix from Eric Dumazet, reported by Dave Jones.

    9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

    10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account. From Michael Dalton.

    11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
    bumping, this one has been with us for a while. From Eric Dumazet.

    12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

    13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

    14) macvlan has the same issue as normal vlans did wrt. propagating LRO
    disabling down to the real device, fix it the same way. From Michal
    Kubecek.

    15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

    16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

    17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

    18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little. Fix
    from Vlad Yasevich.

    19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace. From Hannes Frederic Sowa. There is another fix in the
    works from Hannes which will prevent future problems of this nature.

    20) Fix route leak in VTI tunnel transmit, from Fan Du.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    genetlink: make multicast groups const, prevent abuse
    genetlink: pass family to functions using groups
    genetlink: add and use genl_set_err()
    genetlink: remove family pointer from genl_multicast_group
    genetlink: remove genl_unregister_mc_group()
    hsr: don't call genl_unregister_mc_group()
    quota/genetlink: use proper genetlink multicast APIs
    drop_monitor/genetlink: use proper genetlink multicast APIs
    genetlink: only pass array to genl_register_family_with_ops()
    tcp: don't update snd_nxt, when a socket is switched from repair mode
    atm: idt77252: fix dev refcnt leak
    xfrm: Release dst if this dst is improper for vti tunnel
    netlink: fix documentation typo in netlink_set_err()
    be2net: Delete secondary unicast MAC addresses during be_close
    be2net: Fix unconditional enabling of Rx interface options
    net, virtio_net: replace the magic value
    ping: prevent NULL pointer dereference on write to msg_name
    bnx2x: Prevent "timeout waiting for state X"
    bnx2x: prevent CFC attention
    bnx2x: Prevent panic during DMAE timeout
    ...

    Linus Torvalds
     

19 Nov, 2013

1 commit

  • Only update *addr_len when we actually fill in sockaddr, otherwise we
    can return uninitialized memory from the stack to the caller in the
    recvfrom, recvmmsg and recvmsg syscalls. Drop the (addr_len == NULL)
    checks because we only get called with a valid addr_len pointer either
    from sock_common_recvmsg or inet_recvmsg.

    If a blocking read waits on a socket which is concurrently shut down we
    now return zero and set msg_namelen to 0.

    Reported-by: mpb
    Suggested-by: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
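The rule "only update *addr_len when we actually fill in sockaddr" can be sketched as a recvmsg tail in userland C. demo_fill_source is hypothetical. The caller is assumed to initialize the length to 0, as the commit describes for the shutdown case. When msg_name is NULL the length is left untouched, so no uninitialized value is ever reported.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical recvmsg tail: copy out the source address only if the
 * caller supplied msg_name, and only then report its length. */
static void demo_fill_source(struct msghdr *msg, int *addr_len,
                             const struct sockaddr_in *src)
{
    if (msg->msg_name) {
        struct sockaddr_in *sin = msg->msg_name;

        memset(sin, 0, sizeof(*sin));  /* no stale padding bytes leak out */
        sin->sin_family = AF_INET;
        sin->sin_port = src->sin_port;
        sin->sin_addr = src->sin_addr;
        *addr_len = sizeof(*sin);
    }
    /* otherwise *addr_len keeps the caller's initial value (e.g. 0) */
}
```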
     

15 Nov, 2013

1 commit


20 Oct, 2013

2 commits


09 Oct, 2013

4 commits

  • At this point sk might contain garbage.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Since the removal of the routing cache, fib_compute_spec_dst() does a
    fib_table lookup for each UDP multicast packet received. This has
    introduced a performance regression for some UDP workloads.

    This change skips populating the packet info for sockets that do not have
    IP_PKTINFO set.

    Benchmark results from a netperf UDP_RR test:
    Before 89789.68 transactions/s
    After 90587.62 transactions/s

    Benchmark results from a fio 1 byte UDP multicast pingpong test
    (Multicast one way unicast response):
    Before 12.63us RTT
    After 12.48us RTT

    Signed-off-by: Shawn Bohrer
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Shawn Bohrer
     
  • The removal of the routing cache introduced a performance regression for
    some UDP workloads since a dst lookup must be done for each packet.
    This change caches the dst per socket in a similar manner to what we do
    for TCP by implementing early_demux.

    For UDP multicast we can only cache the dst if there is only one
    receiving socket on the host. Since caching only works when there is
    one receiving socket we do the multicast socket lookup using RCU.

    For UDP unicast we only demux sockets with an exact match in order to
    not break forwarding setups. Additionally since the hash chains may be
    long we only check the first socket to see if it is a match and not
    waste extra time searching the whole chain when we might not find an
    exact match.

    Benchmark results from a netperf UDP_RR test:
    Before 87961.22 transactions/s
    After 89789.68 transactions/s

    Benchmark results from a fio 1 byte UDP multicast pingpong test
    (Multicast one way unicast response):
    Before 12.97us RTT
    After 12.63us RTT

    Signed-off-by: Shawn Bohrer
    Signed-off-by: David S. Miller

    Shawn Bohrer
     
  • UDP sockets can receive packets from multiple endpoints, so those
    packets may arrive on multiple receive queues. Since packets can arrive
    on multiple receive queues, we should not mark the napi_id for all
    packets. This makes busy read/poll only work for connected UDP sockets.

    This additionally enables busy read/poll for UDP multicast packets as
    long as the socket is connected by moving the check into
    __udp_queue_rcv_skb().

    Signed-off-by: Shawn Bohrer
    Suggested-by: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Shawn Bohrer
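The unicast early-demux rule from the third commit above can be sketched as follows: check only the first socket in the chain, and accept only an exact 4-tuple match. The types are hypothetical, and the tuple is stored in the orientation of the incoming packet for simplicity. Anything else returns NULL and falls back to the regular lookup, so forwarding setups and long chains are unaffected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UDP 4-tuple, in the orientation of the incoming packet. */
struct demo_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct demo_chain {
    const struct demo_tuple *socks;  /* sockets hashed to this bucket */
    size_t count;
};

static bool demo_exact_match(const struct demo_tuple *s, const struct demo_tuple *pkt)
{
    return s->saddr == pkt->saddr && s->daddr == pkt->daddr &&
           s->sport == pkt->sport && s->dport == pkt->dport;
}

/* Only the first socket is checked; a miss falls back to the slow path. */
static const struct demo_tuple *
demo_unicast_demux(const struct demo_chain *c, const struct demo_tuple *pkt)
{
    if (c->count == 0)
        return NULL;
    return demo_exact_match(&c->socks[0], pkt) ? &c->socks[0] : NULL;
}
```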
     

02 Oct, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be.h
    drivers/net/usb/qmi_wwan.c
    drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h
    include/net/netfilter/nf_conntrack_synproxy.h
    include/net/secure_seq.h

    The conflicts are of two varieties:

    1) Conflicts with Joe Perches's 'extern' removal from header file
    function declarations. Usually it's an argument signature change
    or a function being added/removed. The resolutions are trivial.

    2) Some overlapping changes in qmi_wwan.c and be.h, one commit adds
    a new value, another changes an existing value. That sort of
    thing.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Oct, 2013

1 commit

  • - Move sysctl_local_ports from a global variable into struct netns_ipv4.
    - Modify inet_get_local_port_range to take a struct net, and update all
    of the callers.
    - Move the initialization of sysctl_local_ports into
    sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

    v2:
    - Ensure indentation used tabs
    - Fixed ip.h so it applies cleanly to today's net-next

    v3:
    - Compile fixes of strange callers of inet_get_local_port_range.
    This patch now successfully passes an allmodconfig build.
    - Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

    Originally-by: Samya
    Acked-by: Nicolas Dichtel
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

29 Sep, 2013

1 commit

  • If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
    packets with the specified TTL or TOS overriding the socket values specified
    with the traditional setsockopt().

    The struct inet_cork stores the values of TOS, TTL and priority that are
    passed through the struct ipcm_cookie. If there are user-specified TOS
    (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
    used to override the per-socket values. In the case of TOS, the priority
    is also changed accordingly.

    Two helper functions get_rttos and get_rtconn_flags are defined to take
    into account the presence of a user specified TOS value when computing
    RT_TOS and RT_CONN_FLAGS.

    Signed-off-by: Francesco Fusco
    Signed-off-by: David S. Miller

    Francesco Fusco
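From userland, the feature is used by attaching IP_TTL and IP_TOS control messages to sendmsg(). The sketch below only builds the control buffer with the standard CMSG_* macros, using an int payload for both options. attach_ttl_tos and union ip_ctrl_buf are hypothetical names. The override takes effect only on kernels that include this change.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Control buffer for two int-sized cmsgs, aligned for struct cmsghdr. */
union ip_ctrl_buf {
    char buf[CMSG_SPACE(sizeof(int)) * 2];
    struct cmsghdr align;
};

/* Attach per-datagram IP_TTL and IP_TOS to msg. Returns 0 on success. */
static int attach_ttl_tos(struct msghdr *msg, union ip_ctrl_buf *ctrl, int ttl, int tos)
{
    struct cmsghdr *cm;

    memset(ctrl, 0, sizeof(*ctrl));  /* zeroed so CMSG_NXTHDR sees clean headers */
    msg->msg_control = ctrl->buf;
    msg->msg_controllen = sizeof(ctrl->buf);

    cm = CMSG_FIRSTHDR(msg);
    if (cm == NULL)
        return -1;
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_TTL;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &ttl, sizeof(int));

    cm = CMSG_NXTHDR(msg, cm);
    if (cm == NULL)
        return -1;
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_TOS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &tos, sizeof(int));
    return 0;
}
```

After filling msg_name and msg_iov, passing this msghdr to sendmsg() on an AF_INET datagram socket should override TTL and TOS for that one datagram. Later sends keep the setsockopt() values.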
     

24 Sep, 2013

1 commit


01 Sep, 2013

1 commit


16 Aug, 2013

1 commit


28 Jul, 2013

1 commit

  • UDP checksums are optional, hence pktgen has been omitting them in
    favour of performance. The optional flag UDPCSUM enables UDP
    checksumming. If the output device supports hardware checksumming
    the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the
    checksum is generated in software.

    Signed-off-by: Thomas Graf
    Cc: Eric Dumazet
    Cc: Ben Greear
    Signed-off-by: David S. Miller

    Thomas Graf
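When the device cannot offload, the checksum is generated in software. The following is a minimal sketch of a software UDP-over-IPv4 checksum: the RFC 768 pseudo-header plus a 16-bit one's-complement sum. The helper names are hypothetical, and this is not pktgen's own code. The checksum field must be zero while the sum is computed, and a computed value of 0 is sent as 0xffff because 0 means "no checksum".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add bytes to a one's-complement accumulator as big-endian 16-bit words. */
static uint32_t demo_csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)(p[0] << 8);
    return sum;
}

/* Fold carries back into the low 16 bits. */
static uint16_t demo_csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* udp: UDP header + payload (udp_len bytes, checksum field zeroed).
 * saddr/daddr in host byte order; result in host byte order. */
static uint16_t demo_udp4_checksum(uint32_t saddr, uint32_t daddr,
                                   const uint8_t *udp, uint16_t udp_len)
{
    uint32_t sum = 0;

    sum += (saddr >> 16) + (saddr & 0xffff);   /* pseudo-header */
    sum += (daddr >> 16) + (daddr & 0xffff);
    sum += 17;                                 /* zero byte + IPPROTO_UDP */
    sum += udp_len;
    sum = demo_csum_add(sum, udp, udp_len);

    uint16_t c = (uint16_t)~demo_csum_fold(sum);
    return c == 0 ? 0xffff : c;
}
```

A receiver can verify a datagram by summing the pseudo-header and the segment, checksum included. A correct packet folds to 0xffff.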
     

12 Jul, 2013

1 commit

  • This change makes it so that the GRE and VXLAN tunnels can make use of Tx
    checksum offload support provided by some drivers via the hw_enc_features.
    Without this fix enabling GSO means sacrificing Tx checksum offload and
    this actually leads to a performance regression as shown below:

    Throughput (10^6 bits/s)  Send local utilization (% S)  GSO state
    6276.51                   8.39                          enabled
    7123.52                   8.42                          disabled

    To resolve this it was necessary to address two items. First
    netif_skb_features needed to be updated so that it would correctly handle
    the Trans Ether Bridging protocol without impacting the need to check for
    Q-in-Q tagging. To do this it was necessary to update harmonize_features
    so that it used skb_network_protocol instead of just using the outer
    protocol.

    Second it was necessary to update the GRE and UDP tunnel segmentation
    offloads so that they would reset the encapsulation bit and inner header
    offsets after the offload was complete.

    As a result of this change I have seen the following results on an interface
    with Tx checksum enabled for encapsulated frames:

    Throughput (10^6 bits/s)  Send local utilization (% S)  GSO state
    7123.52                   8.42                          disabled
    8321.75                   5.43                          enabled

    v2: Instead of replacing the reference to skb->protocol with
    skb_network_protocol just replace the protocol reference in
    harmonize_features to allow for double VLAN tag checks.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

11 Jul, 2013

2 commits


03 Jul, 2013

1 commit

    We accidentally call down to ip6_push_pending_frames when uncorking
    pending AF_INET data on an IPv6 socket. This results in the following
    splat (from Dave Jones):

    skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:126!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
    +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
    CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
    task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
    RIP: 0010:[] [] skb_panic+0x63/0x65
    RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282
    RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
    RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
    RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
    R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
    FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
    Stack:
    ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
    ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
    ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
    Call Trace:
    [] skb_push+0x3a/0x40
    [] ip6_push_pending_frames+0x1f6/0x4d0
    [] ? mark_held_locks+0xbb/0x140
    [] udp_v6_push_pending_frames+0x2b9/0x3d0
    [] ? udplite_getfrag+0x20/0x20
    [] udp_lib_setsockopt+0x1aa/0x1f0
    [] ? fget_light+0x387/0x4f0
    [] udpv6_setsockopt+0x34/0x40
    [] sock_common_setsockopt+0x14/0x20
    [] SyS_setsockopt+0x71/0xd0
    [] tracesys+0xdd/0xe2
    Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
    RIP [] skb_panic+0x63/0x65
    RSP

    This patch adds a check whether the pending data is of address family
    AF_INET and, if so, directly calls udp_push_pending_frames from
    udp_v6_push_pending_frames.

    This bug was found by Dave Jones with trinity.

    (Also move the initialization of fl6 below the AF_INET check, even if
    not strictly necessary.)

    Cc: Dave Jones
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
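The fix is a dispatch on the family that queued the corked data. In this sketch, demo_udp_sock and the returned path enum are hypothetical. An IPv6 socket can hold pending AF_INET data, typically after sending to a v4-mapped address, and that data must be flushed by the IPv4 path instead of ip6_push_pending_frames().

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical corked-socket state: which family queued the pending data. */
struct demo_udp_sock {
    int pending;   /* 0, AF_INET or AF_INET6 */
};

enum demo_path { DEMO_PUSH_NONE, DEMO_PUSH_V4, DEMO_PUSH_V6 };

/* Decide which push routine would run for a v6 socket's pending data. */
static enum demo_path demo_v6_push_pending(const struct demo_udp_sock *up)
{
    if (up->pending == AF_INET)
        return DEMO_PUSH_V4;   /* stands in for udp_push_pending_frames() */
    if (up->pending == AF_INET6)
        return DEMO_PUSH_V6;   /* stands in for ip6_push_pending_frames() */
    return DEMO_PUSH_NONE;
}
```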
     

13 Jun, 2013

1 commit

  • commit ba418fa357a7b3c ("soreuseport: UDP/IPv4 implementation")
    added the following sparse errors:

    net/ipv4/udp.c:433:60: warning: cast from restricted __be16
    net/ipv4/udp.c:433:60: warning: incorrect type in argument 1 (different base types)
    net/ipv4/udp.c:433:60: expected unsigned short [unsigned] [usertype] val
    net/ipv4/udp.c:433:60: got restricted __be16 [usertype] sport
    net/ipv4/udp.c:433:60: warning: cast from restricted __be16
    net/ipv4/udp.c:433:60: warning: cast from restricted __be16
    net/ipv4/udp.c:514:60: warning: cast from restricted __be16
    net/ipv4/udp.c:514:60: warning: incorrect type in argument 1 (different base types)
    net/ipv4/udp.c:514:60: expected unsigned short [unsigned] [usertype] val
    net/ipv4/udp.c:514:60: got restricted __be16 [usertype] sport
    net/ipv4/udp.c:514:60: warning: cast from restricted __be16
    net/ipv4/udp.c:514:60: warning: cast from restricted __be16

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Jun, 2013

1 commit


11 Jun, 2013

1 commit

  • Add support for busy-polling on UDP sockets.
    In __udp[46]_lib_rcv add a call to sk_mark_ll() to copy the napi_id
    from the skb into the sk.
    This is done at the earliest possible moment, right after we identify
    which socket this skb is for.
    In __skb_recv_datagram, when there is no data and the user
    tries to read, we busy poll.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Eliezer Tamir
    Acked-by: Eric Dumazet
    Tested-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

01 Jun, 2013

1 commit

  • The current state of affairs is that read()/write() will set up
    RFS (Receive Flow Steering) for internet protocol sockets while
    poll()/epoll() does not.

    When poll() gets called with a TCP or UDP socket, we should update
    the flow target.

    This permits RFS (if enabled) to select the appropriate CPU for
    subsequent incoming packets.

    Note: Only connected UDP sockets can benefit from RFS.

    Signed-off-by: David Majnemer
    Signed-off-by: Eric Dumazet
    Cc: Paul Turner
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    David Majnemer
     

28 May, 2013

1 commit

  • In the case where a non-MPLS packet is received and an MPLS stack is
    added it may well be the case that the original skb is GSO but the
    NIC used for transmit does not support GSO of MPLS packets.

    The aim of this code is to provide GSO in software for MPLS packets
    whose skbs are GSO.

    SKB Usage:

    When an implementation adds an MPLS stack to a non-MPLS packet it should do
    the following to skb metadata:

    * Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
    skb->inner_protocol is added by this patch.

    * Set skb->protocol to the new MPLS ethertype of the packet.

    * Set skb->network_header to correspond to the
    end of the L3 header, including the MPLS label stack.

    I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
    kernel" which adds MPLS support to the kernel datapath of Open vSwitch.
    That patch sets the above requirements in datapath/actions.c:push_mpls()
    and was used to exercise this code. The datapath patch is against the Open
    vSwitch tree but it is intended that it be added to the Open vSwitch code
    present in the mainline Linux kernel at some point.

    Features:

    I believe that the approach that I have taken is at least partially
    consistent with the handling of other protocols. Jesse, I understand that
    you have some ideas here. I am more than happy to change my implementation.

    This patch adds dev->mpls_features which may be used by devices
    to advertise features supported for MPLS packets.

    A new NETIF_F_MPLS_GSO feature is added for devices which support
    hardware MPLS GSO offload. Currently no devices support this
    and MPLS GSO always falls back to software.

    Alternate Implementation:

    One possible alternate implementation is to teach netif_skb_features()
    and skb_network_protocol() about MPLS, in a similar way to their
    understanding of VLANs. I believe this would avoid the need
    for net/mpls/mpls_gso.c and in particular the calls to
    __skb_push() and __skb_push() in mpls_gso_segment().

    I have decided on the implementation in this patch as it should
    not introduce any overhead in the case where mpls_gso is not compiled
    into the kernel or inserted as a module.

    MPLS GSO suggested by Jesse Gross.
    Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
    by Pravin B Shelar.

    Cc: Jesse Gross
    Cc: Pravin B Shelar
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
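The skb bookkeeping listed under "SKB Usage" can be sketched with a hypothetical metadata struct. The ethertype values are the standard IPv4 (0x0800) and MPLS unicast (0x8847) ones. The network_header adjustment is left out because it depends on where the label stack is written.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ETH_P_IP      0x0800   /* IPv4 ethertype */
#define DEMO_ETH_P_MPLS_UC 0x8847   /* MPLS unicast ethertype */

/* Hypothetical subset of skb metadata touched when pushing MPLS labels. */
struct demo_skb_meta {
    uint16_t protocol;         /* outer ethertype */
    uint16_t inner_protocol;   /* ethertype of the encapsulated packet */
};

/* Record the original ethertype, then advertise MPLS as the outer one,
 * so software GSO can later segment the inner packet correctly. */
static void demo_push_mpls(struct demo_skb_meta *m)
{
    m->inner_protocol = m->protocol;
    m->protocol = DEMO_ETH_P_MPLS_UC;
}
```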
     

09 May, 2013

1 commit