13 Jul, 2013

1 commit


12 Jul, 2013

3 commits

  • This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
    packets are sent from the interface.

    In my case I was able to reproduce the issue by simply sending a ping of
    1421 bytes with the gretap interface created on a device with a standard
    1500 mtu.

    This fix is based on the fact that the tunnel mtu is already adjusted by
    dev->hard_header_len so it would make sense that any packets being compared
    against that mtu should also be adjusted by hard_header_len and the tunnel
    header instead of just the tunnel header.

    Signed-off-by: Alexander Duyck
    Reported-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Alexander Duyck
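The size check described in this message can be worked through with concrete numbers. What follows is a minimal userspace sketch, not the kernel code: the struct and function names are hypothetical, and the header sizes (14-byte Ethernet, 20-byte IPv4, 8-byte ICMP, 4-byte GRE header without key or checksum) are assumptions chosen to match a plain gretap tunnel on a 1500-byte device.

```c
#include <stdbool.h>

/* Assumed header sizes in bytes; not taken from the commit message. */
enum { ETH_HDR = 14, IPV4_HDR = 20, ICMP_HDR = 8, GRE_HDR = 4 };

/* Hypothetical model of the values the check depends on. */
struct tunnel_model {
    int lower_mtu;        /* mtu of the underlying device */
    int hard_header_len;  /* link-layer header carried by the tunnel device */
    int tunnel_hlen;      /* tunnel (GRE) header length */
};

/* The tunnel mtu is already reduced by hard_header_len. */
static int tunnel_mtu(const struct tunnel_model *t)
{
    return t->lower_mtu - t->hard_header_len - IPV4_HDR - t->tunnel_hlen;
}

/* Old comparison: packet size adjusted by the tunnel header only. */
static bool fits_old(const struct tunnel_model *t, int skb_len)
{
    return skb_len - t->tunnel_hlen <= tunnel_mtu(t);
}

/* New comparison: adjusted by the tunnel header and hard_header_len. */
static bool fits_new(const struct tunnel_model *t, int skb_len)
{
    return skb_len - t->tunnel_hlen - t->hard_header_len <= tunnel_mtu(t);
}

/* Length of a ping of the given payload size once it reaches the check. */
static int gretap_skb_len(int ping_payload)
{
    return GRE_HDR + ETH_HDR + IPV4_HDR + ICMP_HDR + ping_payload;
}
```

Under these assumptions the tunnel mtu is 1462, and 1421 bytes is the smallest ping payload that the old comparison rejects while the new comparison accepts it, which is consistent with the reproduction described in the message.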
     
  • This change makes it so that the GRE and VXLAN tunnels can make use of Tx
    checksum offload support provided by some drivers via the hw_enc_features.
    Without this fix enabling GSO means sacrificing Tx checksum offload and
    this actually leads to a performance regression as shown below:

    Throughput    Utilization        GSO
    10^6bits/s    Send local % S     state
    6276.51       8.39               enabled
    7123.52       8.42               disabled

    To resolve this it was necessary to address two items. First
    netif_skb_features needed to be updated so that it would correctly handle
    the Trans Ether Bridging protocol without impacting the need to check for
    Q-in-Q tagging. To do this it was necessary to update harmonize_features
    so that it used skb_network_protocol instead of just using the outer
    protocol.

    Second it was necessary to update the GRE and UDP tunnel segmentation
    offloads so that they would reset the encapsulation bit and inner header
    offsets after the offload was complete.

    As a result of this change I have seen the following results on an interface
    with Tx checksum enabled for encapsulated frames:

    Throughput    Utilization        GSO
    10^6bits/s    Send local % S     state
    7123.52       8.42               disabled
    8321.75       5.43               enabled

    v2: Instead of replacing reference to skb->protocol with
    skb_network_protocol just replace the protocol reference in
    harmonize_features to allow for double VLAN tag checks.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
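The message relies on looking at the protocol beneath any VLAN tags rather than at the outer protocol. As a rough userspace illustration of that idea (not the kernel's skb_network_protocol, whose exact behavior is not reproduced here), the sketch below walks an Ethernet frame past 802.1Q and 802.1ad tags. The function name network_protocol and the constant names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4_MODEL   0x0800  /* IPv4 */
#define ETHERTYPE_8021Q_MODEL  0x8100  /* single VLAN tag */
#define ETHERTYPE_8021AD_MODEL 0x88A8  /* outer tag of a double-tagged frame */

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the EtherType found after skipping any VLAN tags, in host
 * byte order, or 0 if the frame is too short to tell.
 */
static uint16_t network_protocol(const uint8_t *frame, size_t len)
{
    size_t off = 12;  /* skip destination and source MAC addresses */
    uint16_t type;

    if (len < off + 2)
        return 0;
    type = read_be16(frame + off);
    off += 2;

    while (type == ETHERTYPE_8021Q_MODEL || type == ETHERTYPE_8021AD_MODEL) {
        /* Each tag is 2 bytes of tag control followed by the next EtherType. */
        if (len < off + 4)
            return 0;
        type = read_be16(frame + off + 2);
        off += 4;
    }
    return type;
}
```

A feature check keyed on the outer EtherType would see only the VLAN tag type for a double-tagged frame; walking the tags as above yields the protocol that the offload decision actually depends on.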
     
  • Found using checkpatch.pl

    Signed-off-by: Camelia Groza
    Signed-off-by: David S. Miller

    Camelia Groza
     

11 Jul, 2013

2 commits


10 Jul, 2013

1 commit

  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie
    lifetime handling, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overrunning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in
    udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     

09 Jul, 2013

1 commit

  • Rename functions in include/net/ll_poll.h to busy wait.
    Clarify documentation about expected power use increase.
    Rename POLL_LL to POLL_BUSY_LOOP.
    Add need_resched() testing to poll/select busy loops.

    Note that in select and poll, can_busy_poll is dynamic and is
    updated continuously to reflect the existence of supported
    sockets with valid queue information.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

04 Jul, 2013

3 commits

  • The global variable num_physpages is scheduled to be removed, so use
    totalram_pages instead of num_physpages at runtime.

    Signed-off-by: Jiang Liu
    Cc: Miklos Szeredi
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • Conflicts:
    drivers/net/ethernet/freescale/fec_main.c
    drivers/net/ethernet/renesas/sh_eth.c
    net/ipv4/gre.c

    The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
    and the splitting of the gre.c code into separate files.

    The FEC conflict was two sets of changes adding ethtool support code
    in an "!CONFIG_M5272" CPP protected block.

    Finally the sh_eth.c conflict was between one commit adding bits set
    in the .eesr_err_check mask whilst another commit removed the
    .tx_error_check member and assignments.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Similarly to TCP/UDP offloading, move all related GRE functions to
    gre_offload.c to make things more explicit and similar to the rest
    of the code.

    Suggested-by: Eric Dumazet
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

03 Jul, 2013

2 commits

  • In the path mtu check, the ip header total length works for a gre device
    but not for a gre-tap device. Use skb len, which is consistent
    for all tunneling types. This is an old bug in gre.
    This also fixes the mtu calculation bug introduced by
    commit c54419321455631079c7d (GRE: Refactor GRE tunneling code).

    Reported-by: Timo Teras
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • We accidentally call down to ip6_push_pending_frames when uncorking
    pending AF_INET data on an ipv6 socket. This results in the following
    splat (from Dave Jones):

    skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:126!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
    +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
    CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
    task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
    RIP: 0010:[] [] skb_panic+0x63/0x65
    RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282
    RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
    RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
    RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
    R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
    FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
    Stack:
    ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
    ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
    ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
    Call Trace:
    [] skb_push+0x3a/0x40
    [] ip6_push_pending_frames+0x1f6/0x4d0
    [] ? mark_held_locks+0xbb/0x140
    [] udp_v6_push_pending_frames+0x2b9/0x3d0
    [] ? udplite_getfrag+0x20/0x20
    [] udp_lib_setsockopt+0x1aa/0x1f0
    [] ? fget_light+0x387/0x4f0
    [] udpv6_setsockopt+0x34/0x40
    [] sock_common_setsockopt+0x14/0x20
    [] SyS_setsockopt+0x71/0xd0
    [] tracesys+0xdd/0xe2
    Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
    RIP [] skb_panic+0x63/0x65
    RSP

    This patch adds a check for whether the pending data is of address family
    AF_INET and, if so, calls udp_push_pending_frames directly from
    udp_v6_push_pending_frames.

    This bug was found by Dave Jones with trinity.

    (Also move the initialization of fl6 below the AF_INET check, even if
    not strictly necessary.)

    Cc: Dave Jones
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

02 Jul, 2013

3 commits

  • This is a regression introduced by
    commit fd58156e456d9f68fe0448 (IPIP: Use ip-tunneling code.)

    As with the GRE tunnel, previously we only checked the parameters
    for SIOCADDTUNNEL and SIOCCHGTUNNEL; after that commit, the
    check is applied to all commands.

    So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.

    Also, the check for i_key, o_key etc., which did not exist before,
    is suspicious too; reset them before passing
    to ip_tunnel_ioctl().

    Cc: Pravin B Shelar
    Cc: "David S. Miller"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • The vti module allocates dev->tstats twice: in vti_fb_tunnel_init()
    and in vti_tunnel_init(). This leads to a memory leak of
    dev->tstats.

    Just remove the duplicated operations in vti_fb_tunnel_init().

    (candidate for -stable)

    Cc: Stephen Hemminger
    Cc: Saurabh Mohan
    Cc: "David S. Miller"
    Signed-off-by: Cong Wang
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Cong Wang
     
  • When testing GRE tunnel, I got:

    # ip tunnel show
    get tunnel gre0 failed: Invalid argument
    get tunnel gre1 failed: Invalid argument

    This is a regression introduced by commit c54419321455631079c7d
    ("GRE: Refactor GRE tunneling code.") because previously we
    only checked the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL;
    after that commit, the check is applied to all commands.

    So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.

    After this patch I got:

    # ip tunnel show
    gre0: gre/ip remote any local any ttl inherit nopmtudisc
    gre1: gre/ip remote 192.168.122.101 local 192.168.122.45 ttl inherit

    Cc: Pravin B Shelar
    Cc: "David S. Miller"
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
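The regression described in this message comes down to where the parameter validation sits relative to the command dispatch. The sketch below is a simplified userspace model under stated assumptions: the command enum, the parameter struct, and the function names are hypothetical, and the three fields checked (IP version, protocol, header length) only stand in for whatever the real handler validates.

```c
#include <errno.h>
#include <netinet/in.h>  /* IPPROTO_GRE */

/* Hypothetical stand-ins for the get/add/delete/change tunnel commands. */
enum tunnel_cmd { CMD_GET, CMD_ADD, CMD_DEL, CMD_CHG };

/* Hypothetical, heavily simplified tunnel parameters. */
struct tunnel_parms_model {
    int version;   /* IP version of the outer header */
    int protocol;  /* outer IP protocol number */
    int ihl;       /* outer IP header length in 32-bit words */
};

static int parms_valid(const struct tunnel_parms_model *p)
{
    return p->version == 4 && p->protocol == IPPROTO_GRE && p->ihl == 5;
}

/* Regressed behavior: parameters are validated for every command. */
static int ioctl_check_all(enum tunnel_cmd cmd,
                           const struct tunnel_parms_model *p)
{
    (void)cmd;
    return parms_valid(p) ? 0 : -EINVAL;
}

/* Fixed behavior: validate only when adding or changing a tunnel. */
static int ioctl_check_add_chg(enum tunnel_cmd cmd,
                               const struct tunnel_parms_model *p)
{
    if ((cmd == CMD_ADD || cmd == CMD_CHG) && !parms_valid(p))
        return -EINVAL;
    return 0;
}
```

In this model a "get" request that passes mostly empty parameters fails with -EINVAL under the first function and succeeds under the second, which mirrors the "Invalid argument" output quoted in the message.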
     

29 Jun, 2013

1 commit

  • Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
    assumed that "locally destined, and routed packets, never trigger
    PMTU events or redirects that will be processed by us".

    However, it seems that tunnel devices do trigger PMTU events in certain
    cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
    skb_dst(skb)->ops->update_pmtu to propagate mtu information from the
    outer flows. These can cause the inner flow mtu to be decreased. If
    next hop exceptions are not consulted for pmtu, IP fragmentation will
    not be done properly for these routes.

    It also seems that we really need to have the PMTU information always
    for netfilter TCPMSS clamp-to-pmtu feature to work properly.

    So for the time being, cache separate copies of input routes for
    each next hop exception.

    Signed-off-by: Timo Teräs
    Reviewed-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Timo Teräs
     

28 Jun, 2013

2 commits

  • Since (c05cdb1 netlink: allow large data transfers from user-space),
    netlink splats if it invokes skb_clone on large netlink skbs since:

    * skb_shared_info was not correctly initialized.
    * skb->destructor is not set in the cloned skb.

    This was spotted by trinity:

    [ 894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
    [ 894.991034] IP: [] skb_clone+0x24/0xc0
    [...]
    [ 894.991034] Call Trace:
    [ 894.991034] [] nl_fib_input+0x6a/0x240
    [ 894.991034] [] ? _raw_read_unlock+0x26/0x40
    [ 894.991034] [] netlink_unicast+0x169/0x1e0
    [ 894.991034] [] netlink_sendmsg+0x251/0x3d0

    Fix it by:

    1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
    that sets our special skb->destructor in the cloned skb. Moreover, handle
    the release of the large cloned skb head area in the destructor path.

    2) not allowing large skbuffs in the netlink broadcast path. I cannot find
    any reasonable use of the large data transfer using netlink in that path,
    moreover this helps to skip extra skb_clone handling.

    I found two more netlink clients that are cloning the skbs, but they are
    not in the sendmsg path. Therefore, the sole client cloning that I found
    seems to be the fib frontend.

    Thanks to Eric Dumazet for helping to address this issue.

    Reported-by: Fengguang Wu
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira
     
  • This patch allows the netns to be switched when a packet is encapsulated or
    decapsulated. In other words, the encapsulated packet is received in one netns,
    where the lookup is done to find the tunnel. Once the tunnel is found, the
    packet is decapsulated and injected into the corresponding interface, which
    resides in another netns.

    When one of the two netns is removed, the tunnel is destroyed.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

27 Jun, 2013

1 commit

  • Since commit 32b8a8e59c9c "sit: add IPv4 over IPv4 support",
    tunnel->parms.iph.protocol is 0 when both 4in4 and 6in4 are setup, but
    xfrm_lookup() is called only when proto is != 0, thus we need to pass the real
    value.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

26 Jun, 2013

1 commit

  • commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
    added a possible skb leak, because it frees only the head of segment
    list, in case a skb_linearize() call fails.

    This patch adds a kfree_skb_list() helper to fix the bug.

    Signed-off-by: Eric Dumazet
    Cc: Pravin B Shelar
    Cc: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
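The leak described here is the classic case of freeing only the first node of a singly linked list. The following userspace sketch shows the shape of a list-freeing helper; it is not the kernel's kfree_skb_list(), and struct seg, free_seg_list and make_seg_list are hypothetical names used only for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a segment chained through a next pointer. */
struct seg {
    struct seg *next;
};

/* Free every segment in the list; returns how many were freed. */
static size_t free_seg_list(struct seg *head)
{
    size_t freed = 0;

    while (head) {
        struct seg *next = head->next;  /* save the link before freeing */

        free(head);
        head = next;
        freed++;
    }
    return freed;
}

/* Build a list of n segments, or return NULL if allocation fails. */
static struct seg *make_seg_list(size_t n)
{
    struct seg *head = NULL;

    while (n--) {
        struct seg *s = malloc(sizeof(*s));

        if (!s) {
            free_seg_list(head);
            return NULL;
        }
        s->next = head;
        head = s;
    }
    return head;
}
```

Calling free() on the head alone would release one segment and strand the rest, which is the failure mode the message attributes to the error path.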
     

25 Jun, 2013

1 commit

  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains five fixes for Netfilter/IPVS, they are:

    * A skb leak fix in fragmentation handling in case that helpers are in place,
    it occurs since the IPV6 NAT infrastructure, from Phil Oester.

    * Fix SCTP port mangling in ICMP packets for IPVS, from Julian Anastasov.

    * Fix event delivery in ctnetlink regarding the new connlabel infrastructure,
    from Florian Westphal.

    * Fix mangling in the SIP NAT helper, from Balazs Peter Odor.

    * Fix crash in ipt_ULOG introduced while adding netnamespace support,
    from Gao Feng.

    I'll take care of passing several of these patches to -stable once they hit
    Linus' tree.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2013

1 commit

  • The parameter of setup_timer should be &ulog->nlgroup[i].
    The incorrect parameter will cause a kernel panic in
    ulog_timer.

    Bug introduced in commit 355430671ad93546b34b4e91bdf720f3a704efa4
    "netfilter: ipt_ULOG: add net namespace support for ipt_ULOG"

    ebt_ULOG doesn't have this problem.

    [ I have mangled this patch to fix nlgroup != 0 case, we were
    also crashing there --pablo ]

    Tested-by: George Spelvin
    Reported-by: Borislav Petkov
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     

20 Jun, 2013

11 commits

  • This patch removes an empty ifdef from inet_frag_intern()
    in net/ipv4/inet_fragment.c.

    commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a
    (hlist: drop the node parameter from iterators) removed hlist from
    net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command,
    which is now empty.

    Signed-off-by: Rami Rosen
    Signed-off-by: David S. Miller

    Rami Rosen
     
  • In previous discussions, I tried to find some reasonable heuristics
    for delayed ACK, however this seems not possible, according to Eric:

    "ACKS might also be delayed because of bidirectional
    traffic, and is more controlled by the application
    response time. TCP stack can not easily estimate it."

    "ACK can be incredibly useful to recover from losses in
    a short time.

    The vast majority of TCP sessions are small lived, and we
    send one ACK per received segment anyway at beginning or
    retransmits to let the sender smoothly increase its cwnd,
    so an auto-tuning facility wont help them that much."

    and according to David:

    "ACKs are the only information we have to detect loss.

    And, for the same reasons that TCP VEGAS is fundamentally
    broken, we cannot measure the pipe or some other
    receiver-side-visible piece of information to determine
    when it's "safe" to stretch ACK.

    And even if it's "safe", we should not do it so that losses are
    accurately detected and we don't spuriously retransmit.

    The only way to know when the bandwidth increases is to
    "test" it, by sending more and more packets until drops happen.
    That's why all successful congestion control algorithms must
    operate on explicitly tested pieces of information.

    Similarly, it's not really possible to universally know if
    it's safe to stretch ACK or not."

    It still makes sense to enable or disable quick ack mode like
    what TCP_QUICKACK does.

    Similar to the TCP_QUICKACK option, but for people who can't
    modify the source code and still want to control
    TCP delayed ACK behavior. As David suggested, this should belong
    to per-path scope, since different paths may want different
    behaviors.

    Cc: Eric Dumazet
    Cc: Rick Jones
    Cc: Stephen Hemminger
    Cc: "David S. Miller"
    Cc: Thomas Graf
    CC: David Laight
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
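For comparison with the per-route control this message introduces, the per-socket option it refers to (spelled TCP_QUICKACK in the Linux headers) requires a change to the application's source. A minimal Linux-only sketch follows; the wrapper name set_quickack is hypothetical.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_QUICKACK (Linux-specific) */
#include <sys/socket.h>

/*
 * Enable (on != 0) or disable (on == 0) quick ack mode on a TCP socket.
 * Returns 0 on success and -1 on failure, with errno set by setsockopt().
 */
static int set_quickack(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}
```

According to tcp(7) the flag is not permanent and only switches the socket into or out of quickack mode, so an application would typically need to set it again as the connection progresses. That is part of why a route-level setting is attractive for programs that cannot be modified.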
     
  • Signed-off-by: Weiping Pan
    Signed-off-by: David S. Miller

    Weiping Pan
     
  • MD5 key lookups on a given TCP socket were being performed
    incorrectly. This fix alters parameter inputs to the MD5
    lookup function tcp_md5_do_lookup, which is called by functions
    tcp_md5_do_add and tcp_md5_do_del. Specifically, the change now
    inputs the correct address and address family required to make
    a proper lookup.

    Signed-off-by: Aydin Arik
    Signed-off-by: David S. Miller

    Aydin Arik
     
  • Process the skb tunnel header before sending the packet to the protocol handler.
    This allows code sharing between the gre and ovs gre modules.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Refactor various ip tunnels xmit functions and extend iptunnel_xmit()
    so that there is more code sharing.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • This is required for OVS GRE offloading.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • This is required for ovs gre module.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Currently only one user is allowed to register for the gre
    protocol. The following patch adds a de-multiplexer so that multiple
    modules can listen on the gre protocol, e.g. kernel gre devices and ovs.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Use cmpxchg() for atomic protocol registration which saves
    code and data space.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
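Compare-and-swap registration can be illustrated in userspace with C11 atomics. The sketch below is an analogy under stated assumptions, not the kernel code: it uses atomic_compare_exchange_strong() from <stdatomic.h> in place of the kernel's cmpxchg(), and the handler struct, the single slot, the function names and the -EBUSY return value are choices made for the example.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical protocol handler descriptor. */
struct proto_handler {
    const char *name;
};

/* One registration slot; NULL means no handler is registered. */
static _Atomic(const struct proto_handler *) proto_slot;

/* Claim the slot only if it is currently empty. */
static int proto_register(const struct proto_handler *h)
{
    const struct proto_handler *expected = NULL;

    return atomic_compare_exchange_strong(&proto_slot, &expected, h)
               ? 0 : -EBUSY;
}

/* Release the slot only if it is currently held by this handler. */
static int proto_unregister(const struct proto_handler *h)
{
    const struct proto_handler *expected = h;

    return atomic_compare_exchange_strong(&proto_slot, &expected, NULL)
               ? 0 : -EBUSY;
}
```

Because the test of the slot and the store happen in one atomic step, no separate lock variable is needed around the registration, which is the code and data saving the message alludes to.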
     
  • Conflicts:
    drivers/net/wireless/ath/ath9k/Kconfig
    drivers/net/xen-netback/netback.c
    net/batman-adv/bat_iv_ogm.c
    net/wireless/nl80211.c

    The ath9k Kconfig conflict was a change of a Kconfig option name right
    next to the deletion of another option.

    The xen-netback conflict was overlapping changes involving the
    handling of the notify list in xen_netbk_rx_action().

    Batman conflict resolution provided by Antonio Quartulli, basically
    keep everything in both conflict hunks.

    The nl80211 conflict is a little more involved. In 'net' we added a
    dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
    Linus reported. Meanwhile in 'net-next' the handlers were converted
    to use pre and post doit handlers which use a flag to determine
    whether to hold the RTNL mutex around the operation.

    However, the dump handlers do not use this logic. Instead they have
    to explicitly do the locking. There were apparent bugs in the
    conversion of nl80211_dump_wiphy() in that we were not dropping the
    RTNL mutex in all the return paths, and it seems we very much should
    be doing so. So I fixed that whilst handling the overlapping changes.

    To simplify the initial returns, I take the RTNL mutex after we try
    to allocate 'tb'.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Jun, 2013

6 commits

  • If CONFIG_NET_NS is not set then __net_init is the same as __init and
    __net_exit is the same as __exit. These functions will be removed from
    memory after the module loads or is removed. Functions that are exported
    for use by other functions should never be labeled for removal.

    Bug introduced by commit c54419321455631079c
    ("GRE: Refactor GRE tunneling code.")

    Reported-by: Steinar H. Gunderson
    Signed-off-by: Steven Rostedt
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • If users apply a shaper to a vti tunnel, it will cause a kernel crash. The
    problem seems to be due to the vti_tunnel_xmit function not clearing the
    skb->opt field before passing the packet to the xfrm tunneling code.

    Signed-off-by: Saurabh Mohan
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Saurabh Mohan
     
  • Linux sends new unsent data during disorder and recovery state if all
    (suspected) lost packets have been retransmitted (RFC5681, section
    3.2 step 1 & 2, RFC3517 section 4, NextSeg() Rule 2). One requirement
    is to keep the receive window about twice the estimated sender's
    congestion window (tcp_rcv_space_adjust()), assuming the fast
    retransmits repair the losses in the next round trip.

    But currently it's not the case on the first round trip in either
    normal or Fast Open connection, because the initial receive window
    is identical to (expected) sender's initial congestion window. The
    fix is to double it.

    Signed-off-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
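The effect of doubling can be seen with a small worked calculation. The helper below is a hypothetical illustration, and the figures used with it (an initial congestion window of 10 segments and an MSS of 1460 bytes) are assumptions that do not come from the message.

```c
/*
 * Receive window in bytes, sized as a multiple of the sender's expected
 * initial congestion window (given in segments).
 */
static unsigned int init_rcv_wnd_bytes(unsigned int init_cwnd_segs,
                                       unsigned int mss,
                                       unsigned int factor)
{
    return factor * init_cwnd_segs * mss;
}
```

With those assumed figures, a factor of 1 gives 14600 bytes, exactly one initial congestion window, so the sender has no room to send new data while retransmitting in the first round trip. A factor of 2 gives 29200 bytes, which leaves that room.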
     
  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • net/ipv4/ping.c:286:5: sparse: symbol 'ping_check_bind_addr' was not declared. Should it be static?
    net/ipv4/ping.c:355:6: sparse: symbol 'ping_set_saddr' was not declared. Should it be static?
    net/ipv4/ping.c:370:6: sparse: symbol 'ping_clear_saddr' was not declared. Should it be static?

    net/ipv6/ping.c:60:5: sparse: symbol 'dummy_ipv6_recv_error' was not declared. Should it be static?
    net/ipv6/ping.c:64:5: sparse: symbol 'dummy_ip6_datagram_recv_ctl' was not declared. Should it be static?
    net/ipv6/ping.c:69:5: sparse: symbol 'dummy_icmpv6_err_convert' was not declared. Should it be static?
    net/ipv6/ping.c:73:6: sparse: symbol 'dummy_ipv6_icmp_error' was not declared. Should it be static?
    net/ipv6/ping.c:75:5: sparse: symbol 'dummy_ipv6_chk_addr' was not declared. Should it be static?
    net/ipv6/ping.c:201:5: sparse: symbol 'ping_v6_seq_show' was not declared. Should it be static?

    Signed-off-by: Fengguang Wu
    Signed-off-by: David S. Miller

    Wu Fengguang
     
  • commit ba418fa357a7b3c ("soreuseport: UDP/IPv4 implementation")
    added the following sparse errors:

    net/ipv4/udp.c:433:60: warning: cast from restricted __be16
    net/ipv4/udp.c:433:60: warning: incorrect type in argument 1 (different base types)
    net/ipv4/udp.c:433:60: expected unsigned short [unsigned] [usertype] val
    net/ipv4/udp.c:433:60: got restricted __be16 [usertype] sport
    net/ipv4/udp.c:433:60: warning: cast from restricted __be16
    net/ipv4/udp.c:433:60: warning: cast from restricted __be16
    net/ipv4/udp.c:514:60: warning: cast from restricted __be16
    net/ipv4/udp.c:514:60: warning: incorrect type in argument 1 (different base types)
    net/ipv4/udp.c:514:60: expected unsigned short [unsigned] [usertype] val
    net/ipv4/udp.c:514:60: got restricted __be16 [usertype] sport
    net/ipv4/udp.c:514:60: warning: cast from restricted __be16
    net/ipv4/udp.c:514:60: warning: cast from restricted __be16

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet