14 Oct, 2013

6 commits

  • [ Upstream commit 3e08f4a72f689c6296d336c2aab4bddd60c93ae2 ]

    We might extend the used area of an skb beyond the total
    headroom when we install the ipip header. Fix this by
    calling skb_cow_head() unconditionally.

    Bug was introduced with commit c544193214
    ("GRE: Refactor GRE tunneling code.")

    Cc: Pravin Shelar
    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Steffen Klassert
     
  • [ Upstream commit e2401654dd0f5f3fb7a8d80dad9554d73d7ca394 ]

    It is possible for the timer handlers to run after the call to
    ip_mc_down, so use in_dev_put instead of __in_dev_put in the handler
    function in order to do proper cleanup when the refcnt reaches 0.
    Otherwise, the refcnt can reach zero without the in_device being
    destroyed, and we end up leaking a reference to the net_device and
    see messages like the following:

    unregister_netdevice: waiting for eth0 to become free. Usage count = 1

    Tested on linux-3.4.43.

    Signed-off-by: Salam Noureddine
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Salam Noureddine
     
  • [ Upstream commit 9a3bab6b05383f1e4c3716b3615500c51285959e ]

    A host might need net_secret[] and never open a single socket.

    Problem added in commit aebda156a570782
    ("net: defer net_secret[] initialization")

    Based on prior patch from Hannes Frederic Sowa.

    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Eric Dumazet
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ]

    If local fragmentation is allowed, then ip_select_ident() and
    ip_select_ident_more() need to generate unique IDs to ensure
    correct defragmentation on the peer.

    For example, if IPsec (tunnel mode) has to encrypt large skbs
    that have the local_df bit set, then all IP fragments that belonged
    to different ESP datagrams would have used the same identifier.
    If one of these IP fragments were lost or reordered, the peer
    could stitch together IP fragments that did not belong to the
    same datagram. This would lead to packet loss or data corruption.

    Signed-off-by: Ansis Atteka
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ansis Atteka
     
  • [ Upstream commit 749154aa56b57652a282cbde57a57abc278d1205 ]

    skb->data already points to IP header, but for the sake of
    consistency we can also use ip_hdr() to retrieve it.

    Signed-off-by: Ansis Atteka
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ansis Atteka
     
  • [ Upstream commit e2e5c4c07caf810d7849658dca42f598b3938e21 ]

    Signed-off-by: Dave Jones
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dave Jones
     

14 Sep, 2013

11 commits

  • [ Upstream commit eb8895debe1baba41fcb62c78a16f0c63c21662a ]

    In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed
    the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so,
    the netfilter owner match lost its ability to block the SYNACK packet on
    outbound listening sockets. Revert the change, restoring the owner match
    functionality.

    This closes netfilter bugzilla #847.

    Signed-off-by: Phil Oester
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Phil Oester
     
  • [ Upstream commit c27c9322d015dc1d9dfdf31724fca71c0476c4d1 ]

    ipv4: raw_sendmsg: don't use header's destination address

    A sendto() regression was bisected and found to start with commit
    f8126f1d5136be1 (ipv4: Adjust semantics of rt->rt_gateway.)

    The problem is that it tries to ARP-lookup the constructed packet's
    destination address rather than the explicitly provided address.

    Fix this using FLOWI_FLAG_KNOWN_NH so that the given nexthop is used.

    cf. commit 2ad5b9e4bd314fc685086b99e90e5de3bc59e26b

    Reported-by: Chris Clark
    Bisected-by: Chris Clark
    Tested-by: Chris Clark
    Suggested-by: Julian Anastasov
    Signed-off-by: Chris Clark
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Chris Clark
     
  • [ Upstream commit e3e12028315749b7fa2edbc37328e5847be9ede9 ]

    The zero value means that tsecr is not valid, so it's a special case.

    tsoffset is used to customize tcp_time_stamp for one socket.
    tsoffset is usually zero; it is used when a socket has been moved
    from one host to another.

    Currently this issue affects the logic of tcp_rcv_rtt_measure_ts.
    Due to the incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets
    rto to TCP_RTO_MAX.

    Reported-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Andrew Vagin
     
  • [ Upstream commit c7781a6e3c4a9a17e144ec2db00ebfea327bd627 ]

    u32 rcv_tstamp; /* timestamp of last received ACK */

    Its value is used in tcp_retransmit_timer, which closes the socket
    if the last ACK was received more than TCP_RTO_MAX ago.

    Currently rcv_tstamp is initialized to zero, and if
    tcp_retransmit_timer is called before the first ACK is received,
    the connection is closed.

    This patch initializes rcv_tstamp to the timestamp at which the
    socket was restored.

    Reported-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Andrew Vagin
     
  • [ Upstream commit 7ed5c5ae96d23da22de95e1c7a239537acd378b1 ]

    When the repair mode is turned off, the write queue seqs are
    updated so that the whole queue is considered to be 'already sent'.

    The "when" field must be set for such skbs. It is used in
    tcp_rearm_rto, for example. If the "when" field isn't set, the
    retransmit timeout can be calculated incorrectly and a TCP connection
    can stall for two minutes (TCP_RTO_MAX).

    Acked-by: Pavel Emelyanov
    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Andrey Vagin
     
  • [ Upstream commit 4221f40513233fa8edeef7fc82e44163fde03b9b ]

    Using the inner IP ID as the tunnel IP ID is not safe in some rare
    cases. E.g. packets coming from multiple sources entering the same
    tunnel can have the same ID. Therefore, on tunnel packet receive we
    could have packets from two different streams but with the same
    source and destination IP and the same IP ID, which could confuse IP
    packet reassembly.

    The following patch reverts the optimization from commit
    490ab08127 (IP_GRE: Fix IP-Identification.)

    Signed-off-by: Pravin B Shelar
    CC: Jarno Rajahalme
    CC: Ansis Atteka
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pravin B Shelar
     
  • [ Upstream commit 77a482bdb2e68d13fae87541b341905ba70d572b ]

    Fix ipgre_header() (header_ops->create) to return the correct
    number of bytes pushed. Most callers of dev_hard_header() seem
    to care only whether it succeeded, but af_packet.c uses it as the
    offset into the skb so that it copies from userspace only once. In
    practice this fixes packet socket sendto()/sendmsg() on GRE tunnels.

    Regression introduced in c54419321455631079c7d6e60bc732dd0c5914c5
    ("GRE: Refactor GRE tunneling code.")

    Cc: Pravin B Shelar
    Signed-off-by: Timo Teräs
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Timo Teräs
     
  • [ Upstream commit cd6b423afd3c08b27e1fed52db828ade0addbc6b ]

    While investigating a strange increase of retransmit rates
    on hosts ~24 days after boot, Van found hystart was disabled
    if ca->epoch_start was 0, as the following condition is true
    when the tcp_time_stamp high order bit is set.

    (s32)(tcp_time_stamp - ca->epoch_start) < HZ

    Quoting Van :

    At initialization & after every loss ca->epoch_start is set to zero so
    I believe that the above line will turn off hystart as soon as the 2^31
    bit is set in tcp_time_stamp & hystart will stay off for 24 days.
    I think we've observed that cubic's restart is too aggressive without
    hystart so this might account for the higher drop rate we observe.

    Diagnosed-by: Van Jacobson
    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 2ed0edf9090bf4afa2c6fc4f38575a85a80d4b20 ]

    commit 17a6e9f1aa9 ("tcp_cubic: fix clock dependency") added an
    overflow error in bictcp_update() in the following code:

    /* change the unit from HZ to bictcp_HZ */
    t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) -
    ca->epoch_start) << BICTCP_HZ) / HZ;

    Because msecs_to_jiffies() returns unsigned long, the compiler does
    implicit type promotion.

    We really want to constrain (tcp_time_stamp - ca->epoch_start)
    to a signed 32bit value, or else 't' has unexpected high values.

    This bug triggers an increase of retransmit rates ~24 days after
    boot [1], as the high order bit of tcp_time_stamp flips.

    [1] for hosts with HZ=1000

    Big thanks to Van Jacobson for spotting this problem.

    Diagnosed-by: Van Jacobson
    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Cc: Stephen Hemminger
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit aab515d7c32a34300312416c50314e755ea6f765 ]

    The AddressSanitizer [1] dynamic checker pointed out a potential
    out-of-bounds access in leaf_walk_rcu().

    We could allocate one more slot in tnode_new() to leave the prefetch()
    in place, but it does not look worth the pain.

    Bug added in commit 82cfbb008572b ("[IPV4] fib_trie: iterator recode")

    [1] :
    https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

    Reported-by: Andrey Konovalov
    Signed-off-by: Eric Dumazet
    Cc: Dmitry Vyukov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 446266b0c742a2c9ee8f0dce759a0117bce58a86 ]

    Commit 5c766d642 ("ipv4: introduce address lifetime") leaves the ifa
    resource that was allocated via inet_alloc_ifa() unfreed when the
    function returns -EINVAL. Thus, free it first via inet_free_ifa().

    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

12 Aug, 2013

1 commit


29 Jul, 2013

7 commits

  • [ Upstream commit 21d1196a35f5686c4323e42a62fdb4b23b0ab4a3 ]

    commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
    performance regression for non-GRO traffic, basically disabling
    IP early demux.

    The IPv6 stack resets the transport header in ip6_rcv() before calling
    IP early demux in ip6_rcv_finish(), while IPv4 does this only in
    ip_local_deliver_finish(), _after_ IP early demux.

    GRO traffic happened to enable IP early demux because the transport
    header is also set in inet_gro_receive().

    Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
    same: transport_header should be set in ip_rcv() instead of
    ip_local_deliver_finish().

    ip_local_deliver_finish() can also use skb_network_header_len(), which is
    faster than ip_hdrlen().

    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 8c91e162e058bb91b7766f26f4d5823a21941026 ]

    This change fixes an MTU sizing issue seen with gretap tunnels when non-GSO
    packets are sent from the interface.

    In my case I was able to reproduce the issue simply by sending a ping of
    1421 bytes with the gretap interface created on a device with a standard
    1500-byte MTU.

    This fix is based on the fact that the tunnel MTU is already adjusted by
    dev->hard_header_len, so any packet compared against that MTU should
    also be adjusted by hard_header_len and the tunnel header, instead of
    just the tunnel header.

    Signed-off-by: Alexander Duyck
    Reported-by: Cong Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexander Duyck
     
  • [ Upstream commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 ]

    We accidentally call down to ip6_push_pending_frames when uncorking
    pending AF_INET data on an IPv6 socket. This results in the following
    splat (from Dave Jones):

    skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:126!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
    +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
    CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
    task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
    RIP: 0010: skb_panic+0x63/0x65
    RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282
    RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
    RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
    RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
    R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
    FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
    Stack:
    ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
    ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
    ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
    Call Trace:
    skb_push+0x3a/0x40
    ip6_push_pending_frames+0x1f6/0x4d0
    ? mark_held_locks+0xbb/0x140
    udp_v6_push_pending_frames+0x2b9/0x3d0
    ? udplite_getfrag+0x20/0x20
    udp_lib_setsockopt+0x1aa/0x1f0
    ? fget_light+0x387/0x4f0
    udpv6_setsockopt+0x34/0x40
    sock_common_setsockopt+0x14/0x20
    SyS_setsockopt+0x71/0xd0
    tracesys+0xdd/0xe2
    Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
    RIP skb_panic+0x63/0x65
    RSP

    This patch adds a check whether the pending data is of address family
    AF_INET and, if so, calls udp_push_pending_frames directly from
    udp_v6_push_pending_frames.

    This bug was found by Dave Jones with trinity.

    (Also move the initialization of fl6 below the AF_INET check, even if
    not strictly necessary.)

    Signed-off-by: Hannes Frederic Sowa
    Cc: Dave Jones
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hannes Frederic Sowa
     
  • [ Upstream commit 3b7b514f44bff05d26a6499c4d4fac2a83938e6e ]

    This is a regression introduced by
    commit fd58156e456d9f68fe0448 (IPIP: Use ip-tunneling code.)

    Similar to the GRE tunnel, previously we only checked the parameters
    for SIOCADDTUNNEL and SIOCCHGTUNNEL; after that commit, the
    check was applied to all commands.

    So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.

    Also, the check of i_key, o_key, etc. is suspicious too, since it
    did not exist before, so reset them before passing them
    to ip_tunnel_ioctl().

    Signed-off-by: Cong Wang
    Cc: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 23a3647bc4f93bac3776c66dc2c7f7f68b3cd662 ]

    In the path MTU check, the IP header total length works for a GRE
    device but not for a gretap device. Use the skb length, which is
    consistent for all tunneling types. This is an old bug in GRE.
    This also fixes an MTU calculation bug introduced by
    commit c54419321455631079c7d (GRE: Refactor GRE tunneling code).

    Reported-by: Timo Teras
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pravin B Shelar
     
  • [ Upstream commit ab6c7a0a43c2eaafa57583822b619b22637b49c7 ]

    The vti module allocates dev->tstats twice, in vti_fb_tunnel_init()
    and in vti_tunnel_init(), which leads to a memory leak of
    dev->tstats.

    Just remove the duplicated operations in vti_fb_tunnel_init().

    (candidate for -stable)

    Signed-off-by: Cong Wang
    Cc: Stephen Hemminger
    Cc: Saurabh Mohan
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit 6c734fb8592f6768170e48e7102cb2f0a1bb9759 ]

    When testing GRE tunnel, I got:

    # ip tunnel show
    get tunnel gre0 failed: Invalid argument
    get tunnel gre1 failed: Invalid argument

    This is a regression introduced by commit c54419321455631079c7d
    ("GRE: Refactor GRE tunneling code.") because previously we
    only checked the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL;
    after that commit, the check was applied to all commands.

    So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.

    After this patch I got:

    # ip tunnel show
    gre0: gre/ip remote any local any ttl inherit nopmtudisc
    gre1: gre/ip remote 192.168.122.101 local 192.168.122.45 ttl inherit

    Signed-off-by: Cong Wang
    Cc: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

26 Jun, 2013

1 commit

  • commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
    added a possible skb leak, because it frees only the head of the segment
    list if a skb_linearize() call fails.

    This patch adds a kfree_skb_list() helper to fix the bug.

    Signed-off-by: Eric Dumazet
    Cc: Pravin B Shelar
    Cc: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Jun, 2013

1 commit

  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains five fixes for Netfilter/IPVS, they are:

    * An skb leak fix in fragmentation handling when helpers are in place;
    it has occurred since the IPv6 NAT infrastructure was added, from Phil Oester.

    * Fix SCTP port mangling in ICMP packets for IPVS, from Julian Anastasov.

    * Fix event delivery in ctnetlink regarding the new connlabel infrastructure,
    from Florian Westphal.

    * Fix mangling in the SIP NAT helper, from Balazs Peter Odor.

    * Fix crash in ipt_ULOG introduced while adding net namespace support,
    from Gao Feng.

    I'll take care of passing several of these patches to -stable once they hit
    Linus' tree.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Jun, 2013

1 commit

  • The parameter of setup_timer should be &ulog->nlgroup[i].
    The incorrect parameter will cause a kernel panic in
    ulog_timer.

    Bug introduced in commit 355430671ad93546b34b4e91bdf720f3a704efa4
    "netfilter: ipt_ULOG: add net namespace support for ipt_ULOG"

    ebt_ULOG doesn't have this problem.

    [ I have mangled this patch to fix nlgroup != 0 case, we were
    also crashing there --pablo ]

    Tested-by: George Spelvin
    Reported-by: Borislav Petkov
    Signed-off-by: Gao feng
    Signed-off-by: Pablo Neira Ayuso

    Gao feng
     

20 Jun, 2013

1 commit

  • MD5 key lookups on a given TCP socket were being performed
    incorrectly. This fix alters parameter inputs to the MD5
    lookup function tcp_md5_do_lookup, which is called by functions
    tcp_md5_do_add and tcp_md5_do_del. Specifically, the change now
    inputs the correct address and address family required to make
    a proper lookup.

    Signed-off-by: Aydin Arik
    Signed-off-by: David S. Miller

    Aydin Arik
     

13 Jun, 2013

2 commits

  • If CONFIG_NET_NS is not set then __net_init is the same as __init and
    __net_exit is the same as __exit. These functions will be removed from
    memory after the module loads or is removed. Functions that are exported
    for use by other functions should never be labeled for removal.

    Bug introduced by commit c54419321455631079c
    ("GRE: Refactor GRE tunneling code.")

    Reported-by: Steinar H. Gunderson
    Signed-off-by: Steven Rostedt
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Applying a shaper to a vti tunnel causes a kernel crash. The
    problem seems to be that the vti_tunnel_xmit function does not clear the
    skb->opt field before passing the packet to the xfrm tunneling code.

    Signed-off-by: Saurabh Mohan
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Saurabh Mohan
     

31 May, 2013

1 commit

  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains Netfilter/IPVS fixes for 3.10-rc3,
    they are:

    * fix xt_addrtype with IPv6, from Florian Westphal. This required
    a new hook for IPv6 functions in the netfilter core to avoid
    hard dependencies on the ipv6 subsystem when this match is
    only used for IPv4.

    * fix the connection reuse case in IPVS. Currently, reused
    connections are directed to the same server. If that server is
    down, those connections fail. Therefore, clear the
    connection and choose a new server among the available ones.

    * fix a possible non-NUL-terminated string sent to user-space if
    ipt_ULOG is used as the default netfilter logging stub, from
    Chen Gang.

    * fix mark logging of IPv6 packets in xt_LOG, from Michal Kubecek.
    This bug has been there since 2.6.26.

    * Fix breakage in ip_vs_sh due to incorrect structure layout for
    RCU, from Jan Beulich.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

28 May, 2013

1 commit

  • Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect()
    doesn't call __build_flow_key() directly but via
    ip_rt_build_flow_key() wrapper. This leads to __build_flow_key()
    getting pointer to IPv4 header of the ICMP redirect packet
    rather than pointer to the embedded IPv4 header of the packet
    initiating the redirect.

    As a result, handling of ICMP redirects initiated by TCP packets
    is broken. Issue was introduced by

    4895c771c ("ipv4: Add FIB nexthop exceptions.")

    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Michal Kubecek
     

26 May, 2013

1 commit

  • Daniel Petre reported crashes in icmp_dst_unreach() with the following call
    graph:

    #3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
    #4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
    #5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
    #6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
    #7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
    #8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
    #9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
    #10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
    #11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596

    Daniel found a similar problem mentioned in
    http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html

    And indeed this is the root cause: skb->cb[] contains data fooling the IP
    stack.

    We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
    is called, or else skb->cb[] might contain garbage from the GSO
    segmentation layer.

    A similar fix was tested on linux-3.9, but the GRE code was refactored in
    linux-3.10. I'll send patches for stable kernels as well.

    Many thanks to Daniel for providing reports, patches and testing !

    Reported-by: Daniel Petre
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 May, 2013

1 commit

  • commit 3853b5841c01a ("xps: Improvements in TX queue selection")
    introduced the ooo_okay flag, but the condition to set it is slightly wrong.

    In our traces, we have seen ACK packets being received out of order,
    and RST packets sent in response.

    We should test whether we have any packets still in the host queue.

    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 May, 2013

2 commits

  • If nf_log uses ipt_ULOG as its logging output, we can deliver
    non-NUL-terminated strings to user-space, since the maximum length of
    the prefix that is passed by nf_log is NF_LOG_PREFIXLEN but pm->prefix
    is 32 bytes long (ULOG_PREFIX_LEN).

    This is actually happening already from nf_conntrack_tcp if ipt_ULOG
    is used, since it passes strings longer than 32 bytes.

    Signed-off-by: Chen Gang
    Signed-off-by: Pablo Neira Ayuso

    Chen Gang
     
  • This patch is a fix for a bug triggering newly_acked_sacked < 0
    in tcp_ack().

    The bug is triggered by sacked_out decreasing relative to prior_sacked,
    but packets_out remaining the same as prior_packets. This is because the
    snapshot of prior_packets is taken after tcp_sacktag_write_queue() while
    prior_sacked is captured before tcp_sacktag_write_queue(). The problem
    is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment)
    adjusts the pcount for packets_out and sacked_out (MSS change or other
    reason). As a result, this delta in pcount is reflected in
    (prior_sacked - sacked_out) but not in (prior_packets - packets_out).

    This patch does the following:
    1) initializes prior_packets at the start of tcp_ack() so as to
    capture the delta in packets_out created by tcp_fragment.
    2) introduces a new "previous_packets_out" variable that snapshots
    packets_out right before tcp_clean_rtx_queue, so pkts_acked can be
    correctly computed as before.
    3) Computes pkts_acked using previous_packets_out, and computes
    newly_acked_sacked using prior_packets.

    Signed-off-by: Nandita Dukkipati
    Acked-by: Yuchung Cheng
    Signed-off-by: David S. Miller

    Nandita Dukkipati
     

20 May, 2013

1 commit


17 May, 2013

2 commits

  • The GSO TCP handler has the following issues:

    1) ooo_okay from the original GSO packet is duplicated to all segments
    2) segments (except the last one) are orphaned, so the transmit path
    cannot get the transmit queue number from the socket. This happens if
    GSO segmentation is done before a stacked device, for example.

    As a result, we can send packets from a given TCP flow to different TX
    queues (if using multiqueue NICs). This generates OOO problems and
    spurious SACKs and retransmits.

    Fix this by keeping the socket pointer set for all segments.

    This means that every segment must also have a destructor, and the
    original GSO skb truesize must be split across all segments, to keep
    sk->sk_wmem_alloc accounting precise.

    Signed-off-by: Eric Dumazet
    Cc: Maciej Żenczykowski
    Cc: Tom Herbert
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains three Netfilter fixes and an update
    to the MAINTAINERS file for your net tree; they are:

    * Fix a crash if nf_log_packet is called from conntrack; in that case
    both interfaces are NULL, from Hans Schillstrom. This bug was introduced
    with the logging netns support in the previous merge window.

    * Fix compilation of nf_log and nf_queue without CONFIG_PROC_FS,
    from myself. This bug was introduced in the previous merge window
    with the new netns support for the netfilter logging infrastructure.

    * Fix a possible crash in xt_TCPOPTSTRIP due to missing sanity
    checks to validate that the TCP header is well-formed, from
    myself. I can find this bug in 2.6.25; it has probably been there
    since the beginning. I'll pass this to -stable.

    * Update the MAINTAINERS file to point to the new nf trees at git.kernel.org,
    remove Harald and use M: instead of P: (now an obsolete tag) to
    keep Jozsef in the list of people.

    Please, consider pulling this. Thanks!
    ====================

    Signed-off-by: David S. Miller

    David S. Miller