14 Sep, 2013

1 commit

  • [ Upstream commit 3a1c756590633c0e86df606e5c618c190926a0df ]

    In tcp_v6_do_rcv() code, when processing pkt options, we solely work
    on our skb clone opt_skb that we've created earlier before entering
    tcp_rcv_established() on our way. However, only in condition ...

    if (np->rxopt.bits.rxtclass)
            np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb));

    ... we work on skb itself. As we extract all other information out
    of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can
    already be released by tcp_rcv_established() earlier on. When we try
    to access it in ipv6_hdr(), we will dereference freed skb.

    [ Bug added by commit 4c507d2897bd9b ("net: implement IP_RECVTOS for
    IP_PKTOPTIONS") ]

    Signed-off-by: Daniel Borkmann
    Cc: Eric Dumazet
    Acked-by: Eric Dumazet
    Acked-by: Jiri Benc
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Daniel Borkmann
     

12 May, 2013

1 commit

  • We have seen multiple NULL dereferences in __inet6_lookup_established()

    After analysis, I found that inet6_sk() could be NULL while the
    check for sk_family == AF_INET6 was true.

    Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP
    and TCP stacks.

    Once an IPv6 socket using SLAB_DESTROY_BY_RCU is inserted in a hash
    table, we can no longer clear the pinet6 field.

    This patch extends logic used in commit fcbdf09d9652c891
    ("net: fix nulls list corruptions in sk_prot_alloc")

    TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
    to make sure we do not clear pinet6 field.

    At socket clone phase, we do not really care, as cloning the parent (non
    NULL) pinet6 is not adding a fatal race.
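
    A minimal user-space sketch of the idea, assuming a hypothetical
    struct toy_sock and helper toy_clear_sk() (neither is the kernel's
    actual code): every byte except the pinet6 pointer is zeroed, so a
    lockless reader that still sees the object never observes a NULL
    pinet6.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a socket; not the kernel's struct inet_sock. */
struct toy_sock {
    int   family;
    int   state;
    void *pinet6;   /* must stay valid while lockless readers may see us */
    int   tail[4];
};

/* Zero every byte of *sk except the pinet6 pointer. */
static void toy_clear_sk(struct toy_sock *sk)
{
    size_t off = offsetof(struct toy_sock, pinet6);
    size_t end = off + sizeof(sk->pinet6);
    unsigned char *p = (unsigned char *)sk;

    assert(sk != NULL);
    memset(p, 0, off);                     /* everything before pinet6 */
    memset(p + end, 0, sizeof(*sk) - end); /* everything after pinet6  */
}
```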

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Apr, 2013

1 commit

  • Add MIB counters for checksum errors in IP layer,
    and TCP/UDP/ICMP layers, to help diagnose problems.

    $ nstat -a | grep Csum
    IcmpInCsumErrors 72 0.0
    TcpInCsumErrors 382 0.0
    UdpInCsumErrors 463221 0.0
    Icmp6InCsumErrors 75 0.0
    Udp6InCsumErrors 173442 0.0
    IpExtInCsumErrors 10884 0.0

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Apr, 2013

2 commits

  • Conflicts:
    drivers/nfc/microread/mei.c
    net/netfilter/nfnetlink_queue_core.c

    Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
    some cleanups are going to go on-top.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Tetja Rediske found that if the host receives an ICMPv6 redirect message
    after sending a SYN+ACK, the connection will be reset.

    He bisected it down to 093d04d (ipv6: Change skb->data before using
    icmpv6_notify() to propagate redirect), but the origin of the bug comes
    from ec18d9a26 (ipv6: Add redirect support to all protocol icmp error
    handlers.). The bug simply did not trigger prior to 093d04d, because
    skb->data did not point to the inner IP header and thus icmpv6_notify
    did not call the correct err_handler.

    This patch adds the missing "goto out;" in tcp_v6_err. After receiving
    an ICMPv6 Redirect, we should not continue processing the ICMP in
    tcp_v6_err, as this may trigger the removal of request-socks or setting
    sk_err(_soft).
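
    The control flow can be sketched in user space as follows. The
    names toy_conn and toy_err() are hypothetical and the handler is
    heavily simplified; only the early exit after a redirect mirrors
    the fix described above.

```c
#include <assert.h>
#include <stddef.h>

/* ICMPv6 type numbers; 137 is the Redirect message. */
enum { TOY_DEST_UNREACH = 1, TOY_REDIRECT = 137 };

/* Hypothetical connection state, for illustration only. */
struct toy_conn {
    int redirects;  /* how many redirects updated the route */
    int sk_err;     /* last error reported to the socket    */
};

static void toy_err(struct toy_conn *c, int type)
{
    assert(c != NULL);

    if (type == TOY_REDIRECT) {
        c->redirects++;  /* update the cached route */
        goto out;        /* the fix: stop here, do not treat it as an error */
    }

    c->sk_err = type;    /* genuine errors still reach the socket */
out:
    return;
}
```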

    Reported-by: Tetja Rediske
    Signed-off-by: Christoph Paasch
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Christoph Paasch
     

21 Mar, 2013

1 commit


19 Mar, 2013

1 commit

  • When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
    LISTEN socket, and this socket is currently owned by the user, we
    set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.

    This is bad because if we clone the parent before it had a chance to
    clear the flag, the child inherits the tsq_flags value, and next
    tcp_release_cb() on the child will decrement sk_refcnt.

    Result is that we might free a live TCP socket, as reported by
    Dormando.

    IPv4: Attempt to release TCP socket in state 1

    Fix this issue by testing sk_state against TCP_LISTEN early, so that we
    set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one)
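
    A simplified user-space model of the early state test, assuming a
    hypothetical struct toy_sock and helper toy_mtu_reduced(). The
    state values 1 (established) and 10 (listen) follow the kernel's
    numbering; everything else is illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_ESTABLISHED = 1, TOY_LISTEN = 10 };

#define TOY_MTU_REDUCED_DEFERRED 0x1u

/* Hypothetical socket model, not the kernel's struct sock. */
struct toy_sock {
    enum toy_state state;
    int owned_by_user;
    unsigned int flags;
    int refcnt;
};

/* Returns 1 if the MTU event was deferred, 0 otherwise. */
static int toy_mtu_reduced(struct toy_sock *sk)
{
    assert(sk != NULL);

    if (sk->state == TOY_LISTEN)
        return 0;      /* tested first: a listener never gets the flag */
    if (!sk->owned_by_user)
        return 0;      /* would be handled immediately, nothing deferred */

    if (!(sk->flags & TOY_MTU_REDUCED_DEFERRED)) {
        sk->flags |= TOY_MTU_REDUCED_DEFERRED;
        sk->refcnt++;  /* dropped later by the release callback */
    }
    return 1;
}
```

    Because the listener's flags stay clear, a child cloned from it
    cannot inherit a deferred bit that has no matching reference.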

    This bug was introduced in commit 563d34d05786
    (tcp: dont drop MTU reduction indications)

    Reported-by: dormando
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Mar, 2013

1 commit

  • TCPCT uses option-number 253, which is reserved for experimental use and
    should not be used in production environments.
    Further, TCPCT does not fully implement RFC 6013.

    As a nice side-effect, removing TCPCT increases TCP's performance for
    very short flows:

    Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
    for files of 1KB size.

    before this patch:
    average (among 7 runs) of 20845.5 Requests/Second
    after:
    average (among 7 runs) of 21403.6 Requests/Second

    Signed-off-by: Christoph Paasch
    Signed-off-by: David S. Miller

    Christoph Paasch
     

14 Feb, 2013

1 commit

  • A socket timestamp is a sum of the global tcp_time_stamp and
    a per-socket offset.

    A socket offset is added in places where externally visible
    tcp timestamp option is parsed/initialized.

    Connections in the SYN_RECV state are not supported; the global
    tcp_time_stamp is used for them, because repair mode doesn't support
    this state. In the future this can be implemented in a similar way
    as for TIME_WAIT sockets.
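
    The arithmetic can be sketched as below. The struct and function
    names are hypothetical; the sketch only assumes 32-bit timestamps
    that wrap modulo 2^32, so an offset can move a restored socket's
    clock to any saved value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket state; only the offset matters here. */
struct toy_tcp_sock {
    uint32_t tsoffset;
};

/* Socket timestamp = global clock + per-socket offset (wraps mod 2^32). */
static uint32_t toy_sock_timestamp(const struct toy_tcp_sock *tp,
                                   uint32_t global_ts)
{
    assert(tp != NULL);
    return global_ts + tp->tsoffset;
}

/* Offset that makes the socket clock read saved_ts at time global_ts. */
static uint32_t toy_offset_for(uint32_t saved_ts, uint32_t global_ts)
{
    return saved_ts - global_ts;
}
```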

    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Cc: Eric Dumazet
    Cc: Pavel Emelyanov
    Signed-off-by: Andrey Vagin
    Signed-off-by: David S. Miller

    Andrey Vagin
     

06 Feb, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/intel/e1000e/ethtool.c
    drivers/net/vmxnet3/vmxnet3_drv.c
    drivers/net/wireless/iwlwifi/dvm/tx.c
    net/ipv6/route.c

    The ipv6 route.c conflict is simple, just ignore the 'net' side change
    as we fixed the same problem in 'net-next' by eliminating cached
    neighbours from ipv6 routes.

    The e1000e conflict is an addition of a new statistic in the ethtool
    code, trivial.

    The vmxnet3 conflict is about one change in 'net' removing a guarding
    conditional, whilst in 'net-next' we had a netdev_info() conversion.

    The iwlwifi conflict is dealing with a WARN_ON() conversion in
    'net-next' vs. a revert happening in 'net'.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Feb, 2013

1 commit

  • This patch updates LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS in
    tcp_v6_conn_request() and tcp_v6_err(). tcp_v6_conn_request() in particular can
    drop SYNs for various reasons which are not currently tracked.

    Signed-off-by: Vijay Subramanian
    Signed-off-by: David S. Miller

    Vijay Subramanian
     

24 Jan, 2013

1 commit

  • Motivation for soreuseport would be something like a web server
    binding to port 80 running with multiple threads, where each thread
    might have its own listener socket. This could be done as an
    alternative to other models: 1) have one listener thread which
    dispatches completed connections to workers. 2) accept on a single
    listener socket from multiple threads. In case #1 the listener thread
    can easily become the bottleneck with high connection turn-over rate.
    In case #2, the proportion of connections accepted per thread tends
    to be uneven under high connection load (assuming a simple event loop:
    while (1) { accept(); process() }); wakeup does not promote fairness
    among the sockets. We have seen the disproportion to be as high
    as a 3:1 ratio between the thread accepting the most connections and
    the one accepting the fewest. With so_reuseport the distribution is
    uniform.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

14 Jan, 2013

1 commit


07 Jan, 2013

1 commit

  • As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl
    namespace aware. The reason behind this patch is to ease the testing
    of ecn problems on the internet and to allow applications to tune their
    own use of ecn.

    Cc: Eric Dumazet
    Cc: David Miller
    Cc: Stephen Hemminger
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

15 Dec, 2012

1 commit

  • If either inet_csk_route_child_sock() or __inet_inherit_port() fails
    in the above functions, the newsk will not be freed:

    unreferenced object 0xffff88022e8a92c0 (size 1592):
    comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
    hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00 ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x21/0x3e
    [] kmem_cache_alloc+0xb5/0xc5
    [] sk_prot_alloc.isra.53+0x2b/0xcd
    [] sk_clone_lock+0x16/0x21e
    [] inet_csk_clone_lock+0x10/0x7b
    [] tcp_create_openreq_child+0x21/0x481
    [] tcp_v4_syn_recv_sock+0x3a/0x23b
    [] tcp_check_req+0x29f/0x416
    [] tcp_v4_do_rcv+0x161/0x2bc
    [] tcp_v4_rcv+0x6c9/0x701
    [] ip_local_deliver_finish+0x70/0xc4
    [] ip_local_deliver+0x4e/0x7f
    [] ip_rcv_finish+0x1fc/0x233
    [] ip_rcv+0x217/0x267
    [] __netif_receive_skb+0x49e/0x553
    [] netif_receive_skb+0x50/0x82

    This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
    a single sock_put() is not enough to free the memory. Additionally, things
    like xfrm, memcg, cookie_values,... may have been initialized.
    We have to free them properly.

    This is fixed by forcing a call to tcp_done(), ending up in
    inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
    because it ends up doing all the cleanup on xfrm, memcg, cookie_values,...

    Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
    force it entering inet_csk_destroy_sock. To avoid the warning in
    inet_csk_destroy_sock, inet_num has to be set to 0.
    As inet_csk_destroy_sock does a dec on orphan_count, we first have to
    increase it.

    Calling tcp_done() allows us to remove the calls to
    tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

    A similar approach is taken for dccp by calling dccp_done().

    This is in the kernel since 093d282321 (tproxy: fix hash locking issue
    when using port redirection in __inet_inherit_port()), thus since
    version >= 2.6.37.

    Signed-off-by: Christoph Paasch
    Signed-off-by: David S. Miller

    Christoph Paasch
     

23 Nov, 2012

1 commit

  • This works the same as for ipv4.

    All other hacks about tcp repair are in common code for ipv4 and ipv6,
    so this patch is enough for repairing ipv6 connections.

    Cc: "David S. Miller"
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Cc: Pavel Emelyanov
    Signed-off-by: Andrey Vagin
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Andrey Vagin
     

16 Nov, 2012

4 commits


04 Nov, 2012

1 commit

  • For passive TCP connections using TCP_DEFER_ACCEPT facility,
    we incorrectly increment req->retrans each time timeout triggers
    while no SYNACK is sent.

    SYNACKs are not sent for TCP_DEFER_ACCEPT requests that were established (for
    which we received the ACK from client). Only the last SYNACK is sent
    so that we can receive again an ACK from client, to move the req into
    accept queue. We plan to change this later to avoid the useless
    retransmit (and potential problem as this SYNACK could be lost)

    TCP_INFO later gives wrong information to user, claiming imaginary
    retransmits.

    Decouple req->retrans field into two independent fields :

    num_retrans : number of retransmit
    num_timeout : number of timeouts

    num_timeout is the counter that is incremented at each timeout,
    regardless of actual SYNACK being sent or not, and used to
    compute the exponential timeout.

    Introduce inet_rtx_syn_ack() helper to increment num_retrans
    only if ->rtx_syn_ack() succeeded.

    Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
    when we re-send a SYNACK in answer to a (retransmitted) SYN.
    Prior to this patch, we were not counting these retransmits.

    Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
    only if a synack packet was successfully queued.
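
    A user-space sketch of the two counters, under the assumption that
    a transmit callback returns 0 on success. All toy_* names are
    hypothetical; only the split between num_retrans and num_timeout
    follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request model holding the two decoupled counters. */
struct toy_req {
    unsigned int num_retrans;  /* SYNACKs actually retransmitted */
    unsigned int num_timeout;  /* timer expirations              */
};

typedef int (*toy_rtx_fn)(struct toy_req *req);  /* 0 means success */

/* Example transmit stubs used only for illustration. */
static int toy_send_ok(struct toy_req *req)   { (void)req; return 0; }
static int toy_send_fail(struct toy_req *req) { (void)req; return -1; }

/* Count a retransmit only when the transmit succeeded. */
static int toy_rtx_syn_ack(struct toy_req *req, toy_rtx_fn rtx)
{
    int err = rtx(req);

    if (!err)
        req->num_retrans++;
    return err;
}

/* Timer handler: every expiry counts, whether or not a SYNACK is sent. */
static void toy_timer_fired(struct toy_req *req, toy_rtx_fn rtx, int send)
{
    assert(req != NULL);
    if (send)
        toy_rtx_syn_ack(req, rtx);
    req->num_timeout++;
}

/* Exponential backoff driven by num_timeout, capped at max. */
static unsigned int toy_next_timeout(const struct toy_req *req,
                                     unsigned int base, unsigned int max)
{
    unsigned int t = base;
    unsigned int i;

    for (i = 0; i < req->num_timeout; i++) {
        if (t > max / 2)
            return max;
        t *= 2;
    }
    return t < max ? t : max;
}
```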

    Reported-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Cc: Julian Anastasov
    Cc: Vijay Subramanian
    Cc: Elliott Hughes
    Cc: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Oct, 2012

1 commit

  • Remove an icsk variable, which by convention should refer to an
    inet_connection_sock rather than an inet_sock. In the process, make
    the tcp_v6_early_demux() code and formatting a bit more like
    tcp_v4_early_demux(), to ease comparisons and maintenance.

    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
     

13 Oct, 2012

1 commit

  • After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
    sock case"), tcp resets are always lost when routing is asymmetric.
    Yes, backing out that patch will result in misrouting of resets for
    dead connections which used interface binding when they were alive, but
    we actually cannot do anything here. What's died that's died, and correct
    handling of normal unbound connections is obviously a priority.

    Comment to comment:
    > This has few benefits:
    > 1. tcp_v6_send_reset already did that.

    It was done to route resets for IPv6 link local addresses. It was a
    mistake to do so for global addresses. The patch fixes this as well.

    Actually, the problem appears to be even more serious than guaranteed
    loss of resets. As reported by Sergey Soloviev, those
    misrouted resets create a lot of arp traffic and a huge amount of
    unresolved arp entries, bringing NAT firewalls which use
    asymmetric routing to their knees.

    Signed-off-by: Alexey Kuznetsov

    Alexey Kuznetsov
     

02 Oct, 2012

1 commit

  • skb with CHECKSUM_NONE can't currently be handled by GRO, and
    we notice this deep in GRO stack in tcp[46]_gro_receive()

    But there are cases where GRO can be a benefit, even with a lack
    of checksums.

    This preliminary work is needed to add GRO support
    to tunnels.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Sep, 2012

2 commits

  • When taking SYNACK RTT samples for servers using TCP Fast Open, fix
    the code to ensure that we only call tcp_valid_rtt_meas() after we
    receive the ACK that completes the 3-way handshake.

    Previously we were always taking an RTT sample in
    tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections
    tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we
    receive the SYN. So for TFO we must wait until tcp_rcv_state_process()
    to take the RTT sample.

    To fix this, we wait until after TFO calls tcp_v4_syn_recv_sock()
    before we set the snt_synack timestamp, since tcp_synack_rtt_meas()
    already ensures that we only take a SYNACK RTT sample if snt_synack is
    non-zero. To be careful, we only take a snt_synack timestamp when
    a SYNACK transmit or retransmit succeeds.
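
    The guard can be modelled in user space as below. The struct and
    the two helpers are hypothetical; the sketch only assumes that a
    zero snt_synack means "no SYNACK was successfully sent yet".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request model; 0 means no timestamp was taken. */
struct toy_req {
    uint32_t snt_synack;
};

/* Record the send time only when the (re)transmit succeeded (err == 0). */
static void toy_synack_sent(struct toy_req *req, int err, uint32_t now)
{
    assert(req != NULL);
    if (!err)
        req->snt_synack = now;
}

/* Returns 1 and stores the RTT if a sample may be taken, else 0. */
static int toy_synack_rtt(const struct toy_req *req, uint32_t now,
                          uint32_t *rtt)
{
    assert(req != NULL && rtt != NULL);
    if (req->snt_synack == 0)
        return 0;  /* nothing was sent, so there is nothing to measure */
    *rtt = now - req->snt_synack;
    return 1;
}
```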

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • In preparation for adding another spot where we compute the SYNACK
    RTT, extract this code so that it can be shared.

    Signed-off-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Neal Cardwell
     

15 Sep, 2012

1 commit

  • Conflicts:
    net/netfilter/nfnetlink_log.c
    net/netfilter/xt_LOG.c

    Rather easy conflict resolution, the 'net' tree had bug fixes to make
    sure we checked if a socket is a time-wait one or not and elide the
    logging code if so.

    Whereas on the 'net-next' side we are calculating the UID and GID from
    the creds using different interfaces due to the user namespace changes
    from Eric Biederman.

    Signed-off-by: David S. Miller

    David S. Miller
     

06 Sep, 2012

1 commit

  • commit 144d56e91044181ec0ef67aeca91e9a8b5718348
    ("tcp: fix possible socket refcount problem") is missing
    the IPv6 part. As tcp_release_cb is shared by both protocols
    we should hold sock reference for the TCP_MTU_REDUCED_DEFERRED
    bit.

    Signed-off-by: Julian Anastasov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Julian Anastasov
     

01 Sep, 2012

1 commit

  • This patch builds on top of the previous patch to add the support
    for TFO listeners. This includes -

    1. allocating, properly initializing, and managing the per listener
    fastopen_queue structure when TFO is enabled

    2. changes to the inet_csk_accept code to support TFO. E.g., the
    request_sock can no longer be freed upon accept(), not until 3WHS
    finishes

    3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
    if it's a TFO socket

    4. properly closing a TFO listener, and a TFO socket before 3WHS
    finishes

    5. supporting TCP_FASTOPEN socket option

    6. modifying tcp_check_req() so it can be used to check a TFO socket as
    well as a request_sock

    7. supporting TCP's TFO cookie option

    8. adding a new SYN-ACK retransmit handler to use the timer directly
    off the TFO socket rather than the listener socket. Note that TFO
    server side will not retransmit anything other than SYN-ACK until
    the 3WHS is completed.

    The patch also contains an important function
    "reqsk_fastopen_remove()" to manage the somewhat complex relation
    between a listener, its request_sock, and the corresponding child
    socket. See the comment above the function for the detail.

    Signed-off-by: H.K. Jerry Chu
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: David S. Miller

    Jerry Chu
     

25 Aug, 2012

1 commit


23 Aug, 2012

1 commit


20 Aug, 2012

1 commit

  • This commit removes the sk_rx_dst_set calls from
    tcp_create_openreq_child(), because at that point the icsk_af_ops
    field of ipv6_mapped TCP sockets has not been set to its proper final
    value.

    Instead, to make sure we get the right sk_rx_dst_set variant
    appropriate for the address family of the new connection, we have
    tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
    shortly after the call to tcp_create_openreq_child() returns.

    This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
    with the new approach.

    Signed-off-by: Neal Cardwell
    Reported-by: Artem Savkov
    Cc: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
     

15 Aug, 2012

1 commit


10 Aug, 2012

2 commits

  • commit 5d299f3d3c8a2fb (net: ipv6: fix TCP early demux) added a
    regression for ipv6_mapped case.

    [ 67.422369] SELinux: initialized (dev autofs, type autofs), uses
    genfs_contexts
    [ 67.449678] SELinux: initialized (dev autofs, type autofs), uses
    genfs_contexts
    [ 92.631060] BUG: unable to handle kernel NULL pointer dereference at
    (null)
    [ 92.631435] IP: [< (null)>] (null)
    [ 92.631645] PGD 0
    [ 92.631846] Oops: 0010 [#1] SMP
    [ 92.632095] Modules linked in: autofs4 sunrpc ipv6 dm_mirror
    dm_region_hash dm_log dm_multipath dm_mod video sbs sbshc battery ac lp
    parport sg snd_hda_intel snd_hda_codec snd_seq_oss snd_seq_midi_event
    snd_seq snd_seq_device pcspkr snd_pcm_oss snd_mixer_oss snd_pcm
    snd_timer serio_raw button floppy snd i2c_i801 i2c_core soundcore
    snd_page_alloc shpchp ide_cd_mod cdrom microcode ehci_hcd ohci_hcd
    uhci_hcd
    [ 92.634294] CPU 0
    [ 92.634294] Pid: 4469, comm: sendmail Not tainted 3.6.0-rc1 #3
    [ 92.634294] RIP: 0010:[] [< (null)>]
    (null)
    [ 92.634294] RSP: 0018:ffff880245fc7cb0 EFLAGS: 00010282
    [ 92.634294] RAX: ffffffffa01985f0 RBX: ffff88024827ad00 RCX:
    0000000000000000
    [ 92.634294] RDX: 0000000000000218 RSI: ffff880254735380 RDI:
    ffff88024827ad00
    [ 92.634294] RBP: ffff880245fc7cc8 R08: 0000000000000001 R09:
    0000000000000000
    [ 92.634294] R10: 0000000000000000 R11: ffff880245fc7bf8 R12:
    ffff880254735380
    [ 92.634294] R13: ffff880254735380 R14: 0000000000000000 R15:
    7fffffffffff0218
    [ 92.634294] FS: 00007f4516ccd6f0(0000) GS:ffff880256600000(0000)
    knlGS:0000000000000000
    [ 92.634294] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 92.634294] CR2: 0000000000000000 CR3: 0000000245ed1000 CR4:
    00000000000007f0
    [ 92.634294] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
    0000000000000000
    [ 92.634294] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
    0000000000000400
    [ 92.634294] Process sendmail (pid: 4469, threadinfo ffff880245fc6000,
    task ffff880254b8cac0)
    [ 92.634294] Stack:
    [ 92.634294] ffffffff813837a7 ffff88024827ad00 ffff880254b6b0e8
    ffff880245fc7d68
    [ 92.634294] ffffffff81385083 00000000001d2680 ffff8802547353a8
    ffff880245fc7d18
    [ 92.634294] ffffffff8105903a ffff88024827ad60 0000000000000002
    00000000000000ff
    [ 92.634294] Call Trace:
    [ 92.634294] [] ? tcp_finish_connect+0x2c/0xfa
    [ 92.634294] [] tcp_rcv_state_process+0x2b6/0x9c6
    [ 92.634294] [] ? sched_clock_cpu+0xc3/0xd1
    [ 92.634294] [] ? local_clock+0x2b/0x3c
    [ 92.634294] [] tcp_v4_do_rcv+0x63a/0x670
    [ 92.634294] [] release_sock+0x128/0x1bd
    [ 92.634294] [] __inet_stream_connect+0x1b1/0x352
    [ 92.634294] [] ? lock_sock_nested+0x74/0x7f
    [ 92.634294] [] ? wake_up_bit+0x25/0x25
    [ 92.634294] [] ? lock_sock_nested+0x74/0x7f
    [ 92.634294] [] ? inet_stream_connect+0x22/0x4b
    [ 92.634294] [] inet_stream_connect+0x33/0x4b
    [ 92.634294] [] sys_connect+0x78/0x9e
    [ 92.634294] [] ? sysret_check+0x1b/0x56
    [ 92.634294] [] ? __audit_syscall_entry+0x195/0x1c8
    [ 92.634294] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 92.634294] [] system_call_fastpath+0x16/0x1b
    [ 92.634294] Code: Bad RIP value.
    [ 92.634294] RIP [< (null)>] (null)
    [ 92.634294] RSP
    [ 92.634294] CR2: 0000000000000000
    [ 92.648982] ---[ end trace 24e2bed94314c8d9 ]---
    [ 92.649146] Kernel panic - not syncing: Fatal exception in interrupt

    Fix this using inet_sk_rx_dst_set(), and export this function in case
    IPv6 is modular.

    Reported-by: Andrew Morton
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Various /proc/net files sometimes report crazy timer values, expressed
    in clock_t units.

    This happens when an expired timer delta (expires - jiffies) is passed
    to jiffies_to_clock_t().

    This function has an overflow in :

    return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ);

    commit cbbc719fccdb8cb (time: Change jiffies_to_clock_t() argument type
    to unsigned long) only got around the problem.

    As we can't output negative values in /proc/net/tcp without breaking
    various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper
    that caps the negative delta to a 0 value.
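
    A user-space sketch of such a wrapper, assuming illustrative tick
    rates (TOY_HZ and TOY_USER_HZ are made-up constants) and a
    simplified conversion in place of the kernel's jiffies_to_clock_t():

```c
#include <assert.h>

#define TOY_HZ      1000L  /* assumed kernel tick rate, for illustration */
#define TOY_USER_HZ 100L   /* assumed clock_t rate seen by user space    */

/* Simplified conversion; takes an unsigned tick count. */
static long toy_jiffies_to_clock_t(unsigned long x)
{
    return (long)(x / (unsigned long)(TOY_HZ / TOY_USER_HZ));
}

/* Wrapper for deltas: an already-expired timer reports 0, not garbage. */
static long toy_jiffies_delta_to_clock_t(long delta)
{
    assert(TOY_HZ >= TOY_USER_HZ);
    return toy_jiffies_to_clock_t(delta > 0 ? (unsigned long)delta : 0UL);
}
```

    Passing a negative delta straight to the unsigned conversion turns
    it into a very large tick count, which is the kind of "crazy" value
    the wrapper avoids.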

    Signed-off-by: Eric Dumazet
    Reported-by: Maciej Żenczykowski
    Cc: Thomas Gleixner
    Cc: Paul Gortmaker
    Cc: Andrew Morton
    Cc: hank
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Aug, 2012

1 commit

  • IPv6 needs a cookie in dst_check() call.

    We need to add rx_dst_cookie and provide a family independent
    sk_rx_dst_set(sk, skb) method to properly support IPv6 TCP early demux.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Aug, 2012

2 commits

  • Introduce sk_gfp_atomic(). This function allows injecting sock-specific
    flags into each sock-related allocation. It is only used on allocation
    paths that may be required for writing pages back to network storage.
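
    The shape of such a helper can be sketched as follows. The flag
    values and names are hypothetical, and the choice of a single
    per-socket "memalloc" bit is an assumption made for illustration:
    the atomic base flags are always used, and only the selected
    per-socket bit is carried over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; these are not the kernel's GFP values. */
#define TOY_GFP_ATOMIC   0x01u
#define TOY_GFP_MEMALLOC 0x02u
#define TOY_GFP_OTHER    0x04u

/* Hypothetical socket model holding its allocation flags. */
struct toy_sock {
    unsigned int allocation;
};

/* Atomic allocation flags plus the socket's memalloc bit, if set. */
static unsigned int toy_sk_gfp_atomic(const struct toy_sock *sk)
{
    assert(sk != NULL);
    return TOY_GFP_ATOMIC | (sk->allocation & TOY_GFP_MEMALLOC);
}
```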

    [davem@davemloft.net: Use sk_gfp_atomic only when necessary]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: David S. Miller
    Cc: Neil Brown
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Sanity:

    CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
    CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

    [mhocko@suse.cz: fix missed bits]
    Cc: Glauber Costa
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Tejun Heo
    Cc: Aneesh Kumar K.V
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

27 Jul, 2012

1 commit


23 Jul, 2012

1 commit

  • ICMP messages generated in the output path when the frame length is bigger
    than the mtu are actually lost, because the socket is owned by the user
    (doing the xmit).

    One example is the ipgre_tunnel_xmit() calling
    icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

    We had a similar case fixed in commit a34a101e1e6 (ipv6: disable GSO on
    sockets hitting dst_allfrag).

    The problem with such a fix is that it relied on retransmit timers, so
    short tcp sessions paid too high a price in added latency.

    This patch uses the tcp_release_cb() infrastructure so that MTU
    reduction messages (ICMP messages) are not lost, and no extra delay
    is added in TCP transmits.
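
    The deferral pattern can be modelled in user space as below. The
    struct and helper names are hypothetical and the model is much
    simpler than the kernel's: an MTU reduction that arrives while the
    socket is owned is remembered and applied when the owner releases
    the socket.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MTU_REDUCED_DEFERRED 0x1u

/* Hypothetical socket model, for illustration only. */
struct toy_sock {
    int owned_by_user;
    unsigned int deferred;     /* pending work flags               */
    unsigned int mtu;          /* MTU currently in use             */
    unsigned int pending_mtu;  /* MTU reported while we were owned */
};

/* Only ever lower the MTU. */
static void toy_apply_mtu(struct toy_sock *sk, unsigned int mtu)
{
    if (mtu < sk->mtu)
        sk->mtu = mtu;
}

/* ICMP "fragmentation needed" handler: defer if the socket is busy. */
static void toy_frag_needed(struct toy_sock *sk, unsigned int mtu)
{
    assert(sk != NULL);
    if (sk->owned_by_user) {
        sk->pending_mtu = mtu;
        sk->deferred |= TOY_MTU_REDUCED_DEFERRED;
        return;
    }
    toy_apply_mtu(sk, mtu);
}

/* Release path: run whatever was deferred while the socket was owned. */
static void toy_release_sock(struct toy_sock *sk)
{
    assert(sk != NULL);
    sk->owned_by_user = 0;
    if (sk->deferred & TOY_MTU_REDUCED_DEFERRED) {
        sk->deferred &= ~TOY_MTU_REDUCED_DEFERRED;
        toy_apply_mtu(sk, sk->pending_mtu);
    }
}
```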

    Reported-by: Maciej Żenczykowski
    Diagnosed-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Cc: Nandita Dukkipati
    Cc: Tom Herbert
    Cc: Tore Anderson
    Signed-off-by: David S. Miller

    Eric Dumazet