16 Sep, 2011

1 commit

  • "Possible SYN flooding on port xxxx" messages can fill logs on servers.

    Change the logic to log the message only once per listener, and add two
    new SNMP counters:

    TCPReqQFullDoCookies: number of times a SYNCOOKIE was sent in reply to a
    client

    TCPReqQFullDrop: number of times a SYN request was dropped because
    syncookies were not enabled.

    Based on a prior patch from Tom Herbert, and suggestions from David.

    Signed-off-by: Eric Dumazet
    CC: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
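
As a rough user-space illustration of the "log once per listener, count every event" idea described in this commit, the sketch below uses hypothetical names (`struct listener`, `struct syn_stats`, `syn_queue_full`) and an illustrative message format; it is not the kernel implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-listener state: one flag remembers that we already warned. */
struct listener {
    unsigned short port;
    bool synflood_warned;
};

/* Hypothetical stand-ins for the two SNMP counters named in the commit. */
struct syn_stats {
    unsigned long req_q_full_do_cookies; /* models TCPReqQFullDoCookies */
    unsigned long req_q_full_drop;       /* models TCPReqQFullDrop */
};

/*
 * Called when the request queue is full. Every event is counted, but the
 * warning is printed only the first time for a given listener.
 * Returns true if a syncookie should be sent, false if the SYN is dropped.
 */
static bool syn_queue_full(struct listener *l, struct syn_stats *st,
                           bool syncookies_enabled)
{
    if (syncookies_enabled)
        st->req_q_full_do_cookies++;
    else
        st->req_q_full_drop++;

    if (!l->synflood_warned) {
        l->synflood_warned = true;
        fprintf(stderr, "Possible SYN flooding on port %u. %s.\n",
                (unsigned int)l->port,
                syncookies_enabled ? "Sending cookies" : "Dropping request");
    }
    return syncookies_enabled;
}
```

The counters keep the information that the suppressed log lines used to carry, so an operator can still see how often the queue overflowed.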
     

18 Jan, 2010

1 commit

  • Currently we don't increment SYN-ACK timeouts & retransmissions
    although we do increment the same stats for SYN. We seem to have lost
    the SYN-ACK accounting with the introduction of tcp_syn_recv_timer
    (commit 2248761e in the netdev-vger-cvs tree).

    This patch fixes this issue. In the process we also rename the v4/v6
    syn/ack retransmit functions for clarity. We also add a new
    request_sock operation (syn_ack_timeout) so we can keep the code in
    inet_connection_sock.c protocol agnostic.

    Signed-off-by: Octavian Purdila
    Signed-off-by: David S. Miller

    Octavian Purdila
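
The protocol-agnostic hook described in this commit can be pictured as an optional function pointer in an operations table. The sketch below is a simplified user-space model: `struct request_ops`, `struct request`, `generic_syn_ack_timer` and the `demo_*` functions are hypothetical names, and only `syn_ack_timeout` comes from the commit text.

```c
#include <stddef.h>

struct request; /* forward declaration */

/* Hypothetical, trimmed-down operations table. */
struct request_ops {
    int  (*rtx_syn_ack)(struct request *req);     /* retransmit the SYN-ACK */
    void (*syn_ack_timeout)(struct request *req); /* optional accounting hook */
};

struct request {
    const struct request_ops *ops;
    unsigned int retrans;
};

/* Protocol-specific statistics, updated only by protocol-specific code. */
struct proto_stats {
    unsigned long timeouts;
    unsigned long retransmits;
};
static struct proto_stats demo_stats;

static int demo_rtx_syn_ack(struct request *req)
{
    (void)req;
    demo_stats.retransmits++;
    return 0;
}

static void demo_syn_ack_timeout(struct request *req)
{
    (void)req;
    demo_stats.timeouts++;
}

static const struct request_ops demo_ops = {
    .rtx_syn_ack     = demo_rtx_syn_ack,
    .syn_ack_timeout = demo_syn_ack_timeout,
};

/* Generic timer code: it knows nothing about which protocol it serves. */
static void generic_syn_ack_timer(struct request *req)
{
    if (req->ops->syn_ack_timeout) /* the hook is optional */
        req->ops->syn_ack_timeout(req);
    req->ops->rtx_syn_ack(req);
    req->retrans++;
}
```

Because the generic code only calls through the table, each protocol can update its own counters without the shared timer code referring to them.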
     

03 Dec, 2009

1 commit

  • Add optional function parameters associated with sending SYNACK.
    These parameters are not needed after sending SYNACK, and are not
    used for retransmission. Avoids extending struct tcp_request_sock,
    and avoids allocating kernel memory.

    Also affects DCCP as it uses common struct request_sock_ops,
    but this parameter is currently reserved for future use.

    Signed-off-by: William.Allen.Simpson@gmail.com
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    William Allen Simpson
     

22 Nov, 2008

1 commit

  • If the slub allocator is used, kmem_cache_create() may merge two or more
    kmem_cache's into one but the cache name pointer is not updated and
    kmem_cache_name() is no longer guaranteed to return the pointer passed
    to the former function. This patch stores the kmalloc'ed pointers in the
    corresponding request_sock_ops and timewait_sock_ops structures.

    Signed-off-by: Catalin Marinas
    Acked-by: Arnaldo Carvalho de Melo
    Reviewed-by: Christoph Lameter
    Signed-off-by: David S. Miller

    Catalin Marinas
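
A minimal user-space sketch of the ownership problem this commit describes: if the allocator may hand back a merged cache, the name stored in the cache can no longer be assumed to be the pointer the caller allocated, so the caller keeps its own copy. All names here (`struct cache`, `cache_create`, `struct proto_ops`, `ops_register`, `ops_unregister`) are hypothetical stand-ins.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache. */
struct cache {
    const char *name;
};

/* Toy allocator that always "merges": it returns a shared cache whose
 * name is not the pointer the caller passed in. */
static struct cache shared_cache = { "merged-cache" };

static struct cache *cache_create(const char *name)
{
    (void)name;
    return &shared_cache;
}

/* Hypothetical operations structure that owns its allocated name. */
struct proto_ops {
    struct cache *slab;
    char *slab_name; /* the pointer we allocated and must free ourselves */
};

static int ops_register(struct proto_ops *ops, const char *proto)
{
    int len = snprintf(NULL, 0, "request_sock_%s", proto);

    if (len < 0)
        return -1;
    ops->slab_name = malloc((size_t)len + 1);
    if (!ops->slab_name)
        return -1;
    snprintf(ops->slab_name, (size_t)len + 1, "request_sock_%s", proto);
    ops->slab = cache_create(ops->slab_name);
    return 0;
}

static void ops_unregister(struct proto_ops *ops)
{
    /* Free the stored pointer, never ops->slab->name. */
    free(ops->slab_name);
    ops->slab_name = NULL;
    ops->slab = NULL;
}
```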
     

07 Aug, 2008

1 commit

  • If the following packet flow happens, the kernel will panic.

    MachineA                        MachineB
                    SYN
            ---------------------->
                    SYN+ACK
            <----------------------
                    ACK (bad seq)
            ---------------------->

    When an ACK with a bad sequence number is received,
    tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr) is eventually called by
    tcp_v4_reqsk_send_ack(), but the first parameter (skb->sk) is NULL at that
    moment, so the kernel panics.
    This patch fixes this bug.

    The oops output is as follows:
    [ 302.812793] IP: [] tcp_v4_md5_do_lookup+0x12/0x42
    [ 302.817075] Oops: 0000 [#1] SMP
    [ 302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
    [ 302.849946]
    [ 302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
    [ 302.855184] EIP: 0060:[] EFLAGS: 00010296 CPU: 0
    [ 302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
    [ 302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
    [ 302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
    [ 302.868333] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    [ 302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
    [ 302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400
    [ 302.883275] c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0
    [ 302.890971] ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634
    [ 302.900140] Call Trace:
    [ 302.902392] [] tcp_v4_reqsk_send_ack+0x17/0x35
    [ 302.907060] [] tcp_check_req+0x156/0x372
    [ 302.910082] [] printk+0x14/0x18
    [ 302.912868] [] tcp_v4_do_rcv+0x1d3/0x2bf
    [ 302.917423] [] tcp_v4_rcv+0x563/0x5b9
    [ 302.920453] [] ip_local_deliver_finish+0xe8/0x183
    [ 302.923865] [] ip_rcv_finish+0x286/0x2a3
    [ 302.928569] [] dev_alloc_skb+0x11/0x25
    [ 302.931563] [] netif_receive_skb+0x2d6/0x33a
    [ 302.934914] [] pcnet32_poll+0x333/0x680 [pcnet32]
    [ 302.938735] [] net_rx_action+0x5c/0xfe
    [ 302.941792] [] __do_softirq+0x5d/0xc1
    [ 302.944788] [] __do_softirq+0x0/0xc1
    [ 302.948999] [] do_softirq+0x55/0x88
    [ 302.951870] [] handle_fasteoi_irq+0x0/0xa4
    [ 302.954986] [] irq_exit+0x35/0x69
    [ 302.959081] [] do_IRQ+0x99/0xae
    [ 302.961896] [] common_interrupt+0x23/0x28
    [ 302.966279] [] default_idle+0x2a/0x3d
    [ 302.969212] [] cpu_idle+0xb2/0xd2
    [ 302.972169] =======================
    [ 302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6
    [ 303.011610] EIP: [] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
    [ 303.018360] Kernel panic - not syncing: Fatal exception in interrupt

    Signed-off-by: Gui Jianfeng
    Signed-off-by: David S. Miller

    Gui Jianfeng
     

26 Jul, 2008

1 commit

  • Removes a legacy reinvent-the-wheel construct. The generic
    machinery integrates much better with automated debugging aids
    such as kerneloops.org (and others), and is unambiguous due to
    better naming. Non-intuitively, BUG_TRAP() is actually equal to
    WARN_ON() rather than BUG_ON(), though some uses might actually be
    promoted to BUG_ON(); I left that for the future.

    I could make at least one BUILD_BUG_ON conversion.

    Signed-off-by: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Ilpo Järvinen
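
The naming point in this commit can be shown with user-space imitations of the macros. The `MY_*` macros below are hypothetical and only model the semantics: an old-style `BUG_TRAP(x)` asserts that `x` holds but merely warns on failure, so it maps to a warning on `!(x)`, not to a fatal check.

```c
#include <stdio.h>
#include <stdlib.h>

static int warn_count; /* number of warnings raised so far */

/* WARN_ON-style: report the condition and keep running.
 * Evaluates to 1 if the condition was true, 0 otherwise. */
#define MY_WARN_ON(cond)                                          \
    ((cond) ? (warn_count++,                                      \
               fprintf(stderr, "warning: %s at %s:%d\n",          \
                       #cond, __FILE__, __LINE__),                \
               1)                                                 \
            : 0)

/* BUG_ON-style: report the condition and stop the program. */
#define MY_BUG_ON(cond)                                           \
    do {                                                          \
        if (cond) {                                               \
            fprintf(stderr, "BUG: %s at %s:%d\n",                 \
                    #cond, __FILE__, __LINE__);                   \
            abort();                                              \
        }                                                         \
    } while (0)

/* Old-style trap: despite the name, it only warns when x is false. */
#define MY_BUG_TRAP(x) ((void)MY_WARN_ON(!(x)))
```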
     

13 Jun, 2008

1 commit

  • This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
    ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
    the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
    ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").

    This change causes several problems, first reported by Ingo Molnar
    as a distcc-over-loopback regression where connections were getting
    stuck.

    Ilpo Järvinen first spotted the locking problems. The new function
    added by this code, tcp_defer_accept_check(), only has the
    child socket locked, yet it is modifying state of the parent
    listening socket.

    Fixing that is non-trivial at best, because we can't simply just grab
    the parent listening socket lock at this point, because it would
    create an ABBA deadlock. The normal ordering is parent listening
    socket --> child socket, but this code path would require the
    reverse lock ordering.

    Next is a problem noticed by Vitaliy Gusev, who noted:

    ----------------------------------------
    >--- a/net/ipv4/tcp_timer.c
    >+++ b/net/ipv4/tcp_timer.c
    >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
    > goto death;
    > }
    >
    >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
    >+ tcp_send_active_reset(sk, GFP_ATOMIC);
    >+ goto death;

    Here socket sk is not attached to listening socket's request queue. tcp_done()
    will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
    release this sk) as socket is not DEAD. Therefore socket sk will be lost for
    freeing.
    ----------------------------------------

    Finally, Alexey Kuznetsov argues that there might not even be any
    real value or advantage to these new semantics even if we fix all
    of the bugs:

    ----------------------------------------
    Hiding from accept() sockets with only out-of-order data only
    is the only thing which is impossible with old approach. Is this really
    so valuable? My opinion: no, this is nothing but a new loophole
    to consume memory without control.
    ----------------------------------------

    So revert this thing for now.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Apr, 2008

1 commit

  • Allow the use of SACK and window scaling when syncookies are used
    and the client supports tcp timestamps. Options are encoded into
    the timestamp sent in the syn-ack and restored from the timestamp
    echo when the ack is received.

    Based on earlier work by Glenn Griffin.
    This patch avoids increasing the size of structs by encoding TCP
    options into the least significant bits of the timestamp and
    by not using any 'timestamp offset'.

    The downside is that the timestamp sent in the packet after the synack
    will increase by several seconds.

    changes since v1:
    don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
    and have ipv6/syncookies.c use it.
    Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()

    Reviewed-by: Hagen Paul Pfeifer
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
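
The encoding idea in this commit (options carried in the least significant bits of the timestamp, recovered from the echo) can be sketched as follows. The bit layout, the constants and the names `ts_encode`, `ts_decode` and `struct tcp_opts` are assumptions chosen for illustration and are not taken from the patch; timestamp wraparound is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 4 bits of window scale plus 1 SACK bit. */
#define TS_OPT_BITS    5u
#define TS_OPT_MASK    ((1u << TS_OPT_BITS) - 1u)
#define TS_WSCALE_MASK 0x0fu /* all ones means "no window scaling" */
#define TS_SACK_BIT    0x10u

struct tcp_opts {
    bool    wscale_ok;
    uint8_t snd_wscale; /* valid values are 0..14 */
    bool    sack_ok;
};

/* Replace the low bits of the current timestamp with the option bits. */
static uint32_t ts_encode(uint32_t now, const struct tcp_opts *o)
{
    uint32_t bits = o->wscale_ok ? (o->snd_wscale & TS_WSCALE_MASK)
                                 : TS_WSCALE_MASK;
    uint32_t ts;

    if (o->sack_ok)
        bits |= TS_SACK_BIT;

    ts = (now & ~TS_OPT_MASK) | bits;
    /* Never send a timestamp from the future: step back one block,
     * which leaves the option bits untouched. */
    if (ts > now)
        ts -= 1u << TS_OPT_BITS;
    return ts;
}

/* Recover the options from the echoed timestamp. */
static void ts_decode(uint32_t tsecr, struct tcp_opts *o)
{
    uint32_t bits = tsecr & TS_OPT_MASK;

    o->sack_ok    = (bits & TS_SACK_BIT) != 0;
    o->wscale_ok  = (bits & TS_WSCALE_MASK) != TS_WSCALE_MASK;
    o->snd_wscale = o->wscale_ok ? (uint8_t)(bits & TS_WSCALE_MASK) : 0;
}
```

Since the encoded value can lag behind the real clock, the next packet carrying an unmodified timestamp appears to jump forward, which matches the downside noted in the commit message.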
     

22 Mar, 2008

1 commit

  • Change TCP_DEFER_ACCEPT implementation so that it transitions a
    connection to ESTABLISHED after the handshake is complete instead of
    leaving it in SYN-RECV until some data arrives. Place the connection in
    the accept queue when the first data packet arrives from the slow path.

    Benefits:
    - an established connection is now reset if it never makes it
      to the accept queue

    - the diagnostic state of ESTABLISHED matches the packet traces
      showing a completed handshake

    - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
      enforced with reasonable accuracy instead of being rounded up to the
      next exponential back-off step of the SYN-ACK retry.

    Signed-off-by: Patrick McManus
    Signed-off-by: David S. Miller

    Patrick McManus
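
To make the rounding in the last benefit concrete, here is a small model that assumes retry intervals double each time, starting from an initial timeout supplied by the caller. The function name `backoff_rounded_timeout` and the model itself are illustrative assumptions, not code from the patch.

```c
/*
 * Model: retry timers fire after init, then 2*init, then 4*init, ...
 * A deferral that can only end when a timer fires is effectively
 * rounded up to the first cumulative firing time >= requested.
 */
static unsigned int backoff_rounded_timeout(unsigned int requested,
                                            unsigned int init)
{
    unsigned int elapsed = 0;
    unsigned int interval = init;

    if (init == 0)
        return requested; /* no back-off to model */

    while (elapsed < requested) {
        elapsed += interval;
        interval *= 2;
    }
    return elapsed;
}
```

With a 3 second initial interval, timers in this model fire at 3, 9 and 21 seconds, so a requested 10 second deferral would last 21 seconds.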
     

01 Mar, 2008

1 commit


15 Nov, 2007

1 commit

  • The request_sock_queue's listen_opt is either vmalloc-ed or
    kmalloc-ed depending on the number of table entries. Thus it
    is expected to be handled properly on free, which is done in
    the reqsk_queue_destroy().

    However, the error path in inet_csk_listen_start() calls
    the lightweight variant of reqsk_queue_destroy(),
    __reqsk_queue_destroy(), which calls kfree() unconditionally.

    Fix this and move __reqsk_queue_destroy() into a .c file, as
    it looks too big to be inline.

    As David also noticed, this is an error recovery path only,
    so no locking is required and the lopt is known to be not NULL.

    reqsk_queue_yank_listen_sk is also now only used in
    net/core/request_sock.c so we should move it there too.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pavel Emelyanov
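
The bug class fixed here is a free routine that does not match the allocator that was used. The user-space sketch below models two allocators with counters so a mismatch would be visible; `small_alloc`, `large_alloc`, `struct listen_table`, `table_alloc`, `table_free` and the 4096-byte threshold are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ 4096u /* assumed page size */

/* Outstanding allocations per (simulated) allocator. */
static int small_live, large_live;

static void *small_alloc(size_t n)
{
    void *p = calloc(1, n);
    if (p)
        small_live++;
    return p;
}

static void small_free(void *p)
{
    small_live--;
    free(p);
}

static void *large_alloc(size_t n)
{
    void *p = calloc(1, n);
    if (p)
        large_live++;
    return p;
}

static void large_free(void *p)
{
    large_live--;
    free(p);
}

struct listen_table {
    unsigned int nr_entries;
    void *slots[]; /* one pointer per hash bucket */
};

static size_t table_size(unsigned int nr)
{
    return sizeof(struct listen_table) + (size_t)nr * sizeof(void *);
}

/* Pick the allocator from the table size. */
static struct listen_table *table_alloc(unsigned int nr)
{
    size_t sz = table_size(nr);
    struct listen_table *t = sz > PAGE_SZ ? large_alloc(sz) : small_alloc(sz);

    if (t)
        t->nr_entries = nr;
    return t;
}

/* Free with the same rule used at allocation time, on every path. */
static void table_free(struct listen_table *t)
{
    if (table_size(t->nr_entries) > PAGE_SZ)
        large_free(t);
    else
        small_free(t);
}
```

Routing every release, including error paths, through one function that repeats the allocation-time decision avoids the unconditional-free mistake.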
     

08 Dec, 2006

2 commits

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
        quilt add $file
        sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
        mv /tmp/$$ $file
        quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_ATOMIC is an alias of GFP_ATOMIC

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

03 Dec, 2006

3 commits

  • Based on implementation by Rick Payne.

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • We currently allocate a fixed-size hash table (TCP_SYNQ_HSIZE=512 slots)
    for each LISTEN socket, regardless of other parameters (the listen
    backlog, for example).

    On x86_64, this means order-1 allocations (might fail), even for 'small'
    sockets, expecting few connections. On the contrary, a huge server wanting a
    backlog of 50000 is slowed down a bit because of this fixed limit.

    This patch makes the size of the listen hash table a dynamic parameter,
    depending on:
    - net.core.somaxconn tunable (default is 128)
    - net.ipv4.tcp_max_syn_backlog tunable (default: 256, 1024 or 128)
    - backlog value given by user application (2nd parameter of listen())

    For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
    kmalloc().

    We still limit memory allocation with the two existing tunables (somaxconn &
    tcp_max_syn_backlog). So for standard setups, this patch actually reduces
    RAM usage.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
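
A possible shape for the dynamic sizing described in this commit is sketched below. The three inputs come from the commit text; the floor of 8 entries, the upper cap, the rounding to a power of two and the names `listen_table_entries` and `roundup_pow2` are assumptions made for illustration.

```c
/* Smallest power of two >= v (v must be at most 2^31). */
static unsigned int roundup_pow2(unsigned int v)
{
    unsigned int p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Derive the hash table size from the application's backlog, clamped by
 * the two tunables, instead of always using a fixed 512 slots.
 */
static unsigned int listen_table_entries(unsigned int backlog,
                                         unsigned int somaxconn,
                                         unsigned int max_syn_backlog)
{
    unsigned int n = backlog;

    if (n > somaxconn)
        n = somaxconn;
    if (n > max_syn_backlog)
        n = max_syn_backlog;
    if (n < 8)
        n = 8;             /* assumed minimum table size */
    if (n > (1u << 30))
        n = 1u << 30;      /* keep roundup_pow2() within range */
    return roundup_pow2(n + 1);
}
```

Under these assumptions a small listener with `listen(fd, 5)` would get 16 slots, while a backlog of 50000 with the default somaxconn of 128 would get 256.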
     
  • Fix SO_PEERSEC for tcp sockets to return the security context of
    the peer (as represented by the SA from the peer) as opposed to the
    SA used by the local/source socket.

    Signed-off-by: Venkat Yekkirala
    Signed-off-by: James Morris

    Venkat Yekkirala
     

23 Sep, 2006

1 commit

  • This automatically labels the TCP, Unix stream, and dccp child sockets
    as well as openreqs to be at the same MLS level as the peer. This will
    result in the selection of appropriately labeled IPSec Security
    Associations.

    This also uses the sock's sid (as opposed to the isec sid) in SELinux
    enforcement of secmark in rcv_skb and postroute_last hooks.

    Signed-off-by: Venkat Yekkirala
    Signed-off-by: David S. Miller

    Venkat Yekkirala
     

27 Mar, 2006

1 commit

  • Just noticed that request_sock.[ch] contain a useless assignment of
    rskq_accept_head to itself. I assume this is a typo and the 2nd one
    was supposed to be _tail. However, setting _tail to NULL is not
    needed, so the patch below just drops the 2nd assignment.

    Signed-off-by: Norbert Kiesel
    Signed-off-by: Adrian Bunk
    Signed-off-by: David S. Miller

    Norbert Kiesel
     

04 Jan, 2006

1 commit


30 Aug, 2005

4 commits


19 Jun, 2005

4 commits

  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This chunks out the accept_queue and tcp_listen_opt code and moves
    them to net/core/request_sock.c and include/net/request_sock.h, to
    make it useful for other transport protocols, DCCP being the first one
    to use it.

    Next patches will rename tcp_listen_opt to accept_sock and remove the
    inline tcp functions that just call a reqsk_queue_ function.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Ok, this one just renames some stuff to have a better namespace and to
    disassociate it from TCP:

    struct open_request -> struct request_sock
    tcp_openreq_alloc -> reqsk_alloc
    tcp_openreq_free -> reqsk_free
    tcp_openreq_fastfree -> __reqsk_free

    With this most of the infrastructure closely resembles a struct
    sock methods subset.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Kept this first changeset minimal, without changing existing names to
    ease peer review.

    Basically tcp_openreq_alloc now receives the or_calltable, which in turn
    has two new members:

    ->slab, that replaces tcp_openreq_cachep
    ->obj_size, to inform the size of the openreq descendant for
    a specific protocol

    The protocol specific fields in struct open_request were moved to a
    class hierarchy, with the things that are common to all connection
    oriented PF_INET protocols in struct inet_request_sock, the TCP ones
    in tcp_request_sock, that is an inet_request_sock, that is an
    open_request.

    I.e. this uses the same approach used for the struct sock class
    hierarchy, with sk_prot indicating if the protocol wants to use the
    open_request infrastructure by filling in sk_prot->rsk_prot with an
    or_calltable.

    Results? Performance is improved and TCP v4 now uses only 64 bytes per
    open request minisock, down from 96 without this patch :-)

    The next changeset will rename some of the structs, fields and functions
    mentioned above: struct or_calltable is far from clear and is better
    named struct request_sock_ops, s/struct open_request/struct request_sock/g,
    etc.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
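
The class hierarchy described in this changeset can be sketched with struct embedding and a per-protocol object size. The sketch uses the later names from this history (`request_sock`, `request_sock_ops`, `reqsk_alloc`, `reqsk_free`); the field lists, the `tcp_rsk` helper and the use of `calloc` in place of a slab cache are simplifying assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Per-protocol description: how big is this protocol's request object? */
struct request_sock_ops {
    size_t obj_size;
    const char *name;
};

/* Base "class": fields common to every protocol. */
struct request_sock {
    const struct request_sock_ops *rsk_ops;
    unsigned int retrans;
};

/* Fields common to connection-oriented PF_INET protocols.
 * The base struct must be the first member so pointers can be cast. */
struct inet_request_sock {
    struct request_sock req;
    unsigned int rmt_addr;   /* illustrative field */
    unsigned short rmt_port; /* illustrative field */
};

/* TCP-specific fields on top of the inet ones. */
struct tcp_request_sock {
    struct inet_request_sock req;
    unsigned int rcv_isn; /* illustrative field */
    unsigned int snt_isn; /* illustrative field */
};

static const struct request_sock_ops tcp_request_sock_ops = {
    .obj_size = sizeof(struct tcp_request_sock),
    .name     = "TCP",
};

/* Generic allocation: the ops table says how much memory to reserve. */
static struct request_sock *reqsk_alloc(const struct request_sock_ops *ops)
{
    struct request_sock *req = calloc(1, ops->obj_size);

    if (req)
        req->rsk_ops = ops;
    return req;
}

static void reqsk_free(struct request_sock *req)
{
    free(req);
}

/* Downcast helper, valid because the base struct sits at offset 0. */
static struct tcp_request_sock *tcp_rsk(struct request_sock *req)
{
    return (struct tcp_request_sock *)req;
}
```

Generic code only handles `struct request_sock *`, while each protocol pays only for the fields it actually needs.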