18 May, 2010

2 commits

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • commit 5fa782c2f5ef6c2e4f04d3e228412c9b4a4c8809
    sctp: Fix skb_over_panic resulting from multiple invalid \
    parameter errors (CVE-2010-1173) (v4)

    causes the 'error cause' to never be added to the ERROR chunk, due to
    a typo in the valid-length check in sctp_init_cause_fixed().

    Signed-off-by: Wei Yongjun
    Reviewed-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

17 May, 2010

1 commit


16 May, 2010

2 commits

  • The transport may be freed before the ICMP proto unreachable timer
    expires, so we should delete the active ICMP proto unreachable timer
    when the transport is going away.

    Signed-off-by: Wei Yongjun
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • (Dropped the infiniband part, because Tetsuo modified the related code,
    I will send a separate patch for it once this is accepted.)

    This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
    allows users to reserve ports for third-party applications.

    The reserved ports will not be used by automatic port assignments
    (e.g. when calling connect() or bind() with port number 0). Explicit
    port allocation behavior is unchanged.

    Signed-off-by: Octavian Purdila
    Signed-off-by: WANG Cong
    Cc: Neil Horman
    Cc: Eric Dumazet
    Cc: Eric W. Biederman
    Signed-off-by: David S. Miller

    Amerigo Wang
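The behaviour described in the commit above (automatic assignment skips reserved ports, explicit requests are unaffected) can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct port_table`, `reserve_port()` and `bind_port()` are hypothetical names, and the bitmap layout is not taken from the kernel patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PORT_COUNT 65536u

/* Hypothetical stand-in for the reserved-port and in-use bookkeeping. */
struct port_table {
    uint8_t reserved[PORT_COUNT / 8];
    uint8_t in_use[PORT_COUNT / 8];
};

static bool bit_test(const uint8_t *map, unsigned port)
{
    return (map[port >> 3] & (1u << (port & 7))) != 0;
}

static void bit_set(uint8_t *map, unsigned port)
{
    map[port >> 3] |= (uint8_t)(1u << (port & 7));
}

void port_table_init(struct port_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Mark a port as reserved for a third-party application. */
void reserve_port(struct port_table *t, unsigned port)
{
    if (port < PORT_COUNT)
        bit_set(t->reserved, port);
}

/*
 * port != 0: explicit request, reserved ports are still allowed.
 * port == 0: automatic assignment from [low, high], skipping reserved ports.
 * Returns the bound port, or -1 if nothing suitable is free.
 */
int bind_port(struct port_table *t, unsigned port, unsigned low, unsigned high)
{
    if (port != 0) {
        if (port >= PORT_COUNT || bit_test(t->in_use, port))
            return -1;
        bit_set(t->in_use, port);
        return (int)port;
    }
    if (high >= PORT_COUNT)
        high = PORT_COUNT - 1;
    for (unsigned p = low; p <= high; p++) {
        if (bit_test(t->reserved, p) || bit_test(t->in_use, p))
            continue;
        bit_set(t->in_use, p);
        return (int)p;
    }
    return -1;
}
```

With port 32768 reserved, an automatic request over 32768-32770 receives 32769, while an explicit request for 32768 still succeeds.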
     

12 May, 2010

1 commit


06 May, 2010

1 commit

  • ICMP protocol unreachable handling completely disregarded
    the fact that the user may have locked the socket. It proceeded
    to destroy the association, even though the user may have
    held the lock and had a ref on the association. This resulted
    in the following:

    Attempt to release alive inet socket f6afcc00

    =========================
    [ BUG: held lock freed! ]
    -------------------------
    somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
    there!
    (sk_lock-AF_INET){+.+.+.}, at: [] sctp_connect+0x13/0x4c
    1 lock held by somenu/2672:
    #0: (sk_lock-AF_INET){+.+.+.}, at: [] sctp_connect+0x13/0x4c

    stack backtrace:
    Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
    Call Trace:
    [] ? printk+0xf/0x11
    [] debug_check_no_locks_freed+0xce/0xff
    [] kmem_cache_free+0x21/0x66
    [] __sk_free+0x9d/0xab
    [] sk_free+0x1c/0x1e
    [] sctp_association_put+0x32/0x89
    [] __sctp_connect+0x36d/0x3f4
    [] ? sctp_connect+0x13/0x4c
    [] ? autoremove_wake_function+0x0/0x33
    [] sctp_connect+0x31/0x4c
    [] inet_dgram_connect+0x4b/0x55
    [] sys_connect+0x54/0x71
    [] ? lock_release_non_nested+0x88/0x239
    [] ? might_fault+0x42/0x7c
    [] ? might_fault+0x42/0x7c
    [] sys_socketcall+0x6d/0x178
    [] ? trace_hardirqs_on_thunk+0xc/0x10
    [] syscall_call+0x7/0xb

    This was because sctp_wait_for_connect() would acquire the socket
    lock and then proceed to release the last reference count on the
    association, thus causing the full destruction path to finish freeing
    the socket.

    The simplest solution is to start a very short timer if the socket
    is owned by the user. When the timer expires, we can do some
    verification and then do the release properly.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

04 May, 2010

1 commit


03 May, 2010

1 commit


02 May, 2010

1 commit

  • sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
    need two atomic operations (and associated dirtying) per incoming
    packet.

    RCU conversion is pretty much needed :

    1) Add a new structure, called "struct socket_wq" to hold all fields
    that will need rcu_read_lock() protection (currently: a
    wait_queue_head_t and a struct fasync_struct pointer).

    [Future patch will add a list anchor for wakeup coalescing]

    2) Attach one such structure to each "struct socket" created in
    sock_alloc_inode().

    3) Respect RCU grace period when freeing a "struct socket_wq"

    4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer
    to "struct socket_wq"

    5) Change sk_sleep() function to use new sk->sk_wq instead of
    sk->sk_sleep

    6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
    a rcu_read_lock() section.

    7) Change all sk_has_sleeper() callers to :
    - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
    - Use wq_has_sleeper() to eventually wakeup tasks.
    - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

    8) sock_wake_async() is modified to use rcu protection as well.

    9) Exceptions :
    macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
    instead of dynamically allocated ones. They don't need rcu freeing.

    Some cleanups or followups are probably needed, (possible
    sk_callback_lock conversion to a spinlock for example...).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 May, 2010

18 commits

  • When we create the sctp_datamsg and fragment the user data,
    we know exactly if we are sending full segments or not and
    how they might be bundled. During this time, we can mark
    messages as Nagle capable or not. This makes the check at
    transmit time much simpler.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • Right now, if the highest tsn in the SACK doesn't change, we'll
    end up scanning the transmitted lists on the transports twice:
    once for locating the highest _new_ tsn, and once for actually
    tagging chunks as acked. This is a waste, since we can record
    the highest _new_ tsn at the same time as tagging chunks. Long
    ago this was not possible because we would try to mark chunks
    as missing at the same time as tagging them acked and this approach
    didn't work. Now that the two steps are separate, we can re-use
    the old approach.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • According to RFC 4960 Section 7.2.4:
    If an endpoint is in Fast
    Recovery and a SACK arrives that advances the Cumulative TSN Ack
    Point, the miss indications are incremented for all TSNs reported
    missing in the SACK.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • rwnd_press tracks the pressure on the receive window. Every
    time the receive buffer overflows, we truncate the receive
    window and then grow it back. However, if we don't track
    the cumulative pressure, it's possible to reach a situation
    when the receive buffer is empty, but rwnd stays truncated.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
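A minimal sketch of why the pressure has to be cumulative, based only on the description in the commit above. The structure, the function names, and the `buffer_over` / `buffer_empty` flags are assumptions made for illustration; the actual kernel accounting is not shown in this log and may differ.

```c
#include <stdint.h>

/* Hypothetical, simplified receive-window state. */
struct rwnd_state {
    uint32_t rwnd;       /* window currently advertised to the peer */
    uint32_t rwnd_press; /* window withheld so far because of buffer pressure */
};

/* Data of 'len' bytes arrived; 'buffer_over' says the receive buffer overflowed. */
void rwnd_decrease(struct rwnd_state *s, uint32_t len, int buffer_over)
{
    s->rwnd = (s->rwnd >= len) ? s->rwnd - len : 0;
    if (buffer_over) {
        /* Accumulate (+=) rather than overwrite, so repeated overflows
         * do not lose track of window withheld earlier. */
        s->rwnd_press += s->rwnd;
        s->rwnd = 0;
    }
}

/* The application consumed 'len' bytes; 'buffer_empty' says nothing is queued. */
void rwnd_increase(struct rwnd_state *s, uint32_t len, int buffer_empty)
{
    s->rwnd += len;
    if (buffer_empty) {
        /* Hand back everything that was withheld. */
        s->rwnd += s->rwnd_press;
        s->rwnd_press = 0;
    }
}
```

If `rwnd_press` were overwritten instead of accumulated, the second overflow in a row would discard the first withheld amount and the window could never return to its full size.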
     
  • SCTP fast recovery algorithm really applies per association
    and impacts all transports.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • Right now, sctp transports are not fully initialized and when
    adding any new fields, they have to be explicitly initialized.
    This is prone to mistakes. So we switch to calling kzalloc()
    which makes things much simpler.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • We don't need to force the T3 timer any more and it's
    actually wrong to do as it causes too long of a delay.
    The timer will be started if one is not running, but if
    one is running, we leave it alone.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • The 'resent' bit is used to make sure that we don't update
    rto estimate based on retransmitted chunks. However, we already
    have the 'rto_pending' bit that we test when we need to update rto,
    so 'resent' bit is just extra. Additionally, we currently have
    a bug in that we always set a 'resent' bit and thus rto estimate
    is only updated by Heartbeats.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
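The idea in the commit above, that a single pending-measurement flag is enough to keep retransmitted chunks out of the RTO estimate, can be sketched as follows. The smoothing constants follow RFC 4960 Section 6.3.1 (alpha = 1/8, beta = 1/4). Everything else is an assumption for illustration: the flag is placed on a hypothetical `struct chunk`, the RTO.Min/RTO.Max clamping is omitted, and times are plain milliseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-path RTO state, in milliseconds. */
struct rto_state {
    uint32_t srtt;
    uint32_t rttvar;
    uint32_t rto;
    bool have_sample;
};

/* Hypothetical chunk carrying only the flag discussed in the commit. */
struct chunk {
    bool rto_pending; /* set when this chunk is being timed for an RTT sample */
};

/* Retransmitting a chunk cancels any RTT measurement in progress on it. */
void chunk_retransmit(struct chunk *c)
{
    c->rto_pending = false;
}

/* Called when 'c' is acknowledged 'rtt' ms after it was sent. */
void rto_on_ack(struct rto_state *s, struct chunk *c, uint32_t rtt)
{
    if (!c->rto_pending)
        return; /* retransmitted or never timed: not a valid sample */
    c->rto_pending = false;

    if (!s->have_sample) {
        s->srtt = rtt;
        s->rttvar = rtt / 2;
        s->have_sample = true;
    } else {
        uint32_t diff = s->srtt > rtt ? s->srtt - rtt : rtt - s->srtt;
        /* RTTVAR is updated first, using the old SRTT. */
        s->rttvar = s->rttvar - s->rttvar / 4 + diff / 4;
        s->srtt = s->srtt - s->srtt / 8 + rtt / 8;
    }
    s->rto = s->srtt + 4 * s->rttvar;
}
```

Because retransmission clears the flag, no separate 'resent' marker is needed to reject ambiguous samples.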
     
  • commit 4951feda0c60d1ef681f1a270afdd617924ab041
    sctp: Do no select unconfirmed transports for retransmissions

    added code to make sure that we do not select unconfirmed paths
    for data transmission. This caused a problem when there are only
    2 paths, 1 unconfirmed and 1 unreachable. In that case, the next
    retransmit path returned is NULL and that causes a kernel crash.

    The solution is to only change retransmit paths if we found one to use.

    Reported-by: Frank Schuster
    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
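The fix described in the commit above amounts to keeping the current retransmit path when the search finds no usable replacement. A minimal sketch, assuming a flat array of transports and hypothetical names (`struct transport`, `next_usable()`, `update_retran_path()`); the kernel's own list handling is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified transport state. */
struct transport {
    bool active;    /* path is currently reachable */
    bool confirmed; /* path has been verified at least once */
};

/* Index of the next active and confirmed transport after 'current', or -1. */
int next_usable(const struct transport *t, size_t n, size_t current)
{
    for (size_t step = 1; step < n; step++) {
        size_t i = (current + step) % n;
        if (t[i].active && t[i].confirmed)
            return (int)i;
    }
    return -1;
}

/* Only switch paths if a usable one was found; otherwise keep the old one. */
size_t update_retran_path(const struct transport *t, size_t n, size_t current)
{
    int next = next_usable(t, n, current);
    return next >= 0 ? (size_t)next : current;
}
```

With two paths, one unreachable and one unconfirmed, `next_usable()` returns -1 and the current path is kept instead of being replaced by an invalid one.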
     
  • This assignment isn't needed because we did it earlier already.

    Also another reason to delete the assignment is because it triggers a
    Smatch warning about checking for NULL pointers after a dereference.

    Reported-by: Vlad Yasevich
    Signed-off-by: Dan Carpenter
    Signed-off-by: Vlad Yasevich

    Dan Carpenter
     
  • This patch implements an SCTP association probing module; the module
    will be called sctp_probe.

    This module allows for capturing the changes to SCTP association
    state in response to incoming packets. It is used for debugging
    SCTP congestion control algorithms.

    Usage:
    $ modprobe sctp_probe [full=n] [port=n] [bufsize=n]
    $ cat /proc/net/sctpprobe

    The output format is:
    TIME ASSOC LPORT RPORT MTU RWND UNACK ...

    The output will be like this:
    9.226086 c4064c48 9000 8000 1500 53352 1 *192.168.0.19 1 4380 54784 1252 0 1500
    9.287195 c4064c48 9000 8000 1500 45144 5 *192.168.0.19 1 5880 54784 6500 0 1500
    9.289130 c4064c48 9000 8000 1500 42724 5 *192.168.0.19 1 7380 54784 6500 0 1500
    9.620332 c4064c48 9000 8000 1500 48284 4 *192.168.0.19 1 8880 54784 5200 0 1500
    ......

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • The sctp_chunk_is_data macro is defined to decide
    whether a chunk is a DATA chunk or not.

    Signed-off-by: Shan Wei
    Signed-off-by: Vlad Yasevich

    Shan Wei
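A sketch of what such a macro could look like. The macro name comes from the commit above and the chunk type values come from RFC 4960 (DATA is chunk type 0); the macro body and the two structures are assumptions, since the commit text does not show them.

```c
#include <stdint.h>

/* Chunk type values as assigned in RFC 4960. */
enum {
    SCTP_CID_DATA = 0,
    SCTP_CID_INIT = 1,
    SCTP_CID_SACK = 3
};

/* Hypothetical, simplified chunk header and chunk wrapper. */
struct chunk_hdr {
    uint8_t type;
    uint8_t flags;
    uint16_t length;
};

struct chunk {
    struct chunk_hdr *chunk_hdr;
};

/* Assumed body: true when the chunk header carries the DATA type. */
#define sctp_chunk_is_data(c) ((c)->chunk_hdr->type == SCTP_CID_DATA)
```

Callers can then write `if (sctp_chunk_is_data(chunk))` instead of repeating the header comparison at every site.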
     
  • An unconfirmed transport is one that we have not been
    able to reach since the beginning. There is no point in
    trying to retransmit data on those transports. Also, the
    specification forbids it due to security issues.

    Reported-by: Frank Schuster

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • While doing a retransmit, if a control chunk exists, such as a
    FORWARD TSN chunk, and the DATA chunk can not be bundled with
    this control chunk because of the PMTU limit, no DATA chunk
    will be retransmitted in the current implementation. This
    patch makes sure to retransmit at least one DATA chunk in this case.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • While looking up the output route, we do not set the src and dest
    port. This causes us to get the wrong route if the outbound
    transport is set up for IPsec with a src or dst port.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • PR-SCTP extension section 3.5 Sender Side Implementation of PR-SCTP:
    C5) If a FORWARD TSN is sent, the sender MUST assure that at
    least one T3-rtx timer is running.

    So this patch makes sure that at least one T3-rtx timer is running
    if a FORWARD TSN is or will be sent.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     
  • SHUTDOWN-ACK is always sent to the primary path the first time,
    but it is better to transmit the SHUTDOWN-ACK chunk to the same
    destination transport address from which the SHUTDOWN chunk was received.
    Based on the work from Wei Yongjun.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • The function should use the address family of the address when
    trying to determine the length of the structure.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
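The commit above does not name the function it changes, so the following is only a generic user-space sketch of the principle: the structure length is derived from the family stored in the address itself. `sockaddr_len()` is a hypothetical helper; the socket types are the standard POSIX ones.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Length of the concrete sockaddr structure, chosen by the address's own family. */
size_t sockaddr_len(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        return 0; /* unknown family: the caller must treat this as an error */
    }
}
```

Using the family of the socket instead would give the wrong length whenever an IPv6 socket handles an IPv4 address, or the other way around.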
     

29 Apr, 2010

6 commits

  • Ok, version 4

    Change Notes:
    1) Minor cleanups, from Vlad's notes

    Summary:

    Hey-
    Recently, it was reported to me that the kernel could oops in the
    following way:

    kernel BUG at net/core/skbuff.c:91!
    invalid operand: 0000 [#1]
    Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
    ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
    vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
    ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
    snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
    pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
    mptbase sd_mod scsi_mod
    CPU: 0
    EIP: 0060:[] Not tainted VLI
    EFLAGS: 00010216 (2.6.9-89.0.25.EL)
    EIP is at skb_over_panic+0x1f/0x2d
    eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44
    esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40
    ds: 007b es: 007b ss: 0068
    Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
    Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
    e0c2947d
    00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
    df653490
    00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
    00000004
    Call Trace:
    [] sctp_addto_chunk+0xb0/0x128 [sctp]
    [] sctp_addto_chunk+0xb5/0x128 [sctp]
    [] sctp_init_cause+0x3f/0x47 [sctp]
    [] sctp_process_unk_param+0xac/0xb8 [sctp]
    [] sctp_verify_init+0xcc/0x134 [sctp]
    [] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
    [] sctp_do_sm+0x41/0x77 [sctp]
    [] cache_grow+0x140/0x233
    [] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
    [] sctp_inq_push+0xe/0x10 [sctp]
    [] sctp_rcv+0x454/0x509 [sctp]
    [] ipt_hook+0x17/0x1c [iptable_filter]
    [] nf_iterate+0x40/0x81
    [] ip_local_deliver_finish+0x0/0x151
    [] ip_local_deliver_finish+0xc6/0x151
    [] nf_hook_slow+0x83/0xb5
    [] ip_local_deliver+0x1a2/0x1a9
    [] ip_local_deliver_finish+0x0/0x151
    [] ip_rcv+0x334/0x3b4
    [] netif_receive_skb+0x320/0x35b
    [] init_stall_timer+0x67/0x6a [uhci_hcd]
    [] process_backlog+0x6c/0xd9
    [] net_rx_action+0xfe/0x1f8
    [] __do_softirq+0x35/0x79
    [] handle_IRQ_event+0x0/0x4f
    [] do_softirq+0x46/0x4d

    It's an skb_over_panic BUG halt that results from processing an init chunk in
    which too many of its variable length parameters are in some way malformed.

    The problem is in sctp_process_unk_param:
    if (NULL == *errp)
            *errp = sctp_make_op_error_space(asoc, chunk,
                            ntohs(chunk->chunk_hdr->length));

    if (*errp) {
            sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
                            WORD_ROUND(ntohs(param.p->length)));
            sctp_addto_chunk(*errp,
                            WORD_ROUND(ntohs(param.p->length)),
                            param.v);

    When we allocate an error chunk, we assume that the worst case scenario requires
    that we have chunk_hdr->length data allocated, which would be correct nominally,
    given that we call sctp_addto_chunk for the violating parameter. Unfortunately,
    we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
    chunk, so the worst case situation in which all parameters are in violation
    requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

    The result of this error is that a deliberately malformed packet sent to a
    listening host can cause a remote DOS, described in CVE-2010-1173:
    http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

    I've tested the below fix and confirmed that it fixes the issue. We move to a
    strategy whereby we allocate a fixed size error chunk and ignore errors we don't
    have space to report. Tested by me successfully

    Signed-off-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Neil Horman
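The size mismatch described in the commit above can be worked through with a small calculation. The formula is the one given in the commit text, chunk_hdr->length + sizeof(sctp_errhdr_t) * param_count; the `struct errhdr` layout (two 16-bit fields, as in the RFC 4960 error cause header) and the function names are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for sctp_errhdr_t: cause code plus cause length. */
struct errhdr {
    uint16_t cause;
    uint16_t length;
};

/* Space the original code allocated: just the length of the INIT chunk. */
size_t allocated_space(size_t init_chunk_len)
{
    return init_chunk_len;
}

/* Worst case from the commit text: every parameter is reported as an error. */
size_t worst_case_space(size_t init_chunk_len, size_t param_count)
{
    return init_chunk_len + sizeof(struct errhdr) * param_count;
}

/* True when reporting every parameter would overrun the allocation. */
bool would_overflow(size_t init_chunk_len, size_t param_count)
{
    return worst_case_space(init_chunk_len, param_count) >
           allocated_space(init_chunk_len);
}
```

For a 128-byte INIT chunk carrying 27 bad parameters, the worst case is 128 + 4 * 27 = 236 bytes against 128 allocated, which is why the fix switches to a fixed-size error chunk and drops errors that do not fit.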
     
  • When we finish processing ASCONF_ACK chunk, we try to send
    the next queued ASCONF. This action runs the sctp state
    machine recursively and it's not prepared to do so.

    kernel BUG at kernel/timer.c:790!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/module/ipv6/initstate
    Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
    uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
    floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]

    Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
    EIP: 0060:[] EFLAGS: 00010286 CPU: 0
    EIP is at add_timer+0xd/0x1b
    EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
    ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
    Stack:
    c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
    00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
    00000004
    c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
    000000d0
    Call Trace:
    [] ? sctp_side_effects+0x607/0xdfc [sctp]
    [] ? sctp_do_sm+0x108/0x159 [sctp]
    [] ? sctp_pname+0x0/0x1d [sctp]
    [] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
    [] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
    [] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
    [] ? sctp_do_sm+0xb8/0x159 [sctp]
    [] ? sctp_cname+0x0/0x52 [sctp]
    [] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
    [] ? sctp_inq_push+0x2d/0x30 [sctp]
    [] ? sctp_rcv+0x797/0x82e [sctp]

    Tested-by: Wei Yongjun
    Signed-off-by: Yuansong Qiao
    Signed-off-by: Shuaijun Zhang
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • When calculating the INIT/INIT-ACK chunk length, we should
    account not only for the length of the parameters, but also for the
    parameters' zero padding, such as for the AUTH HMACS parameter and the
    CHUNKS parameter. Without the zero padding length we may get the
    following oops.

    skb_over_panic: text:ce2068d2 len:130 put:6 head:cac3fe00 data:cac3fe00 tail:0xcac3fe82 end:0xcac3fe80 dev:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:127!
    invalid opcode: 0000 [#2] SMP
    last sysfs file: /sys/module/aes_generic/initstate
    Modules linked in: authenc ......

    Pid: 4102, comm: sctp_darn Tainted: G D 2.6.34-rc2 #6
    EIP: 0060:[] EFLAGS: 00010282 CPU: 0
    EIP is at skb_over_panic+0x37/0x3e
    EAX: 00000078 EBX: c07c024b ECX: c07c02b9 EDX: cb607b78
    ESI: 00000000 EDI: cac3fe7a EBP: 00000002 ESP: cb607b74
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process sctp_darn (pid: 4102, ti=cb607000 task=cabdc990 task.ti=cb607000)
    Stack:
    c07c02b9 ce2068d2 00000082 00000006 cac3fe00 cac3fe00 cac3fe82 cac3fe80
    c07c024b cac3fe7c cac3fe7a c0608dec ca986e80 ce2068d2 00000006 0000007a
    cb8120ca ca986e80 cb812000 00000003 cb8120c4 ce208a25 cb8120ca cadd9400
    Call Trace:
    [] ? sctp_addto_chunk+0x45/0x85 [sctp]
    [] ? skb_put+0x2e/0x32
    [] ? sctp_addto_chunk+0x45/0x85 [sctp]
    [] ? sctp_make_init+0x279/0x28c [sctp]
    [] ? apic_timer_interrupt+0x2a/0x30
    [] ? sctp_sf_do_prm_asoc+0x2b/0x7b [sctp]
    [] ? sctp_do_sm+0xa0/0x14a [sctp]
    [] ? sctp_pname+0x0/0x14 [sctp]
    [] ? sctp_primitive_ASSOCIATE+0x2b/0x31 [sctp]
    [] ? sctp_sendmsg+0x7a0/0x9eb [sctp]
    [] ? inet_sendmsg+0x3b/0x43
    [] ? task_tick_fair+0x2d/0xd9
    [] ? sock_sendmsg+0xa7/0xc1
    [] ? smp_apic_timer_interrupt+0x6b/0x75
    [] ? dequeue_task_fair+0x34/0x19b
    [] ? sched_clock_local+0x17/0x11e
    [] ? _copy_from_user+0x2b/0x10c
    [] ? verify_iovec+0x3c/0x6a
    [] ? sys_sendmsg+0x186/0x1e2
    [] ? __wake_up_common+0x34/0x5b
    [] ? __wake_up+0x2c/0x3b
    [] ? tty_wakeup+0x43/0x47
    [] ? remove_wait_queue+0x16/0x24
    [] ? n_tty_read+0x5b8/0x65e
    [] ? default_wake_function+0x0/0x8
    [] ? sys_socketcall+0x17f/0x1cd
    [] ? sysenter_do_call+0x12/0x22
    Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 ......
    EIP: [] skb_over_panic+0x37/0x3e SS:ESP 0068:cb607b74

    To reproduce:

    # modprobe sctp
    # echo 1 > /proc/sys/net/sctp/addip_enable
    # echo 1 > /proc/sys/net/sctp/auth_enable
    # sctp_test -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 800 -l
    # sctp_darn -H 3ffe:501:ffff:100:20c:29ff:fe4d:f37e -P 900 -h 192.168.0.21 -p 800 -I -s -t
    sctp_darn ready to send...
    3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.0.21
    3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.1.21
    3ffe:501:ffff:100:20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> snd=10

    ------------------------------------------------------------------
    eth0 has addresses: 3ffe:501:ffff:100:20c:29ff:fe4d:f37e and 192.168.0.21
    eth1 has addresses: 192.168.1.21
    ------------------------------------------------------------------

    Reported-by: George Cheimonidis
    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
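A sketch of the padding arithmetic behind the commit above. SCTP parameters are padded with zeros to a multiple of 4 bytes (RFC 4960), so a buffer sized from the unpadded lengths can come up short. `pad4()` and the two summing helpers are hypothetical names; the kernel's WORD_ROUND macro is assumed to perform the same rounding.

```c
#include <stddef.h>

/* Round a length up to the next multiple of 4 bytes. */
size_t pad4(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Sum of parameter lengths without padding (the undersized calculation). */
size_t params_len_unpadded(const size_t *lens, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += lens[i];
    return total;
}

/* Sum of parameter lengths including each parameter's zero padding. */
size_t params_len_padded(const size_t *lens, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += pad4(lens[i]);
    return total;
}
```

A 6-byte parameter occupies 8 bytes on the wire, so even a single such parameter makes the chunk 2 bytes longer than the unpadded sum suggests.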
     
  • Since the change of the atomics to percpu variables, we now
    have to disable BH in process context when touching percpu variables.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • When sctp attempts to update an association, it removes any
    addresses that were not in the updated INITs. However, the loop
    may attempt to reference a transport with address after removing it.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
    contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
    not be used in this case. Therefore, we have to make a new function
    sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.

    =========================================================
    [ INFO: possible irq lock inversion dependency detected ]
    2.6.33-rc6 #129
    ---------------------------------------------------------
    sctp_darn/1517 just changed the state of lock:
    (clock-AF_INET){++.?..}, at: [] sock_def_readable+0x20/0x80
    but this lock took another, SOFTIRQ-unsafe lock in the past:
    (slock-AF_INET){+.-...}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
    1 lock held by sctp_darn/1517:
    #0: (sk_lock-AF_INET){+.+.+.}, at: [] sctp_sendmsg+0x23d/0xc00 [sctp]

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

28 Apr, 2010

1 commit

  • The current socket backlog limit is not enough to really stop DDOS
    attacks, because user threads spend a lot of time processing a full
    backlog each round, and the user might spin madly on the socket lock.

    We should add backlog size and receive_queue size (aka rmem_alloc) to
    pace writers, and let the user run without being slowed down too much.

    Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
    stress situations.

    Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
    receiver can now process ~200.000 pps (instead of ~100 pps before the
    patch) on an 8 core machine.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
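A user-space sketch of the check described in the commit above: backlog size and receive-queue size are added together and compared against the receive buffer limit, without taking the socket lock. The helper name comes from the commit; the structure, its field names, and the decision to count the incoming packet's size are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the socket fields involved. */
struct sock_sketch {
    unsigned int backlog_len; /* bytes queued in the backlog */
    unsigned int rmem_alloc;  /* bytes queued in the receive queue */
    unsigned int rcvbuf;      /* receive buffer limit */
};

/* True when queueing 'truesize' more bytes would exceed the receive buffer. */
bool sk_rcvqueues_full(const struct sock_sketch *sk, unsigned int truesize)
{
    unsigned int qsize = sk->backlog_len + sk->rmem_alloc;

    return qsize + truesize > sk->rcvbuf;
}
```

A producer can call this before touching the socket lock and drop the packet early when it returns true, which is what keeps the lock from being contended under flood.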
     

21 Apr, 2010

1 commit

  • Define a new function to return the waitqueue of a "struct sock".

    static inline wait_queue_head_t *sk_sleep(struct sock *sk)
    {
            return sk->sk_sleep;
    }

    Change all read occurrences of sk_sleep by a call to this function.

    Needed for a future RCU conversion. sk_sleep won't be a field directly
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Apr, 2010

1 commit

  • As Herbert Xu said: we should be able to simply replace ipfragok
    with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function)
    has dropped ipfragok and set local_df value properly.

    The patch kills the ipfragok parameter of .queue_xmit().

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     

12 Apr, 2010

1 commit


04 Apr, 2010

1 commit


31 Mar, 2010

1 commit