09 Nov, 2011

1 commit


17 Sep, 2011

1 commit

  • Attempt to reduce the number of IP packets emitted in response to single
    SCTP packet (2e3216cd) introduced a complication - if a packet contains
    two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the
    socket while processing first COOKIE_ECHO and then loses the association
    and forgets to uncork the socket. To deal with the issue, add a new SCTP
    command which can be used to set the association explicitly. Use this new
    command when processing the second COOKIE_ECHO chunk to restore the context
    for the SCTP state machine.

    Signed-off-by: Max Matveev
    Signed-off-by: David S. Miller

    Max Matveev
     

14 Jul, 2011

1 commit


08 Jul, 2011

1 commit

  • When initiating a graceful shutdown while having data chunks
    on the retransmission queue with a peer which is in zero
    window mode the shutdown is never completed because the
    retransmission error count is reset periodically by the
    following two rules:

    - Do not timeout association while doing zero window probe.
    - Reset overall error count when a heartbeat request has
    been acknowledged.

    The graceful shutdown will wait for all outstanding TSN to
    be acknowledged before sending the SHUTDOWN request. This
    never happens due to the peer's zero window not acknowledging
    the continuously retransmitted data chunks. Although the
    error counter is incremented for each failed retransmission,
    the receiving of the SACK announcing the zero window clears
    the error count again immediately. Also heartbeat requests
    continue to be sent periodically. The peer acknowledges these
    requests causing the error counter to be reset as well.

    This patch changes behaviour to only reset the overall error
    counter for the above rules while not in shutdown. After
    reaching the maximum number of retransmission attempts, the
    T5 shutdown guard timer is scheduled to give the receiver
    some additional time to recover. The timer is stopped as soon
    as the receiver acknowledges any data.
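
    The rule change described above can be sketched as a small state model.
    This is only an illustration under stated assumptions: the struct, the
    state names and the handler functions are hypothetical and are not the
    kernel's own identifiers.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
enum assoc_state { STATE_ESTABLISHED, STATE_SHUTDOWN_PENDING };

struct assoc_model {
    enum assoc_state state;
    int overall_error_count;
    int max_retrans;
    bool t5_guard_running;
};

/* A zero-window SACK or a HEARTBEAT-ACK arrived: the peer is alive.
 * The counter is only cleared while the association is not shutting down. */
static void on_peer_alive(struct assoc_model *a)
{
    if (a->state != STATE_SHUTDOWN_PENDING)
        a->overall_error_count = 0;
}

/* A retransmission timed out. Once the limit is reached during shutdown,
 * the T5 guard timer gives the receiver some extra time to recover.
 * (Failure handling outside of shutdown is not modelled here.) */
static void on_retransmit_timeout(struct assoc_model *a)
{
    a->overall_error_count++;
    if (a->state == STATE_SHUTDOWN_PENDING &&
        a->overall_error_count >= a->max_retrans)
        a->t5_guard_running = true;
}

/* The receiver acknowledged data: stop the guard timer. */
static void on_data_acked(struct assoc_model *a)
{
    a->t5_guard_running = false;
}
```

    In this model a shutting-down association keeps accumulating errors even
    though the peer keeps answering, so the guard timer is eventually armed.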

    The issue can be easily reproduced by establishing an SCTP
    association over the loopback device, constantly queueing
    data at the sender while not reading any at the receiver.
    Wait for the window to reach zero, then initiate a shutdown
    by killing both processes simultaneously. The association
    will never be freed and the chunks on the retransmission
    queue will be retransmitted indefinitely.

    Signed-off-by: Thomas Graf
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Thomas Graf
     

17 Jun, 2011

1 commit

  • Unnecessary casts of void * clutter the code.

    These are the remainder casts after several specific
    patches to remove netdev_priv and dev_priv.

    Done via coccinelle script:

    $ cat cast_void_pointer.cocci
    @@
    type T;
    T *pt;
    void *pv;
    @@

    - pt = (T *)pv;
    + pt = pv;

    Signed-off-by: Joe Perches
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller

    Joe Perches
     

01 Jun, 2011

1 commit


20 Apr, 2011

2 commits

  • SCTP does not check whether the source address of the COOKIE-ECHO
    chunk is the original address of the INIT chunk or one of the
    address parameters saved in the COOKIE in CLOSED state. So even if
    the COOKIE-ECHO chunk comes from an arbitrary address, it is still
    accepted as long as the COOKIE is correct. If the COOKIE-ECHO does not
    come from a valid address, the association should not be established.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     
  • Remove the SCTP_CMD_TRANSMIT command as it is never used.

    Signed-off-by: Shan Wei
    Signed-off-by: Vlad Yasevich
    Signed-off-by: Wei Yongjun
    Signed-off-by: David S. Miller

    Shan Wei
     

31 Mar, 2011

1 commit


27 Aug, 2010

1 commit

  • Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to
    use do { print } while (0) guards.
    Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when
    lines were continued.
    Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    Add a missing newline in "Failed bind hash alloc"

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

18 May, 2010

1 commit

  • This patch removes from net/ (but not any netfilter files)
    all the unnecessary return; statements that precede the
    last closing brace of void functions.

    It does not remove the returns that are immediately
    preceded by a label as gcc doesn't like that.

    Done via:
    $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
    xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

12 May, 2010

1 commit


06 May, 2010

1 commit

  • ICMP protocol unreachable handling completely disregarded
    the fact that the user may have locked the socket. It proceeded
    to destroy the association, even though the user may have
    held the lock and had a ref on the association. This resulted
    in the following:

    Attempt to release alive inet socket f6afcc00

    =========================
    [ BUG: held lock freed! ]
    -------------------------
    somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
    there!
    (sk_lock-AF_INET){+.+.+.}, at: [] sctp_connect+0x13/0x4c
    1 lock held by somenu/2672:
    #0: (sk_lock-AF_INET){+.+.+.}, at: [] sctp_connect+0x13/0x4c

    stack backtrace:
    Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
    Call Trace:
    [] ? printk+0xf/0x11
    [] debug_check_no_locks_freed+0xce/0xff
    [] kmem_cache_free+0x21/0x66
    [] __sk_free+0x9d/0xab
    [] sk_free+0x1c/0x1e
    [] sctp_association_put+0x32/0x89
    [] __sctp_connect+0x36d/0x3f4
    [] ? sctp_connect+0x13/0x4c
    [] ? autoremove_wake_function+0x0/0x33
    [] sctp_connect+0x31/0x4c
    [] inet_dgram_connect+0x4b/0x55
    [] sys_connect+0x54/0x71
    [] ? lock_release_non_nested+0x88/0x239
    [] ? might_fault+0x42/0x7c
    [] ? might_fault+0x42/0x7c
    [] sys_socketcall+0x6d/0x178
    [] ? trace_hardirqs_on_thunk+0xc/0x10
    [] syscall_call+0x7/0xb

    This was because sctp_wait_for_connect() would acquire the socket
    lock and then proceed to release the last reference count on the
    association, thus causing the full destruction path to finish freeing
    the socket.

    The simplest solution is to start a very short timer in case the socket
    is owned by user. When the timer expires, we can do some verification
    and be able to do the release properly.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

04 May, 2010

1 commit


01 May, 2010

1 commit


29 Apr, 2010

1 commit

  • When we finish processing ASCONF_ACK chunk, we try to send
    the next queued ASCONF. This action runs the sctp state
    machine recursively and it's not prepared to do so.

    kernel BUG at kernel/timer.c:790!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/module/ipv6/initstate
    Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
    uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
    floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]

    Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
    EIP: 0060:[] EFLAGS: 00010286 CPU: 0
    EIP is at add_timer+0xd/0x1b
    EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
    ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
    Stack:
    c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
    00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
    00000004
    c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
    000000d0
    Call Trace:
    [] ? sctp_side_effects+0x607/0xdfc [sctp]
    [] ? sctp_do_sm+0x108/0x159 [sctp]
    [] ? sctp_pname+0x0/0x1d [sctp]
    [] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
    [] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
    [] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
    [] ? sctp_do_sm+0xb8/0x159 [sctp]
    [] ? sctp_cname+0x0/0x52 [sctp]
    [] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
    [] ? sctp_inq_push+0x2d/0x30 [sctp]
    [] ? sctp_rcv+0x797/0x82e [sctp]

    Tested-by: Wei Yongjun
    Signed-off-by: Yuansong Qiao
    Signed-off-by: Shuaijun Zhang
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

09 Feb, 2010

1 commit

  • In particular, several occurrences of funny versions of 'success',
    'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
    'beginning', 'desirable', 'separate' and 'necessary' are fixed.

    Signed-off-by: Daniel Mack
    Cc: Joe Perches
    Cc: Junio C Hamano
    Signed-off-by: Jiri Kosina

    Daniel Mack
     

10 Dec, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
    tree-wide: fix misspelling of "definition" in comments
    reiserfs: fix misspelling of "journaled"
    doc: Fix a typo in slub.txt.
    inotify: remove superfluous return code check
    hdlc: spelling fix in find_pvc() comment
    doc: fix regulator docs cut-and-pasteism
    mtd: Fix comment in Kconfig
    doc: Fix IRQ chip docs
    tree-wide: fix assorted typos all over the place
    drivers/ata/libata-sff.c: comment spelling fixes
    fix typos/grammos in Documentation/edac.txt
    sysctl: add missing comments
    fs/debugfs/inode.c: fix comment typos
    sgivwfb: Make use of ARRAY_SIZE.
    sky2: fix sky2_link_down copy/paste comment error
    tree-wide: fix typos "couter" -> "counter"
    tree-wide: fix typos "offest" -> "offset"
    fix kerneldoc for set_irq_msi()
    spidev: fix double "of of" in comment
    comment typo fix: sybsystem -> subsystem
    ...

    Linus Torvalds
     

08 Dec, 2009

1 commit


04 Dec, 2009

1 commit

  • That is "success", "unknown", "through", "performance", "[re|un]mapping"
    , "access", "default", "reasonable", "[con]currently", "temperature"
    , "channel", "[un]used", "application", "example","hierarchy", "therefore"
    , "[over|under]flow", "contiguous", "threshold", "enough" and others.

    Signed-off-by: André Goddard Rosa
    Signed-off-by: Jiri Kosina

    André Goddard Rosa
     

29 Nov, 2009

2 commits

  • Conflicts:
    drivers/ieee802154/fakehard.c
    drivers/net/e1000e/ich8lan.c
    drivers/net/e1000e/phy.c
    drivers/net/netxen/netxen_nic_init.c
    drivers/net/wireless/ath/ath9k/main.c

    David S. Miller
     
  • When retransmitting due to T3 timeout, retransmit all the
    in-flight chunks for the corresponding transport/path, including
    chunks sent less than 1 RTO ago.
    This is the correct behaviour according to rfc4960 section 6.3.3
    E3 and
    "Note: Any DATA chunks that were sent to the address for which the
    T3-rtx timer expired but did not fit in one MTU (rule E3 above)
    should be marked for retransmission and sent as soon as cwnd
    allows (normally, when a SACK arrives). ".

    This fixes problems when more than one path is present and the T3
    retransmission of the first chunk that times out stops the T3 timer
    for the initial active path, leaving all the other in-flight
    chunks waiting forever or until a new chunk is transmitted on the
    same path and times out (and this will happen only if the cwnd
    allows sending new chunks, but since cwnd was dropped to MTU by
    the timeout, it will wait until the first heartbeat).

    Example: 10 packets in flight, sent at 0.1 s intervals on the
    primary path. The primary path is down and the first packet
    times out. The first packet is retransmitted on another path, the
    T3 timer for the primary path is stopped and cwnd is set to MTU.
    All the other 9 in-flight packets will not be retransmitted
    (unless more new packets are sent on the primary path, which depends
    on cwnd allowing it, and even in this case the 9 packets will be
    retransmitted only after a new packet times out, which even in the
    best case would take more than RTO).
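
    The 10-packet example can be replayed with a toy model. Everything here
    is an assumption made for illustration: the chunk struct, the function
    names, and in particular the "older than one RTO" filter, which is only
    a guess at the behaviour being reverted.

```c
/* Hypothetical chunk record; not the kernel's struct sctp_chunk. */
struct chunk_model {
    int transport_id;      /* path the chunk was last sent on */
    unsigned sent_at_ms;   /* time of transmission */
    int needs_rtx;         /* marked for retransmission */
};

/* Assumed old behaviour: only chunks at least one RTO old are marked. */
static int mark_older_than_rto(struct chunk_model *c, int n, int transport_id,
                               unsigned now_ms, unsigned rto_ms)
{
    int marked = 0;

    for (int i = 0; i < n; i++) {
        if (c[i].transport_id != transport_id)
            continue;
        if (now_ms - c[i].sent_at_ms >= rto_ms) {
            c[i].needs_rtx = 1;
            marked++;
        }
    }
    return marked;
}

/* Behaviour described above: every in-flight chunk on the path whose
 * T3 timer expired is marked, regardless of its age. */
static int mark_all_in_flight(struct chunk_model *c, int n, int transport_id)
{
    int marked = 0;

    for (int i = 0; i < n; i++) {
        if (c[i].transport_id == transport_id && !c[i].needs_rtx) {
            c[i].needs_rtx = 1;
            marked++;
        }
    }
    return marked;
}
```

    With 10 chunks sent 100 ms apart and an RTO of 1000 ms, the first
    function marks only the oldest chunk when the timer fires, while the
    second marks all 10.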

    This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and
    also removes the now unused transport->last_rto, introduced in
    b6157d8e03e1e780660a328f7183bcbfa4a93a19.

    P.S. The problem is not limited to multiple paths. It
    can happen in a single-homed environment. If the application
    stops sending data, it is possible to have a hung association.

    Signed-off-by: Andrei Pelinescu-Onciul
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Andrei Pelinescu-Onciul
     

24 Nov, 2009

2 commits

  • We currently send window update SACKs every time we free up 1 PMTU
    worth of data. That is a lot more SACKs than necessary. Instead, we'll
    now send back the actual window every time we send a SACK, and do
    window-update SACKs when a fraction of the receive buffer has been
    opened. The fraction is controlled with a sysctl.
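
    A minimal sketch of such a threshold test follows. It assumes the
    fraction is expressed as a right shift of the receive buffer size; the
    function name, its parameters and the shift encoding are assumptions
    for illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* rwnd:   window we could advertise now
 * a_rwnd: window most recently advertised to the peer
 * rcvbuf: receive buffer size in bytes
 * shift:  hypothetical tunable (must be < 32); threshold is rcvbuf >> shift */
static bool needs_window_update(uint32_t rwnd, uint32_t a_rwnd,
                                uint32_t rcvbuf, unsigned int shift)
{
    return rwnd > a_rwnd && (rwnd - a_rwnd) >= (rcvbuf >> shift);
}
```

    With a 65536-byte buffer and a shift of 2, an update is only sent once
    at least 16384 bytes have been opened since the last advertisement.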

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • When sctp_connectx() is used, we pick the first address as
    primary, even though it may not have worked. This results
    in excessive retransmits and poor performance. We should
    select the address that the association was established with.

    Reported-by: Thomas Dreibholz
    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     

05 Sep, 2009

3 commits

  • We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
    This results in a hung association if the remote only uses
    SHUTDOWNs (which it's allowed to do) to acknowledge DATA when
    closing. The reason for that is that we simply honor the a_rwnd
    from the SACK, but since we faked it to be 0, we enter 0-window
    probing. The fix is to use the peer's old rwnd and add our flight
    size to it.
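
    The arithmetic can be sketched as follows. The SACK-processing model
    (peer window = advertised window minus outstanding bytes) is a
    simplifying assumption, and both function names are hypothetical.

```c
#include <stdint.h>

/* a_rwnd to place in the SACK that is synthesized from a SHUTDOWN. */
static uint32_t fake_sack_a_rwnd(uint32_t peer_rwnd, uint32_t flight_size)
{
    return peer_rwnd + flight_size;
}

/* Simplified model of how a received a_rwnd becomes the peer window:
 * bytes still outstanding are subtracted from what the peer advertised. */
static uint32_t peer_rwnd_after_sack(uint32_t a_rwnd, uint32_t outstanding)
{
    return a_rwnd > outstanding ? a_rwnd - outstanding : 0;
}
```

    Under this model a faked a_rwnd of 0 always yields a zero peer window,
    whereas adding the flight size leaves the previously known window intact.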

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
    errors against a given transport or endpoint. As such, we
    should increment the error counts only for unacknowledged
    HBs, otherwise we detect failure too soon. This goes for both
    the overall error count and the path error count.

    Now, there is a difference in how the detection is done
    between the two. The path error detection is done after
    the increment, so to detect it properly, we actually need
    to exceed the path threshold. The overall error detection
    is done _BEFORE_ the increment. Thus to detect the failure,
    it's enough for the error count to match the threshold.
    This is why all the state functions use '>=' to detect failure,
    while path detection uses '>'.
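
    The off-by-one relationship between the two checks can be illustrated
    with a pair of counting loops. The function names are hypothetical; the
    only point is that testing with '>=' before the increment and testing
    with '>' after it tolerate the same number of unacknowledged events.

```c
/* Overall error detection: compare first ('>='), then increment. */
static int events_until_assoc_failure(int max_retrans)
{
    int count = 0;
    int events = 0;

    for (;;) {
        events++;
        if (count >= max_retrans)
            return events;
        count++;
    }
}

/* Path error detection: increment first, then compare ('>'). */
static int events_until_path_failure(int path_max_retrans)
{
    int count = 0;
    int events = 0;

    for (;;) {
        events++;
        count++;
        if (count > path_max_retrans)
            return events;
    }
}
```

    For a threshold of 5, both loops report failure on the sixth event.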

    Thanks goes to Chunbo Luo who first
    proposed patches to fix this issue and made me re-read the spec
    and the code to figure out how this cruft really works.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     
  • Currently, sctp breaks up user messages into fragments and
    sends each fragment to the lower layer by itself. This means
    that for each fragment we go all the way down the stack
    and back up. This also discourages bundling of multiple
    fragments when they can fit into a single packet (ex: due
    to user setting a low fragmentation threshold).

    We introduce a new command SCTP_CMD_SND_MSG and hand the
    whole message down to the state machine. The state machine and
    the side-effect parser will cork the queue, add all chunks
    from the message to the queue, and then un-cork the queue
    thus causing the chunks to get transmitted.

    Signed-off-by: Vlad Yasevich

    Vlad Yasevich
     

03 Jun, 2009

1 commit

  • RFC 5061 Section 5.1 ASCONF Chunk Procedures said:

    B4) Re-transmit the ASCONF Chunk last sent and if possible choose an
    alternate destination address (please refer to [RFC4960],
    Section 6.4.1). An endpoint MUST NOT add new parameters to this
    chunk; it MUST be the same (including its Sequence Number) as
    the last ASCONF sent. An endpoint MAY, however, bundle an
    additional ASCONF with new ASCONF parameters with the next
    Sequence Number. For details, see Section 5.5.

    This patch fixes the retransmission path to choose an alternate
    destination address when re-transmitting the ASCONF chunk, and
    cleans up some duplicated code.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich

    Wei Yongjun
     

05 Mar, 2009

1 commit


03 Mar, 2009

2 commits

  • Commit faee47cdbfe8d74a1573c2f81ea6dbb08d735be6
    (sctp: Fix the RTO-doubling on idle-link heartbeats)
    broke the RTO doubling for data retransmits. If the
    heartbeat was sent before the data T3-rtx timeout, the
    RTO will not double upon the T3-rtx expiration.
    Distinguish between the operations by passing an argument
    to the function.

    Additionally, Wei Yongjun pointed out that our treatment
    of requested HEARTBEATS and timer HEARTBEATS is the same
    wrt resetting congestion window. That needs to be separated,
    since user requested HEARTBEATS should not treat the link
    as idle.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • If an ERROR chunk with too many error causes is received in the
    ESTABLISHED state, the kernel panics.

    This is because sctp limits the max length of cmds to 14, but when an
    ERROR chunk is received, each error cause adds around 2 cmds via
    sctp_add_cmd_sf(). So many error causes will exceed the cmds limit
    and cause a panic.
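
    The overflow can be reproduced with a toy fixed-size command sequence.
    The limit of 14 and the figure of 2 commands per cause come from the
    text above; the struct and function names are hypothetical, and the
    model ignores any other commands that would also occupy slots.

```c
#define MAX_CMDS 14        /* limit quoted in the text */
#define CMDS_PER_CAUSE 2   /* "around 2 cmds" per error cause */

/* Hypothetical fixed-size command sequence. */
struct cmd_seq {
    int next_free;
    int verbs[MAX_CMDS];
};

/* Returns 1 on success, 0 when the sequence is already full. */
static int add_cmd(struct cmd_seq *seq, int verb)
{
    if (seq->next_free >= MAX_CMDS)
        return 0;
    seq->verbs[seq->next_free++] = verb;
    return 1;
}

/* Returns the 1-based index of the first error cause whose commands no
 * longer fit, or 0 if every cause fits. */
static int first_overflowing_cause(int n_causes)
{
    struct cmd_seq seq = { 0 };

    for (int cause = 1; cause <= n_causes; cause++)
        for (int k = 0; k < CMDS_PER_CAUSE; k++)
            if (!add_cmd(&seq, cause))
                return cause;
    return 0;
}
```

    In this model seven causes fill the sequence exactly and the eighth no
    longer fits. Unlike the sketch, code that does not check for a full
    sequence has no safe way to continue at that point.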

    This patch fixes the problem.

    This bug can be reproduced with the SCTP Conformance Test Suite.

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

16 Feb, 2009

1 commit

  • SCTP incorrectly doubles the RTO every time a HEARTBEAT chunk
    is generated. However RFC 4960 states:

    On an idle destination address that is allowed to heartbeat, it is
    recommended that a HEARTBEAT chunk is sent once per RTO of that
    destination address plus the protocol parameter 'HB.interval', with
    jittering of +/- 50% of the RTO value, and exponential backoff of the
    RTO if the previous HEARTBEAT is unanswered.

    Essentially, only if the heartbeat is unacknowledged do we double the RTO.
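
    A minimal sketch of that rule, using a hypothetical transport struct
    and handler names (none of them are the kernel's identifiers):

```c
#include <stdbool.h>

/* Hypothetical per-destination state. */
struct hb_transport {
    unsigned int rto_ms;
    unsigned int rto_max_ms;
    bool hb_outstanding;   /* previous HEARTBEAT not yet acknowledged */
};

/* Heartbeat timer fired: back off only if the previous HEARTBEAT went
 * unanswered, then account for the HEARTBEAT about to be sent. */
static void hb_timer_expired(struct hb_transport *t)
{
    if (t->hb_outstanding) {
        t->rto_ms *= 2;
        if (t->rto_ms > t->rto_max_ms)
            t->rto_ms = t->rto_max_ms;
    }
    t->hb_outstanding = true;
}

/* HEARTBEAT-ACK received: the destination answered. */
static void hb_ack_received(struct hb_transport *t)
{
    t->hb_outstanding = false;
}
```

    As long as every HEARTBEAT is acknowledged before the next timer
    expiry, the RTO in this model stays unchanged.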

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

09 Oct, 2008

1 commit

  • The tsn map currently in use is 4K large and is stuck inside
    the sctp_association structure making memory references REALLY
    expensive. What we really need is at most 4K worth of bits
    so the biggest map we would have is 512 bytes. Also, the
    map is only really useful when we have gaps to store and
    report. As such, starting with a minimal map of say 32 TSNs (bits)
    should be enough for normal low-loss operations. We can grow
    the map by some multiple of 32 along with some extra room any
    time we receive a TSN which would put us outside of the map
    boundary. As we close gaps, we can shift the map to rebase
    it on the latest TSN we've seen. This saves 4088 bytes per
    association just in the map alone, along with savings from the now
    unnecessary structure members.
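
    The growable bitmap idea can be sketched in userspace C. This is an
    illustration under stated assumptions, not the patch itself: the struct,
    the function names and the exact growth policy are hypothetical, and the
    rebasing shift mentioned above is left out for brevity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAP_CHUNK_BITS 32u    /* growth granularity, per the text */
#define MAP_MAX_BITS   4096u  /* "at most 4K worth of bits" = 512 bytes */

/* Hypothetical map: one bit per TSN, starting at base_tsn. */
struct tsn_bitmap {
    uint32_t base_tsn;   /* TSN represented by bit 0 */
    uint32_t nbits;      /* current capacity in bits */
    uint8_t *bits;
};

static int map_init(struct tsn_bitmap *m, uint32_t base_tsn)
{
    m->base_tsn = base_tsn;
    m->nbits = MAP_CHUNK_BITS;
    m->bits = calloc(m->nbits / 8, 1);
    return m->bits ? 0 : -1;
}

/* Record a received TSN, growing in multiples of 32 bits when needed. */
static int map_mark(struct tsn_bitmap *m, uint32_t tsn)
{
    uint32_t gap = tsn - m->base_tsn;   /* unsigned, so old TSNs wrap high */

    if (gap >= MAP_MAX_BITS)
        return -1;                      /* outside the largest allowed map */
    if (gap >= m->nbits) {
        uint32_t want = (gap / MAP_CHUNK_BITS + 1) * MAP_CHUNK_BITS;
        uint8_t *p = realloc(m->bits, want / 8);

        if (!p)
            return -1;
        memset(p + m->nbits / 8, 0, (want - m->nbits) / 8);
        m->bits = p;
        m->nbits = want;
    }
    m->bits[gap / 8] |= (uint8_t)(1u << (gap % 8));
    return 0;
}

/* Returns 1 if the TSN has been recorded, 0 otherwise. */
static int map_test(const struct tsn_bitmap *m, uint32_t tsn)
{
    uint32_t gap = tsn - m->base_tsn;

    return gap < m->nbits && ((m->bits[gap / 8] >> (gap % 8)) & 1);
}
```

    The map starts at 4 bytes and only grows when a TSN lands beyond its
    current boundary, which is the memory saving the message argues for.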

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

01 Oct, 2008

1 commit


20 Jun, 2008

1 commit

  • RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts

    When an SCTP stack receives a packet containing multiple control or
    DATA chunks and the processing of the packet requires the sending of
    multiple chunks in response, the sender of the response chunk(s) MUST
    NOT send more than one packet. If bundling is supported, multiple
    response chunks that fit into a single packet MAY be bundled together
    into one single response packet. If bundling is not supported, then
    the sender MUST NOT send more than one response chunk and MUST
    discard all other responses. Note that this rule does NOT apply to a
    SACK chunk, since a SACK chunk is, in itself, a response to DATA and
    a SACK does not require a response of more DATA.

    We implement this by not servicing our outqueue until we reach the end
    of the packet. This enables maximum bundling. We also identify
    'response' chunks and make sure that we only send 1 packet when sending
    such chunks.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

10 May, 2008

1 commit


14 Apr, 2008

1 commit


13 Apr, 2008

2 commits