07 Aug, 2011

1 commit

  • Computers have become a lot faster since we compromised on the
    partial MD4 hash which we use currently for performance reasons.

    MD5 is a much safer choice, and is in line with both RFC1948 and
    other ISS generators (OpenBSD, Solaris, etc.).

    Furthermore, only having 24 bits of the sequence number be truly
    unpredictable is a very serious limitation. So the periodic
    regeneration and 8-bit counter have been removed. We compute and
    use a full 32-bit sequence number.

    For ipv6, DCCP was found to use a 32-bit truncated initial sequence
    number (it needs 48 bits) and that is fixed here as well.

    Reported-by: Dan Kaminsky
    Tested-by: Willy Tarreau
    Signed-off-by: David S. Miller

    David S. Miller
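
The RFC 1948 construction that this commit aligns with can be sketched as follows. This is a minimal illustration, not the kernel code: the function name, the packing of the 4-tuple, the `secret` argument and the `clock_ticks` counter are all assumptions made for the example.

```python
import hashlib
import struct

SEQ_BITS = 32  # TCP sequence numbers are 32 bits wide


def initial_sequence_number(saddr, daddr, sport, dport, secret, clock_ticks):
    """RFC 1948 style ISN: clock + MD5(connection 4-tuple, secret), mod 2^32.

    saddr and daddr are 4-byte IPv4 addresses, secret is a bytes object.
    All names here are hypothetical and chosen for illustration only.
    """
    # Pack the 4-tuple in a fixed layout so that each connection
    # identifier produces a different hash input.
    material = struct.pack("!4s4sHH", saddr, daddr, sport, dport) + secret
    digest = hashlib.md5(material).digest()
    # Use the first 32 bits of the digest as the per-connection offset.
    offset = int.from_bytes(digest[:4], "big")
    # The full 32-bit result is used; no bits are reserved for a counter.
    return (clock_ticks + offset) % (1 << SEQ_BITS)


if __name__ == "__main__":
    isn = initial_sequence_number(
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]),
        40000, 80, b"example-secret", 123456,
    )
    print(hex(isn))
```

An off-path attacker who does not know the secret cannot predict the offset, while successive values for the same connection still advance with the clock.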
     

05 Jul, 2011

6 commits

  • CCID-2's cwnd increases like TCP during slow-start, which has implications for
    * the local Sequence Window value (should be > cwnd),
    * the Ack Ratio value.
    Hence an exponential growth, if it does not reflect the actual network
    conditions, can quickly lead to instability.

    This patch adds congestion-window validation (RFC2861) to CCID-2:
    * cwnd is constrained if the sender is application limited;
    * cwnd is reduced after a long idle period, as suggested in the '90 paper
    by Van Jacobson, in RFC 2581 (sec. 4.1);
    * cwnd is never reduced below the RFC 3390 initial window.

    As marked in the comments, the code is actually almost a direct copy of the
    TCP congestion-window-validation algorithms. By continuing this work, it may
    in future be possible to use the TCP code (not possible at the moment).

    The mechanism can be turned off using a module parameter. Sampling of the
    currently-used window (moving-maximum) is however done constantly; this is
    used to determine the expected window, which can be exploited to regulate
    DCCP's Sequence Window value.

    This patch also sets slow-start-after-idle (RFC 4341, 5.1), i.e. it behaves like
    TCP when net.ipv4.tcp_slow_start_after_idle = 1.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
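
The idle-period rule described in this commit can be sketched roughly as below. It is an illustration under stated assumptions, not the CCID-2 code: the function names are hypothetical, the window is expressed in bytes (CCID-2 itself counts packets), and times are integer milliseconds.

```python
def rfc3390_initial_window(mss):
    """Initial window in bytes per RFC 3390: min(4*MSS, max(2*MSS, 4380))."""
    return min(4 * mss, max(2 * mss, 4380))


def cwnd_after_idle(cwnd, idle_ms, rto_ms, mss):
    """Halve cwnd once per full RTO spent idle, in the spirit of RFC 2861.

    The result is never pushed below the RFC 3390 initial window
    (or below the current cwnd, if that is already smaller).
    """
    restart = min(rfc3390_initial_window(mss), cwnd)
    while idle_ms >= rto_ms and cwnd > restart:
        cwnd //= 2
        idle_ms -= rto_ms
    return max(cwnd, restart)


if __name__ == "__main__":
    # Two full RTOs of idle time: the window is halved twice.
    print(cwnd_after_idle(40000, 400, 200, 1460))
    # A very long idle period: the window stops at the initial window.
    print(cwnd_after_idle(40000, 10000, 200, 1460))
```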
     
  • This replaces a switch statement with a test, using the equivalent
    function dccp_data_packet(skb). It also doubles the range of the field
    `rx_num_data_pkts' by changing the type from `int' to `u32', avoiding
    signed/unsigned comparison with the u16 field `dccps_r_ack_ratio'.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     
  • This moves CCID-2's initial window function into the header file, since several
    parts throughout the CCID-2 code need to call it (CCID-2 still uses RFC 3390).

    Signed-off-by: Gerrit Renker
    Acked-by: Leandro Melo de Sales

    Gerrit Renker
     
  • Change the CCID (de)activation message to start with the
    protocol name, as 'CCID' is already in there.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     
  • Realising the following call pattern,
    * first dccp_entail() is called to enqueue a new skb and
    * then skb_clone() is called to transmit a clone of that skb,
    this patch integrates both into the same function.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     
  • This patch rearranges the order of statements of the slow-path input processing
    (i.e. any other state than OPEN), to resolve the following issues.

    1. Dependencies: the order of statements now better matches RFC 4340, 8.5, i.e.
    step 7 is before step 9 (previously 9 was before 7), and parsing options in
    step 8 (which may consume resources) now comes after step 7.
    2. Sequence number checks are omitted if in state LISTEN/REQUEST, due to the
    note underneath the table in RFC 4340, 7.5.3.
    As a result, CCID processing is now indeed confined to OPEN/PARTOPEN states,
    i.e. congestion control is performed only on the flow of data packets. This
    avoids pathological cases of doing congestion control on those messages
    which set up and terminate the connection.
    3. Packets are now passed on to Ack Vector / CCID processing only after
    - step 7 (receive unexpected packets),
    - step 9 (receive Reset),
    - step 13 (receive CloseReq),
    - step 14 (receive Close)
    and only if the state is PARTOPEN. This simplifies CCID processing:
    - in LISTEN/CLOSED the CCIDs are non-existent;
    - in RESPOND/REQUEST the CCIDs have not yet been negotiated;
    - in CLOSEREQ and active-CLOSING the node has already closed this socket;
    - in passive-CLOSING the client is waiting for its Reset.
    In the last case, RFC 4340, 8.3 leaves it open to ignore further incoming
    data, which is the approach taken here.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
     

19 May, 2011

1 commit


12 May, 2011

1 commit


09 May, 2011

3 commits


07 May, 2011

1 commit

  • A length of zero (after subtracting two for the type and len fields) for
    the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to
    the subtraction. The subsequent code may read past the end of the
    options value buffer when parsing. I'm unsure of what the consequences
    of this might be, but it's probably not good.

    Signed-off-by: Dan Rosenberg
    Cc: stable@kernel.org
    Acked-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Dan Rosenberg
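
The underflow can be reproduced in a few lines. The sketch below assumes an unsigned 8-bit length field and a one-byte feature number in front of the value; both helper names are hypothetical and do not correspond to kernel functions.

```python
def remaining_len_u8(opt_len, consumed):
    """Emulate C subtraction on an unsigned 8-bit length (wraps modulo 256)."""
    return (opt_len - consumed) & 0xFF


def parse_feature_option(value):
    """Hypothetical parser: first byte is the feature number, the rest its value.

    Returns None for an empty option instead of subtracting from zero.
    """
    if len(value) == 0:
        return None  # reject before any length arithmetic happens
    return value[0], value[1:]


if __name__ == "__main__":
    # Without the check, a zero length turns into a large bogus length.
    print(remaining_len_u8(0, 1))
    print(parse_feature_option(b""))
    print(parse_feature_option(b"\x01\x02\x03"))
```

Checking for a zero length before subtracting keeps the parser from treating the wrapped value as the number of bytes left to read.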
     

04 May, 2011

1 commit


29 Apr, 2011

2 commits

  • Now that output route lookups update the flow with
    destination address selection, we can fetch it from
    fl4->daddr instead of rt->rt_dst.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We lack proper synchronization to manipulate inet->opt ip_options.

    The problem is that ip_make_skb() calls ip_setup_cork(), and
    ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options)
    without any protection against another thread manipulating inet->opt.

    Another thread can change the inet->opt pointer and free the old one under us.

    Use RCU to protect inet->opt (changed to inet->inet_opt).

    Instead of handling atomic refcounts, just copy ip_options when
    necessary, to avoid cache line dirtying.

    We can't insert an rcu_head in struct ip_options since it's included in
    skb->cb[], so this patch is large because I had to introduce a new
    ip_options_rcu structure.

    Signed-off-by: Eric Dumazet
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Apr, 2011

1 commit

  • These functions are used together as a unit for route resolution
    during connect(). They address the chicken-and-egg problem that
    exists when ports need to be allocated during connect() processing,
    yet such port allocations require addressing information from the
    routing code.

    It's currently more heavy handed than it needs to be, and in
    particular we allocate and initialize a flow object twice.

    Let the callers provide the on-stack flow object. That way we only
    need to initialize it once in the ip_route_connect() call.

    Later, if ip_route_newports() needs to do anything, it re-uses that
    flow object as-is except for the ports which it updates before the
    route re-lookup.

    Also, describe why this set of facilities is needed and how it works
    in a big comment.

    Signed-off-by: David S. Miller
    Reviewed-by: Eric Dumazet

    David S. Miller
     

23 Apr, 2011

1 commit


31 Mar, 2011

1 commit


13 Mar, 2011

6 commits


04 Mar, 2011

1 commit


03 Mar, 2011

1 commit


02 Mar, 2011

5 commits

  • This fixes a bug in the order of dccp_rcv_state_process() that still permitted
    reception even after closing the socket. A Reset after close thus causes a NULL
    pointer dereference by not preventing operations on an already torn-down socket.

    dccp_v4_do_rcv()
    |
    | state other than OPEN
    v
    dccp_rcv_state_process()
    |
    | DCCP_PKT_RESET
    v
    dccp_rcv_reset()
    |
    v
    dccp_time_wait()

    WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128()
    Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
    [] (unwind_backtrace+0x0/0xec) from [] (warn_slowpath_common)
    [] (warn_slowpath_common+0x4c/0x64) from [] (warn_slowpath_n)
    [] (warn_slowpath_null+0x1c/0x24) from [] (__inet_twsk_hashd)
    [] (__inet_twsk_hashdance+0x48/0x128) from [] (dccp_time_wai)
    [] (dccp_time_wait+0x40/0xc8) from [] (dccp_rcv_state_proces)
    [] (dccp_rcv_state_process+0x120/0x538) from [] (dccp_v4_do_)
    [] (dccp_v4_do_rcv+0x11c/0x14c) from [] (release_sock+0xac/0)
    [] (release_sock+0xac/0x110) from [] (dccp_close+0x28c/0x380)
    [] (dccp_close+0x28c/0x380) from [] (inet_release+0x64/0x70)

    The fix is to test the socket state first. Receiving a packet in Closed state
    now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1.

    Reported-and-tested-by: Johan Hovold
    Cc: stable@kernel.org
    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
     
  • This boolean state is now available in the flow flags.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Since that is what the current vague "flags" argument means.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Since that's what the current vague "flags" thing means.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Route lookups follow a general pattern in the ipv6 code wherein
    we first find the non-IPSEC route, potentially override the
    flow destination address due to ipv6 options settings, and then
    finally make an IPSEC search using either xfrm_lookup() or
    __xfrm_lookup().

    __xfrm_lookup() is used when we want to generate a blackhole route
    if the key manager needs to resolve the IPSEC rules (in this case
    -EREMOTE is returned and the original 'dst' is left unchanged).

    Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
    resolution is necessary, we simply fail the lookup completely.

    All of these cases are encapsulated into two routines,
    ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter
    handles unconnected UDP datagram sockets.

    Signed-off-by: David S. Miller

    David S. Miller
     

26 Feb, 2011

1 commit


25 Feb, 2011

1 commit

  • ip_route_newports() is the only place in the entire kernel that
    cares about the port members in the routing cache entry's lookup
    flow key.

    Therefore the only reason we store an entire flow inside of the
    struct rtentry is for this one special case.

    Rewrite ip_route_newports() such that:

    1) The caller passes in the original port values, so we don't need
    to use the rth->fl.fl_ip_{s,d}port values to remember them.

    2) The lookup flow is constructed by hand instead of being copied
    from the routing cache entry's flow.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Feb, 2011

1 commit


14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds
     

07 Jan, 2011

3 commits

  • The 'seq_window' sysctl sets the initial value for the DCCP Sequence Window,
    which may range from 32..2^46-1 (RFC 4340, 7.5.2). The patch sets the upper
    bound consistently to 2^32-1 on both 32 and 64 bit systems, which should be
    sufficient - with an RTT of 1 sec and 1-byte packets, a seq_window of 2^32-1
    corresponds to a link speed of about 34 Gbps.

    Signed-off-by: Gerrit Renker

    Gerrit Renker
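
The link-speed figure can be checked with a short calculation. The helper name is hypothetical, and the model (one full Sequence Window of packets sent per RTT) is the simplification the commit message itself uses.

```python
def link_speed_bps(seq_window, packet_bytes, rtt_seconds):
    """Bits per second when one full window of packets is sent per RTT."""
    return seq_window * packet_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    speed = link_speed_bps(2 ** 32 - 1, 1, 1)
    print(f"{speed / 1e9:.1f} Gbps")  # roughly 34 Gbps
```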
     
  • Currently dccp_check_seqno allows any valid packet to update the Greatest
    Sequence Number Received, even if that packet's sequence number is less than
    the current GSR. This patch adds a check to make sure that the new packet's
    sequence number is greater than GSR.

    Signed-off-by: Samuel Jero
    Signed-off-by: Gerrit Renker

    Samuel Jero
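
Because DCCP sequence numbers are 48 bits wide and wrap around, "greater than GSR" has to be a circular comparison. The sketch below shows one way to express the added check; the function names and the modular formulation are assumptions for illustration and are not copied from the kernel.

```python
SEQ_MOD = 1 << 48  # DCCP sequence numbers live in a 48-bit circular space


def after48(seq1, seq2):
    """True if seq1 is circularly later than seq2 (less than half the space ahead)."""
    delta = (seq1 - seq2) % SEQ_MOD
    return 0 < delta < (SEQ_MOD >> 1)


def update_gsr(gsr, seqno):
    """Advance GSR only when the new packet's sequence number is later than it."""
    return seqno if after48(seqno, gsr) else gsr


if __name__ == "__main__":
    print(update_gsr(100, 90))           # older packet: GSR stays at 100
    print(update_gsr(100, 101))          # newer packet: GSR becomes 101
    print(update_gsr(SEQ_MOD - 1, 2))    # newer across the wrap: GSR becomes 2
```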
     
  • Currently dccp_check_seqno returns 0 (indicating a valid packet) if the
    acknowledgment number is out of bounds and the sync that RFC 4340 mandates at
    this point is currently being rate-limited. This function should return -1,
    indicating an invalid packet.

    Signed-off-by: Samuel Jero
    Acked-by: Gerrit Renker

    Samuel Jero
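
The intended behaviour can be summarised in a small sketch: whether a Sync is actually sent depends on the rate limit, but the verdict on the packet does not. The helper below is hypothetical, and for brevity it compares the acknowledgment window bounds linearly, ignoring sequence-number wrap-around.

```python
def check_ack_window(ackno, awl, awh, sync_rate_limited):
    """Return (status, send_sync) for an acknowledgment number.

    status is 0 for a valid packet and -1 for an invalid one.
    awl and awh are the lower and upper acknowledgment window bounds.
    """
    if awl <= ackno <= awh:
        return 0, False
    # Out of bounds: the packet is invalid even when the Sync is suppressed.
    return -1, not sync_rate_limited


if __name__ == "__main__":
    print(check_ack_window(50, 10, 100, False))
    print(check_ack_window(500, 10, 100, False))
    print(check_ack_window(500, 10, 100, True))
```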
     

23 Dec, 2010

1 commit