29 Apr, 2007

1 commit


26 Apr, 2007

19 commits

  • Spring cleaning time...

    There seem to be a lot of places in the network code that have
    extra bogus semicolons after conditionals. The most common is a
    bogus semicolon after: switch() { }

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
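
    As a minimal illustration of the pattern being removed (not taken from the
    actual patch), the semicolon after the closing brace is an empty statement
    that the compiler accepts but that does nothing:

        void example(int state)
        {
                switch (state) {
                case 0:
                        break;
                default:
                        break;
                };      /* <-- bogus semicolon removed by the cleanup */
        }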
     
  • When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
    treat it as such in the stack.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
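
    Conceptually (a sketch only, not the actual diff), a receive path that
    today accepts only CHECKSUM_UNNECESSARY would treat a looped-back
    CHECKSUM_PARTIAL packet the same way, since its data was never corrupted
    in transit:

        /* sketch: on the loopback receive side, a still-partial checksum
         * needs no verification, so it can be handled like UNNECESSARY */
        if (skb->ip_summed == CHECKSUM_PARTIAL)
                skb->ip_summed = CHECKSUM_UNNECESSARY;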
     
  • Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • As stated in the sctp socket api draft:

    sac_info: variable

    If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received
    for this association, sac_info[] contains the complete ABORT chunk as
    defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7.

    We now save received ABORT chunks into the sac_info field and pass that
    to the user.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
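
    A minimal sketch of how an application might consume this, assuming
    <netinet/sctp.h> from lksctp-tools; buf is a received notification and
    handle_abort_chunk() is a hypothetical callback:

        union sctp_notification *sn = (union sctp_notification *)buf;

        if (sn->sn_header.sn_type == SCTP_ASSOC_CHANGE) {
                struct sctp_assoc_change *sac = &sn->sn_assoc_change;

                if (sac->sac_state == SCTP_COMM_LOST &&
                    sac->sac_length > sizeof(*sac)) {
                        /* sac_info[] now carries the complete ABORT chunk */
                        handle_abort_chunk(sac->sac_info,
                                           sac->sac_length - sizeof(*sac));
                }
        }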
     
  • Parameters only take effect when a corresponding flag bit is set
    and a value is specified. This means we need to check the flags
    in addition to checking for a non-zero value.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
    This option induces partial delivery to run as soon
    as the specified amount of data has been accumulated on
    the association. However, we give preference to fully
    reassembled messages over PD messages. In any case,
    window and buffer space are freed up.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
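
    A sketch of how an application might set the option (sd and the 4 KB
    threshold are illustrative):

        uint32_t pd_point = 4096;       /* start partial delivery at 4 KB */

        if (setsockopt(sd, IPPROTO_SCTP, SCTP_PARTIAL_DELIVERY_POINT,
                       &pd_point, sizeof(pd_point)) < 0)
                perror("setsockopt(SCTP_PARTIAL_DELIVERY_POINT)");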
     
    This option was introduced in draft-ietf-tsvwg-sctpsocket-13. It
    prevents head-of-line blocking in the case of a one-to-many endpoint.
    Applications enabling this option really must enable the SCTP_SNDRCV event
    so that they know where the data belongs. Based on an
    earlier patch by Ivan Skytte Jørgensen.

    Additionally, this functionality now permits multiple associations
    on the same endpoint to enter Partial Delivery. Applications should
    be extra careful, when using this functionality, to track EOR indicators.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
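
    A sketch of enabling the option together with the SCTP_SNDRCV event
    (sd is an illustrative one-to-many socket):

        int interleave = 1;
        struct sctp_event_subscribe events;

        memset(&events, 0, sizeof(events));
        events.sctp_data_io_event = 1;  /* deliver per-message SCTP_SNDRCV info */

        setsockopt(sd, IPPROTO_SCTP, SCTP_FRAGMENT_INTERLEAVE,
                   &interleave, sizeof(interleave));
        setsockopt(sd, IPPROTO_SCTP, SCTP_EVENTS, &events, sizeof(events));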
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
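
    Assuming this refers to the skb->tail conversion, the accessor pattern it
    relies on looks roughly like this (condensed from include/linux/skbuff.h;
    on 64-bit, tail is stored as an offset from head, while on 32-bit it
    remains a pointer):

        static inline unsigned char *skb_tail_pointer(const struct sk_buff *skb)
        {
                return skb->head + skb->tail;           /* offset variant */
        }

        static inline void skb_reset_tail_pointer(struct sk_buff *skb)
        {
                skb->tail = skb->data - skb->head;
        }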
     
    With this we save 8 bytes per network packet, leaving a 4-byte hole to be used
    in further shrinking work, likely with the offsetization of other pointers,
    such as ->{data,tail,end}. The cost is a few extra additions, minimized by the
    usual practice of setting skb->{mac,nh,h}.raw to a local variable that is then
    accessed multiple times in each function. It is also no more expensive than
    before with regards to most of the handling of such headers, like setting one
    of these headers to another (transport to network, etc.), or subtracting from,
    adding to, or comparing them.

    Now we have this layout for sk_buff on a x86_64 machine:

    [acme@mica net-2.6.22]$ pahole vmlinux sk_buff
    struct sk_buff {
            struct sk_buff *       next;                  /*     0     8 */
            struct sk_buff *       prev;                  /*     8     8 */
            struct rb_node         rb;                    /*    16    24 */
            struct sock *          sk;                    /*    40     8 */
            ktime_t                tstamp;                /*    48     8 */
            struct net_device *    dev;                   /*    56     8 */
            /* --- cacheline 1 boundary (64 bytes) --- */
            struct net_device *    input_dev;             /*    64     8 */
            sk_buff_data_t         transport_header;      /*    72     4 */
            sk_buff_data_t         network_header;        /*    76     4 */
            sk_buff_data_t         mac_header;            /*    80     4 */

            /* XXX 4 bytes hole, try to pack */

            struct dst_entry *     dst;                   /*    88     8 */
            struct sec_path *      sp;                    /*    96     8 */
            char                   cb[48];                /*   104    48 */
            /* --- cacheline 2 boundary (128 bytes) was 24 bytes ago --- */
            unsigned int           len;                   /*   152     4 */
            unsigned int           data_len;              /*   156     4 */
            unsigned int           mac_len;               /*   160     4 */
            union {
                    __wsum         csum;                  /*           4 */
                    __u32          csum_offset;           /*           4 */
            };                                            /*   164     4 */
            __u32                  priority;              /*   168     4 */
            __u8                   local_df:1;            /*   172     1 */
            __u8                   cloned:1;              /*   172     1 */
            __u8                   ip_summed:2;           /*   172     1 */
            __u8                   nohdr:1;               /*   172     1 */
            __u8                   nfctinfo:3;            /*   172     1 */
            __u8                   pkt_type:3;            /*   173     1 */
            __u8                   fclone:2;              /*   173     1 */
            __u8                   ipvs_property:1;       /*   173     1 */

            /* XXX 2 bits hole, try to pack */

            __be16                 protocol;              /*   174     2 */
            void                   (*destructor)(struct sk_buff *); /* 176  8 */
            struct nf_conntrack *  nfct;                  /*   184     8 */
            /* --- cacheline 3 boundary (192 bytes) --- */
            struct sk_buff *       nfct_reasm;            /*   192     8 */
            struct nf_bridge_info *nf_bridge;             /*   200     8 */
            __u16                  tc_index;              /*   208     2 */
            __u16                  tc_verd;               /*   210     2 */
            dma_cookie_t           dma_cookie;            /*   212     4 */
            __u32                  secmark;               /*   216     4 */
            __u32                  mark;                  /*   220     4 */
            unsigned int           truesize;              /*   224     4 */
            atomic_t               users;                 /*   228     4 */
            unsigned char *        head;                  /*   232     8 */
            unsigned char *        data;                  /*   240     8 */
            unsigned char *        tail;                  /*   248     8 */
            /* --- cacheline 4 boundary (256 bytes) --- */
            unsigned char *        end;                   /*   256     8 */
    }; /* size: 264, cachelines: 5 */
       /* sum members: 260, holes: 1, sum holes: 4 */
       /* bit holes: 1, sum bit holes: 2 bits */
       /* last cacheline: 8 bytes */

    On 32 bits nothing changes, and pointers continue to be used with the compiler
    turning all this abstraction layer into dust. But there are some sk_buff
    validation tricks that are now possible, humm... :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
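
    For reference, the shape of the abstraction (condensed from
    include/linux/skbuff.h of that era; 64-bit builds store offsets,
    32-bit builds keep pointers):

        #if BITS_PER_LONG > 32
        #define NET_SKBUFF_DATA_USES_OFFSET 1
        typedef unsigned int sk_buff_data_t;
        #else
        typedef unsigned char *sk_buff_data_t;
        #endif

        static inline unsigned char *skb_network_header(const struct sk_buff *skb)
        {
                return skb->head + skb->network_header; /* offset variant */
        }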
     
  • Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
    skb->mac to skb->mac_header, to match the names of the associated helpers
    (skb[_[re]set]_{transport,network,mac}_header).

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • For consistency with all the other skb->h.raw accessors.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • For the quite common 'skb->h.raw - skb->data' sequence.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
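
    A representative before/after (illustrative; off is a hypothetical local
    variable):

        /* before */
        off = skb->h.raw - skb->data;

        /* after */
        off = skb_transport_offset(skb);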
     
  • Now the skb->nh union has just one member, .raw, i.e. it is just like the
    skb->mac union, strange, no? I'm just leaving it like that till the transport
    layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
    ->mac_header_offset?), ditto for ->{h,nh}.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Now related to this form:

    skb->nh.ipv6h = (struct ipv6hdr *)skb_put(skb, length);

    That, like the others, is done when skb->tail is still equal to skb->data, making
    the conversion to skb_reset_network_header possible.

    Also covered is one more case equivalent to skb->nh.raw = skb->data, of this form:

    iph = (struct ipv6hdr *)skb->data;

    skb->nh.ipv6h = iph;

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
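
    A representative before/after of this conversion (illustrative; the actual
    diffs vary per call site):

        /* before: skb->tail == skb->data here, so nh.raw ends up at data */
        skb->nh.ipv6h = (struct ipv6hdr *)skb_put(skb, length);

        /* after */
        skb_put(skb, length);
        skb_reset_network_header(skb);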
     
  • This time of the type:

    skb->nh.iph = (struct iphdr *)skb->data;

    That is completely equivalent to:

    skb->nh.raw = skb->data;

    Wonder why people love casts... :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
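
    The converted form, for illustration:

        /* before */
        skb->nh.iph = (struct iphdr *)skb->data;

        /* after */
        skb_reset_network_header(skb);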
     

19 Apr, 2007

1 commit

    The way partial delivery is currently implemented, it is possible to
    interleave a message (either from another stream, or unordered) that
    is not part of the partial delivery process. The only way to do this is for
    a message to not be a fragment and to be 'in order' or unordered for a
    given stream. This will result in bypassing the reassembly/ordering
    queues where things live during partial delivery, and the
    message will be delivered to the socket in the middle of partial delivery.

    This is a two-fold problem, in that:
    1. the app now must check the stream-id and flags, which it may not
    be doing.
    2. this clears the partial delivery state from the association and results
    in the ULP hanging.

    This patch is a band-aid over a much bigger problem in that we
    don't do stream interleave.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

18 Apr, 2007

2 commits

  • During the sctp_bindx() call to add additional addresses to the
    endpoint, any v4mapped addresses are converted and stored as regular
    v4 addresses. However, when trying to remove these addresses, the
    v4mapped addresses are not converted and the operation fails. This
    patch unmaps the addresses during the remove operation as well.

    Signed-off-by: Paolo Galtieri
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Paolo Galtieri
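
    For context, the application-side call affected here (a sketch; addr is a
    hypothetical previously bound address, possibly in v4-mapped form):

        if (sctp_bindx(sd, (struct sockaddr *)&addr, 1, SCTP_BINDX_REM_ADDR) < 0)
                perror("sctp_bindx(SCTP_BINDX_REM_ADDR)");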
     
    In the current implementation, LKSCTP does receive buffer accounting for
    data in sctp_receive_queue and pd_lobby. However, LKSCTP doesn't do
    accounting for data in frag_list when data is fragmented. In addition,
    LKSCTP doesn't do accounting for data in the reasm and lobby queues in
    struct sctp_ulpq.
    When there is data in these queues, an 'assertion failed' message is printed
    in inet_sock_destruct because sk_rmem_alloc of oldsk does not become 0
    when the socket is destroyed.

    Signed-off-by: Tsutomu Fujii
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Tsutomu Fujii
     

23 Mar, 2007

1 commit


20 Mar, 2007

3 commits

  • If the association has been restarted, we need to reset the
    transport congestion variables as well as accumulated error
    counts and CACC variables. If we do not, the association
    will use the wrong values and may terminate prematurely.

    This was found with a scenario where the peer restarted
    the association when lksctp was in the last HB timeout for
    its association. The restart happened, but the error counts
    had not been reset, and when the timeout occurred, the newly
    restarted association was terminated due to excessive
    retransmits.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • 2960bis states (Section 8.3):

    D) Request an on-demand HEARTBEAT on a specific destination transport
    address of a given association.

    The endpoint should increment the respective error counter of the
    destination transport address each time a HEARTBEAT is sent to that
    address and not acknowledged within one RTO.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • During association restart we may have stale data sitting
    on the ULP queue waiting for ordering or reassembly. This
    data may cause severe problems if not cleaned up. In particular
    stale data pending ordering may cause problems with receive
    window exhaustion if our peer has decided to restart the
    association.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

09 Mar, 2007

1 commit


27 Feb, 2007

2 commits

  • Once we reach a point where we exceed the max.path.retrans, strike the
    transport before updating the rto. This will force transport switch at
    the right time, instead of 1 retransmit too late.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
    The problem that this patch corrects happens when all of the following
    conditions are satisfied:
    1. PR-SCTP is used and the timeout on the chunks is set below RTO.Max.
    2. One of the paths on a multihomed association is brought down.

    In this scenario, data will expire within the rto of the initial
    transmission and will never be retransmitted. However, this data still
    fills the send buffer and is counted against the association as outstanding
    data. This prevents any new data from being sent and retransmission from
    happening.

    The fix is to discount the abandoned data from the outstanding count and
    the peer's rwnd estimation. This allows new data to be sent and the
    retransmission timer to be restarted. Even though this new data will most
    likely expire within the rto, the timer still counts as a strike against the
    transport and forces the FORWARD-TSN chunk to be retransmitted as well.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

15 Feb, 2007

2 commits

    The semantic effect of insert_at_head is that it would allow newly registered
    sysctl entries to override existing sysctl entries of the same name. That is
    a pain for caching, and the proc interface never implemented it.

    I have done an audit and discovered that none of the current users of
    register_sysctl care, as (except for directories) they do not register
    duplicate sysctl entries.

    So this patch simply removes the support for overriding existing entries in
    the sys_sysctl interface since no one uses it or cares, and it makes future
    enhancements harder.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ralf Baechle
    Acked-by: Martin Schwidefsky
    Cc: Russell King
    Cc: David Howells
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Andi Kleen
    Cc: Jens Axboe
    Cc: Corey Minyard
    Cc: Neil Brown
    Cc: "John W. Linville"
    Cc: James Bottomley
    Cc: Jan Kara
    Cc: Trond Myklebust
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
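
    The visible change for callers, roughly (my_table and header are
    illustrative):

        /* before */
        header = register_sysctl_table(my_table, 0);

        /* after: the insert_at_head argument is gone */
        header = register_sysctl_table(my_table);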
     
  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

13 Feb, 2007

1 commit

  • Many struct file_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
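
    A sketch of the resulting pattern (foo_proc_open and foo_proc_fops are
    illustrative names; seq_read and friends are the usual seq_file helpers):

        static const struct file_operations foo_proc_fops = {
                .owner   = THIS_MODULE,
                .open    = foo_proc_open,
                .read    = seq_read,
                .llseek  = seq_lseek,
                .release = seq_release,
        };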
     

12 Feb, 2007

2 commits

  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
    [IPV4]: Restore multipath routing after rt_next changes.
    [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch.
    [NET]: Reorder fields of struct dst_entry
    [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer
    [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer
    [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer
    [NET]: Introduce union in struct dst_entry to hold 'next' pointer
    [DECNET]: fix misannotation of linkinfo_dn
    [DECNET]: FRA_{DST,SRC} are le16 for decnet
    [UDP]: UDP can use sk_hash to speedup lookups
    [NET]: Fix whitespace errors.
    [NET] XFRM: Fix whitespace errors.
    [NET] X25: Fix whitespace errors.
    [NET] WANROUTER: Fix whitespace errors.
    [NET] UNIX: Fix whitespace errors.
    [NET] TIPC: Fix whitespace errors.
    [NET] SUNRPC: Fix whitespace errors.
    [NET] SCTP: Fix whitespace errors.
    [NET] SCHED: Fix whitespace errors.
    [NET] RXRPC: Fix whitespace errors.
    ...

    Linus Torvalds
     
  • Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
    corresponding "kmem_cache_zalloc()" call.

    Signed-off-by: Robert P. J. Day
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Cc: Roland McGrath
    Cc: James Bottomley
    Cc: Greg KH
    Acked-by: Joel Becker
    Cc: Steven Whitehouse
    Cc: Jan Kara
    Cc: Michael Halcrow
    Cc: "David S. Miller"
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
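
    The substitution pattern, roughly (cachep and obj are illustrative):

        /* before */
        obj = kmem_cache_alloc(cachep, GFP_KERNEL);
        if (obj)
                memset(obj, 0, sizeof(*obj));

        /* after */
        obj = kmem_cache_zalloc(cachep, GFP_KERNEL);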
     

11 Feb, 2007

1 commit


31 Jan, 2007

1 commit

  • When processing a HEARTBEAT-ACK it's possible that the transport rto
    timers will not be updated because a prior T3-RTX processing would
    have cleared the rto_pending flag on the transport. However, if
    we received a valid HEARTBEAT-ACK, we want to force update the
    rto variables, so re-set the rto_pending flag before calling
    sctp_transport_update_rto().

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     

26 Jan, 2007

1 commit


24 Jan, 2007

2 commits

  • > --- a/net/sctp/sm_statefuns.c
    > +++ b/net/sctp/sm_statefuns.c
    > @@ -462,24 +461,6 @@ sctp_disposition_t sctp_sf_do_5_1C_ack(const struct sctp_endpoint *ep,

    > - if (!init_tag) {
    > - struct sctp_chunk *reply = sctp_make_abort(asoc, chunk, 0);
    > - if (!reply)
    > - goto nomem;

    This introduced a compiler warning, easily fixed.

    Signed-off-by: Brian Haley
    Signed-off-by: David S. Miller

    Brian Haley
     
    Currently, when the association enters the SHUTDOWN state, the
    implementation will SACK any DATA first and then transmit
    the SHUTDOWN chunk. This is against the order required by
    the 2960bis spec: SHUTDOWN must always come first, followed by
    the SACK. This change enforces this order and also enables bundling.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich