20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
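
    As a rough illustration of what the call-site fixups look like, here is a
    user-space mock of a cache-creation function that takes a constructor but
    no destructor argument. The names toy_cache_create(), toy_cache_alloc(),
    toy_int_ctor() and struct toy_cache are hypothetical stand-ins, and the
    simplified ctor signature is an assumption, not the kernel's definition.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a slab cache descriptor. */
struct toy_cache {
    const char *name;
    size_t size;
    size_t align;
    unsigned long flags;
    void (*ctor)(void *obj);   /* constructor is kept */
    /* no dtor member: destructor support is gone */
};

/*
 * Five arguments only. A caller that used to pass a trailing dtor
 * pointer (typically NULL) simply drops that argument.
 */
static struct toy_cache *toy_cache_create(const char *name, size_t size,
                                          size_t align, unsigned long flags,
                                          void (*ctor)(void *obj))
{
    struct toy_cache *c = malloc(sizeof(*c));

    if (c == NULL)
        return NULL;
    c->name = name;
    c->size = size;
    c->align = align;
    c->flags = flags;
    c->ctor = ctor;
    return c;
}

/* Allocate one object and run the constructor on it, if there is one. */
static void *toy_cache_alloc(struct toy_cache *c)
{
    void *obj = malloc(c->size);

    if (obj != NULL && c->ctor != NULL)
        c->ctor(obj);
    return obj;
}

/* Example constructor: initialise an int-sized object. */
static void toy_int_ctor(void *obj)
{
    *(int *)obj = 42;
}
```

    A caller would write toy_cache_create("toy_ints", sizeof(int), 0, 0,
    toy_int_ctor), with nothing after the constructor.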
     

19 Jul, 2007

1 commit


11 Jul, 2007

1 commit


06 Jul, 2007

3 commits


26 Jun, 2007

1 commit

  • sctp_sock_migrate() grabs the socket lock on a newly allocated socket while
    holding the socket lock on an old socket. lockdep worries that this might
    be a recursive lock attempt.

    task/3026 is trying to acquire lock:
    (sk_lock-AF_INET){--..}, at: [] sctp_sock_migrate+0x2e3/0x327 [sctp]
    but task is already holding lock:
    (sk_lock-AF_INET){--..}, at: [] sctp_accept+0xdf/0x1e3 [sctp]

    This patch tells lockdep that this locking is safe by using
    lock_sock_nested().

    Signed-off-by: Zach Brown
    Signed-off-by: Vlad Yasevich

    Zach Brown
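
    The idea behind the nested annotation can be sketched with a toy model:
    a checker that complains when a task takes a lock of a class it already
    holds, unless the second acquisition is tagged with a different nesting
    subclass. This is a deliberately simplified, hypothetical model
    (toy_task, toy_acquire), not lockdep's actual algorithm.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* A held lock is identified by its class and a nesting subclass. */
struct toy_held {
    int class_id;
    int subclass;
};

struct toy_task {
    struct toy_held held[TOY_MAX_HELD];
    size_t depth;
};

/*
 * Returns true if the acquisition is accepted. Returns false, standing
 * in for a "possible recursive locking" report, when the task already
 * holds a lock with the same class and subclass, or when the toy
 * held-lock stack is full.
 */
static bool toy_acquire(struct toy_task *t, int class_id, int subclass)
{
    size_t i;

    for (i = 0; i < t->depth; i++) {
        if (t->held[i].class_id == class_id &&
            t->held[i].subclass == subclass)
            return false;
    }
    if (t->depth == TOY_MAX_HELD)
        return false;
    t->held[t->depth].class_id = class_id;
    t->held[t->depth].subclass = subclass;
    t->depth++;
    return true;
}
```

    In this model, taking the old socket's lock with subclass 0 and the new
    socket's lock with subclass 1 is accepted, while taking both with
    subclass 0 is flagged.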
     

19 Jun, 2007

2 commits

  • This is the change that we agreed I should split out from my last
    patch. It changes space_left to be computed in the same
    way the 'to' variable is. I know we talked about changing space_left to an
    int, but I think size_t is more appropriate, since we should never have
    negative space in our buffer, and computing using offsetof means space_left
    should now never drop below zero.

    Signed-off-by: Neil Horman
    Acked-by: Sridhar Samudrala
    Signed-off-by: Vlad Yasevich

    Neil Horman
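
    A minimal sketch of computing the remaining space with offsetof() and an
    unsigned type follows. The struct layout and the helper name
    toy_space_left() are hypothetical; the point is only that, once the
    length has been checked against the header size, the size_t subtraction
    cannot wrap.

```c
#include <stddef.h>

/* Hypothetical reply layout: fixed header, then a variable-length area. */
struct toy_getaddrs {
    int assoc_id;
    unsigned int addr_num;
    unsigned char addrs[];   /* flexible array member */
};

/*
 * Computes how many bytes of a len-byte user buffer remain for the
 * address area. Returns 0 and stores the result in *space_left, or -1
 * when the buffer cannot even hold the fixed header.
 */
static int toy_space_left(size_t len, size_t *space_left)
{
    const size_t hdr = offsetof(struct toy_getaddrs, addrs);

    if (len < hdr)
        return -1;
    *space_left = len - hdr;   /* cannot wrap: len >= hdr was checked */
    return 0;
}
```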
     
  • I noted the other day while looking at a bug that was ostensibly
    in some perl networking library, that we strictly avoid allowing getsockopt
    operations to complete if we pass in oversized buffers. This seems to make
    libraries like Perl::NET malfunction since it seems to allocate oversized
    buffers for use in several operations. It also seems to be out of line with
    the way udp, tcp and ip getsockopt routines handle buffer input (since the
    *optlen pointer is both an input and an output and gets set to the length
    of the data that we copy into the buffer). This patch brings our getsockopt
    helpers into line with other protocols, and allows us to accept oversized
    buffers for our getsockopt operations. Tested by me with good results.

    Signed-off-by: Neil Horman
    Acked-by: Sridhar Samudrala
    Signed-off-by: Vlad Yasevich

    Neil Horman
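
    The behaviour described above can be sketched in user space as follows:
    an undersized buffer is still rejected, while an oversized one is
    clamped to the option's real size and that size is reported back through
    *optlen. struct toy_opt and toy_getsockopt() are hypothetical, and
    memcpy() stands in for the kernel's user-copy helpers.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical option value returned to the caller. */
struct toy_opt {
    int value;
};

/*
 * optlen is both input (size of the caller's buffer) and output
 * (number of bytes actually written).
 */
static int toy_getsockopt(const struct toy_opt *src, void *optval, int *optlen)
{
    int len = *optlen;

    if (len < (int)sizeof(*src))
        return -EINVAL;          /* too small: still an error */
    len = (int)sizeof(*src);     /* oversized: clamp instead of rejecting */
    memcpy(optval, src, (size_t)len);
    *optlen = len;               /* tell the caller how much was copied */
    return 0;
}
```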
     

14 Jun, 2007

6 commits


04 Jun, 2007

1 commit


25 May, 2007

1 commit


11 May, 2007

3 commits


09 May, 2007

1 commit


05 May, 2007

4 commits


04 May, 2007

1 commit

  • Clean up the use of the dev_base list, with the aim of making it simpler
    to make the device list per-namespace. In almost every case, use of the
    dev_base variable and the dev->next pointer could easily be replaced by a
    for_each_netdev loop. A few of the most complicated places were converted
    to use first_netdev()/next_netdev().

    Signed-off-by: Pavel Emelianov
    Acked-by: Kirill Korotaev
    Signed-off-by: David S. Miller

    Pavel Emelianov
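
    To show the shape of the conversion, here is a user-space sketch of an
    iteration macro that hides the list head and the ->next pointer from
    callers. toy_for_each_netdev(), struct toy_netdev and
    toy_count_devices() are hypothetical; the kernel macro's exact
    signature is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical singly linked device list. */
struct toy_netdev {
    const char *name;
    struct toy_netdev *next;
};

/* Callers name the list head and a cursor; they never touch ->next. */
#define toy_for_each_netdev(head, d) \
    for ((d) = (head); (d) != NULL; (d) = (d)->next)

/* Example user: count the devices on a list. */
static size_t toy_count_devices(struct toy_netdev *head)
{
    struct toy_netdev *d;
    size_t n = 0;

    toy_for_each_netdev(head, d)
        n++;
    return n;
}
```

    With the walk hidden behind one macro, changing how the list is stored
    (for example, per namespace) only requires touching the macro.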
     

29 Apr, 2007

1 commit


26 Apr, 2007

13 commits

  • Spring cleaning time...

    There seem to be a lot of places in the network code that have
    extra bogus semicolons after conditionals. The most common is a
    bogus semicolon after: switch() { }

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should
    treat it as such in the stack.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
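
    One way to express "treat PARTIAL like UNNECESSARY" is to pick an
    encoding in which both values share a bit, so a single mask test covers
    both. The numeric values and the helper name below are assumptions made
    for this sketch, not a statement of what the patch defines.

```c
#include <stdbool.h>

/* Assumed encoding: PARTIAL (3) shares bit 0 with UNNECESSARY (1). */
enum toy_ip_summed {
    TOY_CHECKSUM_NONE        = 0,
    TOY_CHECKSUM_UNNECESSARY = 1,
    TOY_CHECKSUM_COMPLETE    = 2,
    TOY_CHECKSUM_PARTIAL     = 3
};

/* True when the receive path may skip software checksum verification. */
static bool toy_csum_unnecessary(enum toy_ip_summed s)
{
    return (s & TOY_CHECKSUM_UNNECESSARY) != 0;
}
```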
     
  • Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • As stated in the sctp socket api draft:

    sac_info: variable

    If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received
    for this association, sac_info[] contains the complete ABORT chunk as
    defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7.

    We now save received ABORT chunks into the sac_info field and pass that
    to the user.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • Parameters only take effect when a corresponding flag bit is set
    and a value is specified. This means we need to check the flags
    in addition to checking for non-zero value.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
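
    The rule can be sketched as a two-part test: the flag bit must be set
    and the value must be non-zero before a parameter is applied. The flag
    name, struct and helper below are hypothetical and do not correspond to
    a specific SCTP parameter.

```c
/* Hypothetical flag bit saying "the interval field is meaningful". */
#define TOY_FLAG_INTERVAL 0x1u

struct toy_params {
    unsigned int flags;
    unsigned int interval;
};

/*
 * Returns the setting that should be in effect afterwards: the
 * requested interval only if its flag is set AND it is non-zero,
 * otherwise the current one is left unchanged.
 */
static unsigned int toy_apply_interval(unsigned int current,
                                       const struct toy_params *p)
{
    if ((p->flags & TOY_FLAG_INTERVAL) && p->interval != 0)
        return p->interval;
    return current;
}
```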
     
  • Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • This option induces partial delivery to run as soon
    as the specified amount of data has been accumulated on
    the association. However, we give preference to fully
    reassembled messages over PD messages. In any case,
    window and buffer are freed up.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
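
    The delivery preference described above might be modelled like this: a
    fully reassembled message always wins, and partial delivery starts only
    once the accumulated data reaches the configured point. The names are
    hypothetical, and treating a point of 0 as "disabled" is an assumption
    of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_delivery {
    TOY_WAIT,             /* keep accumulating fragments */
    TOY_DELIVER_FULL,     /* hand over a fully reassembled message */
    TOY_DELIVER_PARTIAL   /* start partial delivery */
};

static enum toy_delivery toy_choose_delivery(bool have_full_msg,
                                             size_t accumulated,
                                             size_t pd_point)
{
    if (have_full_msg)
        return TOY_DELIVER_FULL;      /* preferred over PD */
    if (pd_point != 0 && accumulated >= pd_point)
        return TOY_DELIVER_PARTIAL;   /* threshold reached */
    return TOY_WAIT;
}
```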
     
  • This option was introduced in draft-ietf-tsvwg-sctpsocket-13. It
    prevents head-of-line blocking in the case of a one-to-many endpoint.
    Applications enabling this option really must enable the SCTP_SNDRCV event
    so that they know where the data belongs. Based on an
    earlier patch by Ivan Skytte Jørgensen.

    Additionally, this functionality now permits multiple associations
    on the same endpoint to enter Partial Delivery. Applications should
    be extra careful, when using this functionality, to track EOR indicators.

    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
    on 64bit architectures, allowing us to combine the 4 bytes hole left by the
    layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
    64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
    :-)

    Many calculations that previously required that skb->{transport,network,
    mac}_header be first converted to a pointer now can be done directly, being
    meaningful as offsets or pointers.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
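
    A small user-space sketch of headers stored as 32-bit offsets from the
    buffer head rather than as pointers. struct toy_skb, toy_data_t and the
    helpers are hypothetical stand-ins; they only illustrate that a pointer
    is recovered with one add, and that differences between headers can be
    taken directly on the offsets.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_data_t;   /* offset from head: 4 bytes, not 8 */

struct toy_skb {
    unsigned char *head;
    toy_data_t transport_header;
    toy_data_t network_header;
    toy_data_t mac_header;
};

/* Converting an offset back to a pointer costs one addition. */
static unsigned char *toy_network_header(const struct toy_skb *skb)
{
    return skb->head + skb->network_header;
}

/* Store a header position as an offset from head. */
static void toy_set_network_header(struct toy_skb *skb, unsigned char *p)
{
    skb->network_header = (toy_data_t)(p - skb->head);
}

/* Lengths are computed on the offsets, with no pointer conversion. */
static uint32_t toy_network_header_len(const struct toy_skb *skb)
{
    return skb->transport_header - skb->network_header;
}
```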
     
  • With this we save 8 bytes per network packet, leaving a 4-byte hole to be used
    in further shrinking work, likely with the offsetization of other pointers,
    such as ->{data,tail,end}. The cost is extra adds, which are minimized by the
    usual practice of setting skb->{mac,nh,h}.raw to a local variable that is then
    accessed multiple times in each function. It is also no more expensive than
    before for most of the handling of such headers, like setting one of these
    headers to another (transport to network, etc.), or subtracting from, adding
    to, or comparing them.

    Now we have this layout for sk_buff on a x86_64 machine:

    [acme@mica net-2.6.22]$ pahole vmlinux sk_buff
    struct sk_buff {
        struct sk_buff * next;                /*   0  8 */
        struct sk_buff * prev;                /*   8  8 */
        struct rb_node rb;                    /*  16 24 */
        struct sock * sk;                     /*  40  8 */
        ktime_t tstamp;                       /*  48  8 */
        struct net_device * dev;              /*  56  8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct net_device * input_dev;        /*  64  8 */
        sk_buff_data_t transport_header;      /*  72  4 */
        sk_buff_data_t network_header;        /*  76  4 */
        sk_buff_data_t mac_header;            /*  80  4 */

        /* XXX 4 bytes hole, try to pack */

        struct dst_entry * dst;               /*  88  8 */
        struct sec_path * sp;                 /*  96  8 */
        char cb[48];                          /* 104 48 */
        /* cacheline 2 boundary (128 bytes) was 24 bytes ago */
        unsigned int len;                     /* 152  4 */
        unsigned int data_len;                /* 156  4 */
        unsigned int mac_len;                 /* 160  4 */
        union {
            __wsum csum;                      /*      4 */
            __u32 csum_offset;                /*      4 */
        };                                    /* 164  4 */
        __u32 priority;                       /* 168  4 */
        __u8 local_df:1;                      /* 172  1 */
        __u8 cloned:1;                        /* 172  1 */
        __u8 ip_summed:2;                     /* 172  1 */
        __u8 nohdr:1;                         /* 172  1 */
        __u8 nfctinfo:3;                      /* 172  1 */
        __u8 pkt_type:3;                      /* 173  1 */
        __u8 fclone:2;                        /* 173  1 */
        __u8 ipvs_property:1;                 /* 173  1 */

        /* XXX 2 bits hole, try to pack */

        __be16 protocol;                      /* 174  2 */
        void (*destructor)(struct sk_buff *); /* 176  8 */
        struct nf_conntrack * nfct;           /* 184  8 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        struct sk_buff * nfct_reasm;          /* 192  8 */
        struct nf_bridge_info *nf_bridge;     /* 200  8 */
        __u16 tc_index;                       /* 208  2 */
        __u16 tc_verd;                        /* 210  2 */
        dma_cookie_t dma_cookie;              /* 212  4 */
        __u32 secmark;                        /* 216  4 */
        __u32 mark;                           /* 220  4 */
        unsigned int truesize;                /* 224  4 */
        atomic_t users;                       /* 228  4 */
        unsigned char * head;                 /* 232  8 */
        unsigned char * data;                 /* 240  8 */
        unsigned char * tail;                 /* 248  8 */
        /* --- cacheline 4 boundary (256 bytes) --- */
        unsigned char * end;                  /* 256  8 */
    }; /* size: 264, cachelines: 5 */
        /* sum members: 260, holes: 1, sum holes: 4 */
        /* bit holes: 1, sum bit holes: 2 bits */
        /* last cacheline: 8 bytes */

    On 32 bits nothing changes, and pointers continue to be used with the compiler
    turning all this abstraction layer into dust. But there are some sk_buff
    validation tricks that are now possible, humm... :-)

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
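
    The 4-byte hole reported by pahole can be reproduced with a reduced,
    hypothetical struct: three 4-byte offsets followed by a pointer. On a
    typical 64-bit ABI the pointer must be 8-byte aligned, so 4 bytes of
    padding appear before it; on a typical 32-bit ABI there is no padding.
    The struct below is a sketch for illustration, not the real sk_buff.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced layout: only the members around the hole. */
struct toy_layout {
    void *input_dev;
    uint32_t transport_header;
    uint32_t network_header;
    uint32_t mac_header;
    void *dst;               /* pointer alignment may force padding here */
};

/* Number of padding bytes between mac_header and dst. */
static size_t toy_hole_before_dst(void)
{
    size_t end_of_mac = offsetof(struct toy_layout, mac_header)
                        + sizeof(uint32_t);

    return offsetof(struct toy_layout, dst) - end_of_mac;
}
```

    Placing another 4-byte member right after mac_header would fill the
    hole without growing the struct on such a 64-bit ABI.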
     
  • Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
    skb->mac to skb->mac_header, to match the names of the associated helpers
    (skb[_[re]set]_{transport,network,mac}_header).

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo