22 Jun, 2005

8 commits


21 Jun, 2005

3 commits

  • Below is a more generic patch to do fib_lookup via netlink. For others
    we should say that we discussed this as a way to verify route selection.
    It's also possible there are other uses for this.

    In short, the first half of struct fib_result_nl is filled in by the caller
    and the netlink call fills in the other half and returns it.

    In case anyone is interested, there is a corresponding user app to compare
    the full routing table; this was used to test the implementation of the
    LC-trie.

    Signed-off-by: David S. Miller

    Robert Olsson
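
A minimal user-space sketch of the "caller fills one half, lookup fills the other" record described above. The struct `lookup_msg`, the `route` table and `lookup_fill()` are hypothetical names for illustration only; they are not the kernel's struct fib_result_nl or its netlink handler, and a linear longest-prefix scan stands in for the real FIB lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request/reply record, NOT the kernel's struct fib_result_nl. */
struct lookup_msg {
    /* Input half: filled in by the caller. */
    uint32_t addr;       /* destination address, host byte order */
    /* Output half: filled in by the lookup. */
    uint8_t  prefixlen;  /* length of the matching prefix */
    uint32_t nexthop;    /* opaque next-hop identifier */
    int      err;        /* 0 on success, -1 if no route matched */
};

/* Hypothetical routing table entry. */
struct route {
    uint32_t prefix;
    uint8_t  prefixlen;
    uint32_t nexthop;
};

static uint32_t prefix_mask(uint8_t len)
{
    /* Shifting a 32-bit value by 32 is undefined, so handle len == 0. */
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* Fill in the output half of *m using a linear longest-prefix match. */
static void lookup_fill(struct lookup_msg *m, const struct route *tbl, size_t n)
{
    int best = -1;  /* longest matching prefix length seen so far */
    size_t i;

    m->prefixlen = 0;
    m->nexthop = 0;
    m->err = -1;

    for (i = 0; i < n; i++) {
        uint32_t mask;

        if (tbl[i].prefixlen > 32)
            continue;  /* ignore malformed entries */
        mask = prefix_mask(tbl[i].prefixlen);
        if ((m->addr & mask) != (tbl[i].prefix & mask))
            continue;
        if ((int)tbl[i].prefixlen > best) {
            best = tbl[i].prefixlen;
            m->prefixlen = tbl[i].prefixlen;
            m->nexthop = tbl[i].nexthop;
            m->err = 0;
        }
    }
}
```

A test program can fill `addr`, call `lookup_fill()`, and compare the output half against an independently computed answer, which is the kind of route-selection check the message mentions.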
     
  • This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states. It is
    similar to the nopmtudisc on IPIP/GRE tunnels. It only has an effect
    on IPv4 tunnel mode states. For these states, it will ensure that the
    DF flag is always cleared.

    This is primarily useful to work around ICMP blackholes.

    In future this flag could also allow a larger MTU to be set within the
    tunnel just like IPIP/GRE tunnels. This could be useful for short haul
    tunnels where temporary fragmentation outside the tunnel is desired over
    smaller fragments inside the tunnel.

    Signed-off-by: Herbert Xu
    Acked-by: James Morris
    Signed-off-by: David S. Miller

    Herbert Xu
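
The effect of the flag on the outer header can be sketched as below. This is an illustration under assumptions, not the patch itself: `SKETCH_NOPMTUDISC` is a hypothetical stand-in for XFRM_STATE_NOPMTUDISC, `outer_frag_off()` is an invented helper, and copying DF from the inner header when the flag is unset is assumed here as the default behaviour. Values are in host byte order for simplicity.

```c
#include <stdint.h>

/* "Don't Fragment" bit of the IPv4 flags/fragment-offset field (host order). */
#define SKETCH_IP_DF       0x4000u
/* Hypothetical flag bit standing in for XFRM_STATE_NOPMTUDISC. */
#define SKETCH_NOPMTUDISC  0x1u

/*
 * Compute the flags/fragment-offset field of the outer (tunnel) header.
 * With the flag set, DF is always cleared; otherwise (assumed default)
 * only the DF bit is copied from the inner header.
 */
static uint16_t outer_frag_off(uint16_t inner_frag_off, unsigned int state_flags)
{
    if (state_flags & SKETCH_NOPMTUDISC)
        return 0;
    return (uint16_t)(inner_frag_off & SKETCH_IP_DF);
}
```

With DF cleared, routers along the tunnel path may fragment the outer packet instead of returning an ICMP "fragmentation needed" message, which is why this helps when those ICMP messages are being dropped.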
     
  • This patch adds xfrm_init_state which is simply a wrapper that calls
    xfrm_get_type and subsequently x->type->init_state. It also gets rid
    of the unused args argument.

    Abstracting it out allows us to add common initialisation code, e.g.,
    to set family-specific flags.

    The add_time setting in xfrm_user.c was deleted because it's already
    set by xfrm_state_alloc.

    Signed-off-by: Herbert Xu
    Acked-by: James Morris
    Signed-off-by: David S. Miller

    Herbert Xu
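
The shape of such a wrapper can be sketched in plain C as follows. All types, fields and the `sketch_` table are hypothetical; only the overall pattern (look up the type, run common initialisation, then call the type's init hook) follows the description above.

```c
#include <stddef.h>

struct state;

/* Hypothetical per-protocol type descriptor with an init hook. */
struct state_type {
    int proto;
    int (*init_state)(struct state *x);
};

/* Hypothetical state object. */
struct state {
    int family;                      /* 4 or 6 in this sketch */
    unsigned int flags;
    const struct state_type *type;
};

#define SKETCH_FLAG_V4    0x001u     /* hypothetical family-specific flag */
#define SKETCH_FLAG_READY 0x100u     /* hypothetical flag set by the type */

static int esp_like_init(struct state *x)
{
    x->flags |= SKETCH_FLAG_READY;
    return 0;
}

/* Hypothetical registry; 50 is the IP protocol number of ESP. */
static const struct state_type sketch_types[] = {
    { 50, esp_like_init },
};

static const struct state_type *get_type(int proto)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_types) / sizeof(sketch_types[0]); i++)
        if (sketch_types[i].proto == proto)
            return &sketch_types[i];
    return NULL;
}

/* The wrapper: one place for type lookup, common setup and type-specific init. */
static int init_state(struct state *x, int proto)
{
    x->type = get_type(proto);
    if (x->type == NULL)
        return -1;
    if (x->family == 4)
        x->flags |= SKETCH_FLAG_V4;  /* common, family-specific setup */
    return x->type->init_state(x);
}
```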
     

19 Jun, 2005

13 commits

  • When enabled, this should disable UCOPY prequeue'ing altogether,
    but it does not due to a missing test.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch changes the type of the third parameter 'length' of the
    raw_send_hdrinc() function from 'int' to 'size_t'.
    This makes sense since this function is only ever called from one
    location, and the value passed as the third parameter in that location is
    itself of type size_t, so this makes the receiving function's parameter
    type match. Also, inside raw_send_hdrinc() the 'length' variable is
    used in comparisons with unsigned values and passed as parameter to
    functions expecting unsigned values (it's used in a single comparison with
    a signed value, but that one can never actually be negative so the patch
    also casts that one to size_t to stop gcc worrying, and it is passed in a
    single instance to memcpy_fromiovecend() which expects a signed int, but
    as far as I can see that's not a problem since the value of 'length'
    shouldn't ever exceed the value of a signed int).

    Signed-off-by: Jesper Juhl
    Signed-off-by: David S. Miller

    Jesper Juhl
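
A small stand-alone illustration of the signed/unsigned comparison issue discussed above. The function names are hypothetical and unrelated to raw_send_hdrinc(); the explicit casts spell out what the compiler would otherwise do implicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Both operands are unsigned: no conversion takes place, no warning. */
static int fits_limit(size_t length, size_t limit)
{
    return length <= limit;
}

/*
 * Mixed comparison. header_len is known to be non-negative, so casting it
 * to size_t is safe and states the intent explicitly.
 */
static int longer_than_header(size_t length, int header_len)
{
    assert(header_len >= 0);
    return length > (size_t)header_len;
}

/*
 * What happens without care: in `a < b` with int a and unsigned b, a is
 * converted to unsigned first. The cast below makes that conversion
 * visible, so a negative value compares as a very large one.
 */
static int implicit_compare(int a, unsigned int b)
{
    return (unsigned int)a < b;
}
```

For example, `implicit_compare(-1, 1u)` returns 0 even though -1 is less than 1 mathematically, which is the class of surprise the warning is meant to flag.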
     
  • This patch changes the type of the local variable 'i' in
    raw_probe_proto_opt() from 'int' to 'unsigned int'. The only use of 'i' in
    this function is as a counter in a for() loop and subsequent index into
    the msg->msg_iov[] array.
    Since 'i' is compared in a loop to the unsigned variable msg->msg_iovlen
    gcc -W generates this warning :

    net/ipv4/raw.c:340: warning: comparison between signed and unsigned

    Changing 'i' to unsigned silences this warning and is safe since the array
    index can never be negative anyway, so unsigned int is the logical type to
    use for 'i' and also enables a larger msg_iov[] array (but I don't know if
    that will ever matter).

    Signed-off-by: Jesper Juhl
    Signed-off-by: David S. Miller

    Jesper Juhl
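
The same pattern in a user-space sketch: an unsigned counter walking an array of struct iovec whose element count is also unsigned. `iov_total()` is a hypothetical helper, not code from net/ipv4/raw.c.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Sum the lengths of iovcnt segments; counter and bound share a type. */
static size_t iov_total(const struct iovec *iov, unsigned int iovcnt)
{
    size_t total = 0;
    unsigned int i;  /* unsigned, like the bound it is compared against */

    for (i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    return total;
}
```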
     
  • This patch gets rid of the following gcc -W warning in net/ipv4/raw.c :

    net/ipv4/raw.c:387: warning: comparison of unsigned expression < 0 is always false

    Since 'len' is of type size_t it is unsigned and can thus never be
    negative.

    Signed-off-by: David S. Miller

    Jesper Juhl
     
  • This patch silences these two gcc -W warnings in net/ipv4/raw.c :

    net/ipv4/raw.c:517: warning: signed and unsigned type in conditional expression
    net/ipv4/raw.c:613: warning: signed and unsigned type in conditional expression

    It doesn't change the behaviour of the code, simply writes the conditional
    expression with plain 'if()' syntax instead of '? :' , but since this
    breaks it into separate statements gcc no longer complains about having
    both a signed and unsigned value in the same conditional expression.

    Signed-off-by: David S. Miller

    Jesper Juhl
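
A sketch of the rewrite described above, using a hypothetical function rather than the actual code in net/ipv4/raw.c. Written as `err ? err : len`, the two arms have types int and size_t and are converted to a common type, which is what gcc -W points out; the if/else form assigns each value separately.

```c
#include <stddef.h>

/*
 * Return the error code if there is one, otherwise the length.
 * Each branch is its own statement, so no single expression mixes
 * a signed and an unsigned operand.
 */
static int result_or_error(int err, size_t len)
{
    int ret;

    if (err)
        ret = err;
    else
        ret = (int)len;  /* caller guarantees len fits in an int */
    return ret;
}
```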
     
  • In light of my recent patch to net/ipv4/udp.c that replaced the
    spin_lock_irq calls on the receive queue lock with spin_lock_bh,
    here is a similar patch for all other occurrences of spin_lock_irq
    on receive/error queue locks in IPv4 and IPv6.

    In these stacks, we know that they can only be entered from user
    or softirq context. Therefore it's safe to disable BH only.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
    This patch ensures that netlink events created as a result of programs
    using ioctls (such as ifconfig, route etc) contain the correct PID of
    those events.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • This patch rectifies some rtnetlink message builders that derive the
    flags from the pid. It is now explicit like the other cases
    which get it right. Also fixes half a dozen dumpers which did not
    set NLM_F_MULTI at all.

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • This fixes the CONFIG_INET=n build failure noticed
    by Andrew Morton.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • This chunks out the accept_queue and tcp_listen_opt code and moves
    them to net/core/request_sock.c and include/net/request_sock.h, to
    make it useful for other transport protocols, DCCP being the first one
    to use it.

    Next patches will rename tcp_listen_opt to accept_sock and remove the
    inline tcp functions that just call a reqsk_queue_ function.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Ok, this one just renames some stuff to have a better namespace and to
    dissociate it from TCP:

    struct open_request -> struct request_sock
    tcp_openreq_alloc -> reqsk_alloc
    tcp_openreq_free -> reqsk_free
    tcp_openreq_fastfree -> __reqsk_free

    With this most of the infrastructure closely resembles a struct
    sock methods subset.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Kept this first changeset minimal, without changing existing names to
    ease peer review.

    Basically tcp_openreq_alloc now receives the or_calltable, that in turn
    has two new members:

    ->slab, that replaces tcp_openreq_cachep
    ->obj_size, to inform the size of the openreq descendant for
    a specific protocol

    The protocol specific fields in struct open_request were moved to a
    class hierarchy, with the things that are common to all connection
    oriented PF_INET protocols in struct inet_request_sock, the TCP ones
    in tcp_request_sock, that is an inet_request_sock, that is an
    open_request.

    I.e. this uses the same approach used for the struct sock class
    hierarchy, with sk_prot indicating if the protocol wants to use the
    open_request infrastructure by filling in sk_prot->rsk_prot with an
    or_calltable.

    Results? Performance is improved and TCP v4 now uses only 64 bytes per
    open request minisock, down from 96 without this patch :-)

    Next changeset will rename some of the structs, fields and functions
    mentioned above, struct or_calltable is way unclear, better name it
    struct request_sock_ops, s/struct open_request/struct request_sock/g,
    etc.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
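
The class hierarchy described above relies on struct embedding: each descendant places its parent as its first member, and the ops table records the size of the most derived object. Below is a minimal user-space sketch of that layout. It uses the names the message says the structs will be renamed to (request_sock, request_sock_ops); the individual fields are invented for illustration, and calloc() stands in for the per-protocol slab cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct request_sock_ops {
    size_t obj_size;  /* size of the protocol's request_sock descendant */
};

/* Base class: what every protocol shares. */
struct request_sock {
    const struct request_sock_ops *ops;
};

/* Common to connection-oriented PF_INET protocols (fields hypothetical). */
struct inet_request_sock {
    struct request_sock req;  /* must be first */
    uint32_t loc_addr;
    uint32_t rmt_addr;
    uint16_t rmt_port;
};

/* TCP specifics (fields hypothetical). */
struct tcp_request_sock {
    struct inet_request_sock req;  /* must be first */
    uint32_t rcv_isn;
    uint32_t snt_isn;
};

/* Allocate an object big enough for the protocol's descendant. */
static struct request_sock *reqsk_alloc(const struct request_sock_ops *ops)
{
    struct request_sock *req;

    assert(ops->obj_size >= sizeof(struct request_sock));
    req = calloc(1, ops->obj_size);
    if (req != NULL)
        req->ops = ops;
    return req;
}

static void reqsk_free(struct request_sock *req)
{
    free(req);
}

/* Downcast: valid because the base struct sits at offset 0. */
static struct tcp_request_sock *tcp_rsk(struct request_sock *req)
{
    return (struct tcp_request_sock *)req;
}
```

Generic code only ever sees `struct request_sock *`, while each protocol pays for exactly the fields it needs, which is where the per-request memory saving comes from.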
     

16 Jun, 2005

1 commit


14 Jun, 2005

5 commits


03 Jun, 2005

1 commit


01 Jun, 2005

1 commit


31 May, 2005

2 commits

  • Steven Hand wrote:
    >
    > Reconstructed forward trace:
    >
    > net/ipv4/udp.c:1334 spin_lock_irq()
    > net/ipv4/udp.c:1336 udp_checksum_complete()
    > net/core/skbuff.c:1069 skb_shinfo(skb)->nr_frags > 1
    > net/core/skbuff.c:1086 kunmap_skb_frag()
    > net/core/skbuff.h:1087 local_bh_enable()
    > kernel/softirq.c:0140 WARN_ON(irqs_disabled());

    The receive queue lock is never taken in IRQs (and should never be) so
    we can simply substitute bh for irq.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • When we have ip_queue being used from LOCAL_IN, then we end up with a
    situation where the verdicts coming back from userspace traverse the TCP
    input path from syscall context. While this seems to work most of the
    time, there's an ugly deadlock:

    syscall context is interrupted by the timer interrupt. When the timer
    interrupt leaves, the timer softirq gets scheduled and calls
    tcp_delack_timer() and the like. They themselves do bh_lock_sock(sk),
    which is already held from somewhere else -> boom.

    I've now tested the suggested solution by Patrick McHardy and Herbert Xu to
    simply use local_bh_{en,dis}able().

    Signed-off-by: Harald Welte
    Signed-off-by: David S. Miller

    Harald Welte
     

30 May, 2005

2 commits


24 May, 2005

1 commit

  • When we are doing ucopy, we try to defer the ACK generation to
    cleanup_rbuf(). This works most of the time very well, but if the
    ucopy prequeue is large, this ACKing behavior kills performance.

    With TSO, it is possible to fill the prequeue so large that by the
    time the ACK is sent and gets back to the sender, most of the window
    has emptied of data and performance suffers significantly.

    This behavior does help in some cases, so we should think about
    re-enabling this trick in the future, using some kind of limit in
    order to avoid the bug case.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 May, 2005

2 commits


19 May, 2005

1 commit

  • Having frag_list members which hold wmem of an sk leads to nightmares
    with partially cloned frag skb's. The reason is that once you unleash
    a skb with a frag_list that has individual sk ownerships into the stack
    you can never undo those ownerships safely as they may have been cloned
    by things like netfilter. Since we have to undo them in order to make
    skb_linearize happy this approach leads to a dead-end.

    So let's go the other way and make this an invariant:

    For any skb on a frag_list, skb->sk must be NULL.

    That is, the socket ownership always belongs to the head skb.
    It turns out that the implementation is actually pretty simple.

    The above invariant is actually violated in the following patch
    for a short duration inside ip_fragment. This is OK because the
    offending frag_list member is either destroyed at the end of the
    slow path without being sent anywhere, or it is detached from
    the frag_list before being sent.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
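
The invariant stated above can be expressed as a small checker. This is a user-space sketch with simplified, hypothetical structures: in the kernel the frag_list pointer lives in the skb's shared info rather than directly in struct sk_buff, and `frag_list_add()` is an invented helper.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct sock {
    int id;
};

struct sk_buff {
    struct sock *sk;            /* owning socket, or NULL */
    struct sk_buff *next;       /* next buffer on a frag_list */
    struct sk_buff *frag_list;  /* head of this buffer's fragment chain */
};

/* Return 1 if no buffer on head's frag_list has a socket owner. */
static int frag_list_owner_ok(const struct sk_buff *head)
{
    const struct sk_buff *frag;

    for (frag = head->frag_list; frag != NULL; frag = frag->next)
        if (frag->sk != NULL)
            return 0;
    return 1;
}

/*
 * Push a fragment onto head's frag_list while keeping the invariant:
 * ownership stays with the head, so the fragment's sk is cleared.
 */
static void frag_list_add(struct sk_buff *head, struct sk_buff *frag)
{
    frag->sk = NULL;
    frag->next = head->frag_list;
    head->frag_list = frag;
}
```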