27 May, 2005

6 commits

  • "_s" suffix is certainly of Hungarian origin.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     
  • [XFRM] Call dst_check() with appropriate cookie

    This fixes an infinite loop issue with IPv6 tunnel mode.

    Signed-off-by: Kazunori Miyazawa
    Signed-off-by: Hideaki YOSHIFUJI
    Signed-off-by: David S. Miller

    Hideaki YOSHIFUJI
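
    A tiny illustrative model of the point above - validating a cached route
    against the right generation cookie - using invented names rather than the
    kernel's dst_check() API; a sketch only, assuming a per-tree generation
    counter:

        #include <stddef.h>

        /* Invented stand-in for a cached route entry. */
        struct route_entry {
            unsigned int genid;     /* bumped whenever the routing tree changes */
            int obsolete;
        };

        /*
         * Return the entry if it is still valid for the caller's cookie,
         * or NULL if it must be looked up again.  Checking with the wrong
         * cookie (for example one that always "matches") lets a stale
         * entry be reused forever, which is how a lookup can loop.
         */
        static struct route_entry *check_cached(struct route_entry *e,
                                                unsigned int cookie)
        {
            if (e->obsolete || e->genid != cookie)
                return NULL;
            return e;
        }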
     
  • Here is a fixed-up version of the reorder feature of netem.
    It is the same as the earlier patch, with the bugfix from Julio merged in.
    It has the expected backwards-compatibility behaviour.

    Go ahead and merge this one; the TCP strangeness I was seeing was due
    to the reordering bug and the previous version of the TSO patch.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Netem works better if packets are just queued in the inner discipline
    rather than having a separate delayed queue. Change it to use dequeue/requeue
    to peek, like TBF does (a user-space sketch of the idea follows this entry).

    By doing this, potential qlen problems with the old method are avoided. The problems
    happened when the netem_run that moved packets from the separate delayed queue to the inner
    discipline failed (because the inner queue was full). This happened in dequeue, so the
    effective qlen of the netem would be decreased (because of the drop), but there was
    no way to keep the outer qdisc (the caller of netem dequeue) in sync.

    The problem window is still there, since this patch doesn't address the issue of
    requeue failing in netem_dequeue, but that shouldn't happen since the dequeue/requeue
    sequence should always work. The long-term correct fix is to implement qdisc->peek in all
    the qdiscs to allow for this (needed by several other qdiscs as well).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
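
    The user-space sketch referenced above: a plain linked-list queue standing in
    for the inner qdisc, with dequeue/requeue used to "peek" at the head packet.
    All names and types here are illustrative, not the real qdisc code.

        #include <stddef.h>

        struct pkt {
            struct pkt *next;
            long send_time;             /* when netem wants to release it */
        };

        struct queue {
            struct pkt *head, *tail;
            int qlen;
        };

        static struct pkt *q_dequeue(struct queue *q)
        {
            struct pkt *p = q->head;

            if (p) {
                q->head = p->next;
                if (!q->head)
                    q->tail = NULL;
                q->qlen--;
            }
            return p;
        }

        static void q_requeue(struct queue *q, struct pkt *p)
        {
            p->next = q->head;          /* put it back at the head */
            q->head = p;
            if (!q->tail)
                q->tail = p;
            q->qlen++;
        }

        /*
         * "Peek" the way the patch does: dequeue the head and, if it is
         * not due yet, requeue it unchanged.  qlen stays consistent with
         * the outer qdisc because the packet never leaves for good.
         */
        static struct pkt *peek_next_due(struct queue *inner, long now)
        {
            struct pkt *p = q_dequeue(inner);

            if (p && p->send_time > now) {
                q_requeue(inner, p);
                return NULL;            /* nothing ready to send yet */
            }
            return p;
        }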
     
  • Handle duplication of packets in netem by re-inserting them at the top of the qdisc tree.
    This avoids problems with qlen accounting with nested qdiscs; a toy sketch
    follows this entry. The recursion requires no additional locking but will
    potentially increase stack depth.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
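
    The toy sketch referenced above: two levels of "qdiscs" that only count
    packets, showing why re-injecting the duplicate at the root keeps every
    level's qlen in step, at the cost of one extra level of recursion. Names
    are made up for illustration.

        /* Toy qdisc: each level just counts queued packets. */
        struct toy_qdisc {
            struct toy_qdisc *child;    /* NULL at the leaf (netem-like) */
            int qlen;
        };

        static struct toy_qdisc *toy_root;   /* top of the qdisc tree */

        /*
         * Enqueue a packet.  When the leaf decides to duplicate, it
         * re-enqueues the copy at the root rather than queuing it
         * locally, so the copy is counted at every level.  Duplication
         * is suppressed for the copy to stop the recursion.
         */
        static void toy_enqueue(struct toy_qdisc *q, int pkt, int may_duplicate)
        {
            q->qlen++;
            if (q->child) {
                toy_enqueue(q->child, pkt, may_duplicate);
                return;
            }
            if (may_duplicate)
                toy_enqueue(toy_root, pkt, 0);   /* re-insert copy at the top */
        }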
     

24 May, 2005

2 commits

  • Signed-off-by: Herbert Xu
    Acked-by: Hideaki YOSHIFUJI
    Signed-off-by: David S. Miller

    Herbert Xu
     
    When we are doing ucopy, we try to defer the ACK generation to
    cleanup_rbuf(). This works very well most of the time, but if the
    ucopy prequeue is large, this ACKing behavior kills performance.

    With TSO, it is possible to make the prequeue so large that by the
    time the ACK is sent and gets back to the sender, most of the window
    has emptied of data and performance suffers significantly.

    This behavior does help in some cases, so we should think about
    re-enabling this trick in the future, using some kind of limit in
    order to avoid the bug case.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 May, 2005

2 commits

  • Having frag_list members which hold wmem of an sk leads to nightmares
    with partially cloned frag skbs. The reason is that once you unleash
    an skb with a frag_list that has individual sk ownerships into the stack,
    you can never undo those ownerships safely, as they may have been cloned
    by things like netfilter. Since we have to undo them in order to make
    skb_linearize happy, this approach leads to a dead end.

    So let's go the other way and make this an invariant:

    For any skb on a frag_list, skb->sk must be NULL.

    That is, the socket ownership always belongs to the head skb.
    It turns out that the implementation is actually pretty simple.

    The above invariant is actually violated in the following patch
    for a short duration inside ip_fragment. This is OK because the
    offending frag_list member is either destroyed at the end of the
    slow path without being sent anywhere, or it is detached from
    the frag_list before being sent.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
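
    A simplified model of the invariant above, with made-up stand-ins for
    struct sock and struct sk_buff: when a buffer is chained onto another
    buffer's frag_list, its socket charge is folded into the head and its
    own sk pointer is cleared.

        #include <assert.h>
        #include <stddef.h>

        struct sock {
            int wmem_alloc;             /* bytes charged to this socket */
        };

        struct skb {
            struct sock *sk;            /* owning socket, if any */
            int truesize;
            struct skb *frag_list;      /* chain of fragment buffers */
            struct skb *next;
        };

        /*
         * Chain 'frag' onto head->frag_list while keeping the invariant:
         * only the head skb owns socket memory; every frag_list member
         * has sk == NULL, and its truesize is charged to the head's socket.
         */
        static void frag_list_add(struct skb *head, struct skb *frag)
        {
            if (frag->sk) {
                frag->sk->wmem_alloc -= frag->truesize;
                frag->sk = NULL;
            }
            if (head->sk)
                head->sk->wmem_alloc += frag->truesize;
            head->truesize += frag->truesize;

            frag->next = head->frag_list;
            head->frag_list = frag;

            assert(frag->sk == NULL);   /* the invariant from the changelog */
        }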
     
  • It looks like skb_cow_data() does not set
    the proper owner for a newly created skb.

    If we have several fragments for an skb and some of them
    are shared(?) or cloned (as in async IPsec), there
    might be a situation where we need to recreate an skb and
    thus use skb_copy() for it.
    The newly created skb has neither a destructor nor a socket
    associated with it, both of which must be copied from the old skb.
    As far as I can see, the current code sets the destructor and socket
    for the first skb only, and uses only the first skb's truesize
    to increment the sk_wmem_alloc value.

    If the above "analysis" is correct, then the attached patch fixes that
    (a simplified sketch of the idea follows this entry).

    Signed-off-by: Evgeniy Polyakov
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Evgeniy Polyakov
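
    The simplified sketch referenced above, with made-up types: every
    recreated copy - not just the first one - gets the old buffer's socket
    and a destructor, and charges its own truesize to the socket so the
    accounting balances when the destructor later runs.

        struct sock {
            int wmem_alloc;
        };

        struct skb {
            struct sock *sk;
            void (*destructor)(struct skb *);
            int truesize;
        };

        /* Placeholder destructor: give the charged memory back. */
        static void release_wmem(struct skb *s)
        {
            s->sk->wmem_alloc -= s->truesize;
        }

        /* Transfer ownership from the old fragment to its fresh copy. */
        static void copy_set_owner(struct skb *copy, const struct skb *old)
        {
            if (!old->sk)
                return;                 /* nothing to transfer */
            copy->sk = old->sk;
            copy->destructor = release_wmem;
            copy->sk->wmem_alloc += copy->truesize;
        }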
     

04 May, 2005

16 commits

  • * net/irda/irda_device.c::irda_setup_dma() made conditional on
    ISA_DMA_API (it uses the helpers in question, and irda is usable on
    platforms that don't have them at all - think of USB IRDA, for
    example).
    * irda drivers that depend on ISA DMA are marked as dependent on
    ISA_DMA_API.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Long-standing bug: the policy to repeat an action never worked.

    Signed-off-by: J Hadi Salim
    Signed-off-by: David S. Miller

    J Hadi Salim
     
  • I found a bug that stopped IPsec/IPv6 from working. About
    a month ago IPv6 started using rt6i_idev->dev on the cached socket dst
    entries. If the cached socket dst entry is an IPsec one, then rt6i_idev
    will be NULL.

    Since we want to look at the rt6i_idev of the original route in this
    case, the easiest fix is to store rt6i_idev in the IPsec dst entry just
    as we do for a number of other IPv6 route attributes. Unfortunately
    this means that we need some new code to handle the references to
    rt6i_idev. That's why this patch is bigger than it would otherwise be.

    I've also done the same thing for IPv4, since it is conceivable that
    once these idev attributes start getting used for accounting, we
    will probably need to dereference them for IPv4 IPsec entries too.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
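
    A hedged sketch of the reference handling described above, with invented
    names: the IPsec bundle entry keeps its own counted pointer to the
    per-device object taken from the original route, and drops it when the
    entry is destroyed, so users of the cached entry never see a NULL idev.

        /* Invented minimal types; only the refcount discipline matters. */
        struct idev {
            int refcnt;
        };

        struct route {
            struct idev *idev;          /* per-device data of the route */
        };

        struct bundle {                 /* stands in for the IPsec dst entry */
            struct route *original;
            struct idev *idev;          /* our own reference, not borrowed */
        };

        static void idev_hold(struct idev *i) { i->refcnt++; }
        static void idev_put(struct idev *i)  { i->refcnt--; /* free at zero */ }

        /* Copy the idev from the original route and take a reference. */
        static void bundle_init(struct bundle *b, struct route *orig)
        {
            b->original = orig;
            b->idev = orig->idev;
            idev_hold(b->idev);
        }

        static void bundle_destroy(struct bundle *b)
        {
            idev_put(b->idev);
        }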
     
  • Fix a qlen underrun when doing duplication with netem. If netem is used
    as a leaf discipline, then the parent needs to be tweaked when packets
    are duplicated.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Netem currently dumps packets into the queue when the timer expires. This
    patch makes it work by self-clocking (more like TBF). It fixes a bug
    when a delay of 0 is requested (only doing loss or duplication).

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Due to bugs in netem (fixed by later patches), it is possible for the qdisc
    qlen to go negative. If this happens, the CPU ends up spinning forever
    in qdisc_run(). So add a BUG_ON() to trap it.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some network drivers call netif_stop_queue() when detecting loss of
    carrier. This leads to packets being queued up at the qdisc level for
    an unbounded period of time. In order to prevent this effect, the core
    networking stack will now cease to queue packets for any device that
    is operationally down (i.e. the queue is flushed and disabled).

    Signed-off-by: Tommy S. Christensen
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Tommy S. Christensen
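
    A toy model of the behaviour described above, with invented names: when
    the link goes operationally down the per-device queue is flushed and
    disabled, and later enqueue attempts are refused instead of holding
    packets for an unbounded time.

        #include <stdlib.h>

        struct pkt {
            struct pkt *next;
        };

        struct dev_queue {
            struct pkt *head;
            int qlen;
            int down;                   /* device is operationally down */
        };

        /* Flush and disable the queue when the carrier goes away. */
        static void link_down(struct dev_queue *q)
        {
            struct pkt *p;

            q->down = 1;
            while ((p = q->head) != NULL) {
                q->head = p->next;
                q->qlen--;
                free(p);
            }
        }

        /* Drop instead of queueing while the device is down. */
        static int dev_enqueue(struct dev_queue *q, struct pkt *p)
        {
            if (q->down) {
                free(p);
                return -1;              /* dropped */
            }
            p->next = q->head;
            q->head = p;
            q->qlen++;
            return 0;
        }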
     
  • If we free up a partially processed packet because its
    skb->len dropped to zero, we need to decrement qlen ourselves, because
    we are dropping out of the top-level loop that would otherwise do
    the decrement for us.

    Spotted by Herbert Xu.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The qlen should continue to decrement, even if we
    pop partially processed SKBs back onto the receive queue.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Let's recap the problem. The current asynchronous netlink kernel
    message processing is vulnerable to these attacks:

    1) Hit and run: Attacker sends one or more messages and then exits
    before they're processed. This may confuse/disable the next netlink
    user that gets the netlink address of the attacker since it may
    receive the responses to the attacker's messages.

    Proposed solutions:

    a) Synchronous processing.
    b) Stream mode socket.
    c) Restrict/prohibit binding.

    2) Starvation: Because various netlink rcv functions were written
    to not return until all messages have been processed on a socket,
    it is possible for these functions to execute for an arbitrarily
    long period of time. If this is successfully exploited it could
    also be used to hold rtnl forever.

    Proposed solutions:

    a) Synchronous processing.
    b) Stream mode socket.

    Firstly, let's cross off solution c). It only solves the first
    problem and it has user-visible impacts. In particular, it'll
    break user space applications that expect to bind or communicate
    with specific netlink addresses (pid's).

    So we're left with a choice of synchronous processing versus
    SOCK_STREAM for netlink.

    For the moment I'm sticking with the synchronous approach as
    suggested by Alexey since it's simpler and I'd rather spend
    my time working on other things.

    However, it does have a number of deficiencies compared to the
    stream mode solution:

    1) User-space to user-space netlink communication is still vulnerable.

    2) Inefficient use of resources. This is especially true for rtnetlink
    since the lock is shared with other users such as networking drivers.
    The latter could hold the rtnl while communicating with hardware, which
    causes the rtnetlink user to wait when it could be doing other things.

    3) It is still possible to DoS all netlink users by flooding the kernel
    netlink receive queue. The attacker simply fills the receive socket
    with a single netlink message that fills up the entire queue. The
    attacker then continues to call sendmsg with the same message in a loop.

    Point 3) can be countered by retransmissions in user-space code; however,
    it is pretty messy.

    In light of these problems (in particular, point 3), we should implement
    stream mode netlink at some point. In the meantime, here is a patch
    that implements synchronous processing (a rough user-space model follows
    this entry).

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
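
    The rough user-space model referenced above, showing the shape of
    "synchronous processing" under a mutex (an assumption for illustration,
    not the kernel code): the sender's own send path drains and handles the
    queued messages, so no unbounded backlog is handed to another context.

        #include <pthread.h>
        #include <stdlib.h>

        struct msg {
            struct msg *next;
            /* payload omitted */
        };

        static struct msg *queue_head;
        static struct msg **queue_tail = &queue_head;
        static pthread_mutex_t rcv_mutex = PTHREAD_MUTEX_INITIALIZER;

        static void handle(struct msg *m)
        {
            free(m);                    /* stand-in for real processing */
        }

        /*
         * Model of a synchronous sendmsg(): enqueue, then immediately
         * process everything queued, in the sender's context.  Because
         * enqueueing also needs the mutex, the drain loop always
         * terminates, and the work is paid for by the senders themselves.
         */
        static int send_and_process(struct msg *m)
        {
            pthread_mutex_lock(&rcv_mutex);
            m->next = NULL;
            *queue_tail = m;
            queue_tail = &m->next;

            while ((m = queue_head) != NULL) {
                queue_head = m->next;
                if (!queue_head)
                    queue_tail = &queue_head;
                handle(m);
            }
            pthread_mutex_unlock(&rcv_mutex);
            return 0;
        }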
     
  • Here is a little optimisation for the cb_lock used by netlink_dump.
    While fixing that race earlier, I noticed that the reference count
    held by cb_lock is completely useless. The reason is that in order
    to obtain the protection of the reference count, you have to take
    the cb_lock. But the only way to take the cb_lock is through
    dereferencing the socket.

    That is, you must already possess a reference count on the socket
    before you can take advantage of the reference count held by cb_lock.
    As a corollary, we can remove the reference count held by the cb_lock.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • htb_enqueue(): Free the skb and return NET_XMIT_DROP if a packet is
    destined for the direct_queue but the direct_queue is full. (Before
    this, it erroneously returned NET_XMIT_SUCCESS even though the packet
    was not enqueued.)

    Signed-off-by: Asim Shankar
    Signed-off-by: David S. Miller

    Asim Shankar
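
    A small sketch of the corrected behaviour with assumed names and an
    assumed bound on the direct queue: when the queue is full, the packet
    is freed and a drop status is returned instead of a success that never
    happened.

        #include <stdlib.h>

        #define DIRECT_QLEN_LIMIT 32    /* assumed bound, not HTB's value */

        enum xmit_status { XMIT_SUCCESS = 0, XMIT_DROP = 1 };

        struct pkt {
            struct pkt *next;
        };

        struct direct_queue {
            struct pkt *head, *tail;
            int qlen;
        };

        static enum xmit_status direct_enqueue(struct direct_queue *q,
                                               struct pkt *p)
        {
            if (q->qlen >= DIRECT_QLEN_LIMIT) {
                free(p);                /* free the packet we cannot queue */
                return XMIT_DROP;       /* and tell the caller it was dropped */
            }
            p->next = NULL;
            if (q->tail)
                q->tail->next = p;
            else
                q->head = p;
            q->tail = p;
            q->qlen++;
            return XMIT_SUCCESS;
        }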
     
  • Signed-off-by: Folkert van Heusden
    Signed-off-by: David S. Miller

    Folkert van Heusden
     
  • Signed-off-by: Folkert van Heusden
    Signed-off-by: David S. Miller

    Folkert van Heusden