17 Feb, 2011

1 commit

  • Assigning a socket in timewait state to skb->sk can trigger
    kernel oops, e.g. in nfnetlink_log, which does:

    if (skb->sk) {
            read_lock_bh(&skb->sk->sk_callback_lock);
            if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...

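    In the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
    is invalid, because a timewait socket shares only the common header with
    a full struct sock and has no such fields.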

    Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
    or xt_TPROXY must not assign a timewait socket to skb->sk.

    This does the latter.

    If a TW socket is found, assign the tproxy nfmark but skip the skb->sk assignment,
    thus mimicking the behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.

    The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
    listener socket.

    Cc: Balazs Scheidler
    Cc: KOVACS Krisztian
    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
     

21 Oct, 2010

3 commits


23 Sep, 2010

1 commit


05 Feb, 2009

1 commit

  • As it currently stands, skb destructors are forbidden on the
    receive path because the protocol end-points will overwrite
    any existing destructor with their own.

    This is the reason why we have to call skb_orphan in the loopback
    driver before we reinject the packet back into the stack, thus
    creating a period during which loopback traffic isn't charged
    to any socket.

    With virtualisation, we have a similar problem in that traffic
    is reinjected into the stack without being associated with any
    socket entity, thus providing no natural congestion push-back
    for those poor folks still stuck with UDP.

    Now had we been consistent in telling them that UDP simply has
    no congestion feedback, I could just fob them off. Unfortunately,
    we appear to have gone to some lengths to cater for this on the
    standard UDP path with skb/socket accounting, and that has created
    a very unhealthy dependency.

    Alas, habits are hard to break, so we may just have to allow
    skb destructors on the receive path.

    It turns out that making skb destructors usable on the receive path
    isn't as easy as it seems. For instance, simply adding skb_orphan
    to skb_set_owner_r isn't enough. This is because we assume all
    over the IP stack that skb->sk is an IP socket if present.

    The new transparent proxy code goes one step further and assumes
    that skb->sk is the receiving socket if present.

    Now all of this can be dealt with by adding simple checks such
    as only treating skb->sk as an IP socket if skb->sk->sk_family
    matches. However, it turns out that for bridging at least we
    don't need to do all of this work.

    This is of interest because most virtualisation setups use bridging
    so we don't actually go through the IP stack on the host (with
    the exception of our old nemesis the bridge netfilter, but that's
    easily taken care of).

    So this patch simply adds skb_orphan to the point just before we
    enter the IP stack, but after we've gone through the bridge on the
    receive path. It also adds an skb_orphan to the one place in
    netfilter that touches skb->sk/skb->destructor, that is, tproxy.

    One word of caution: because of the internal code structure, anyone
    wishing to deploy this must use skb_set_owner_w rather than
    skb_set_owner_r, since many functions that create a new skb from
    an existing one will invoke skb_set_owner_w on the new skb.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

13 Oct, 2008

1 commit


08 Oct, 2008

1 commit