08 Aug, 2011

1 commit


03 Aug, 2011

1 commit

  • Gergely Kalman reported crashes in check_peer_redir().

    It appears commit f39925dbde778 (ipv4: Cache learned redirect
    information in inetpeer.) added a race, leading to possible NULL ptr
    dereference.

    Since we can now change dst neighbour, we should make sure a reader can
    safely use a neighbour.

    Add RCU protection to dst neighbour, and make sure check_peer_redir()
    can be called safely by different cpus in parallel.

    As neighbours are already freed after one RCU grace period, this patch
    should not add the typical RCU penalty (cache cold effects).

    Many thanks to Gergely for providing a pretty report pointing to the
    bug.

    Reported-by: Gergely Kalman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

22 Jul, 2011

1 commit

  • Because the ip fragment offset field counts 8-byte chunks, ip
    fragments other than the last must contain a multiple of 8 bytes of
    payload. ip_ufo_append_data wasn't respecting this constraint and,
    depending on the MTU and ip option sizes, could create malformed
    non-final fragments.

    Google-Bug-Id: 5009328
    Signed-off-by: Bill Sommerfeld
    Signed-off-by: David S. Miller

    Bill Sommerfeld
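
The 8-byte constraint described in this commit can be illustrated with a small sketch. This is not the code from ip_ufo_append_data; the helper names below are hypothetical and only show the arithmetic: the payload of a non-final fragment is the space left after the IP header, rounded down to a multiple of 8, and the offset field stores the byte offset divided by 8.

```c
#include <assert.h>

/*
 * Largest payload a non-final IPv4 fragment may carry: the space left
 * after the IP header (including options), rounded down to a multiple
 * of 8 bytes, because the fragment offset field counts 8-byte units.
 * Hypothetical helper, not a kernel function.
 */
static unsigned int max_nonfinal_frag_payload(unsigned int mtu,
                                              unsigned int ip_hdr_len)
{
    assert(mtu > ip_hdr_len);
    return (mtu - ip_hdr_len) & ~7u;
}

/*
 * Value stored in the fragment offset field for a fragment whose
 * payload starts at byte_offset within the original datagram.
 * Hypothetical helper, not a kernel function.
 */
static unsigned int frag_offset_field(unsigned int byte_offset)
{
    assert(byte_offset % 8 == 0);
    return byte_offset >> 3;
}
```

With a 1500-byte MTU and a 20-byte header the full 1480 bytes are usable, but with 4 bytes of IP options (a 24-byte header) the 1476 bytes of remaining space must be trimmed to 1472, which is the kind of case the commit message says depends on the MTU and IP option sizes.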
     

18 Jul, 2011

1 commit


17 Jul, 2011

2 commits


14 Jul, 2011

1 commit

  • Now that there is a one-to-one correspondence between neighbour
    and hh_cache entries, we no longer need:

    1) dynamic allocation
    2) attachment to dst->hh
    3) refcounting

    Initialization of the hh_cache entry is indicated by hh_len
    being non-zero, and such initialization is always done with
    the neighbour's lock held as a writer.

    Signed-off-by: David S. Miller

    David S. Miller
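
A minimal sketch of the layout this commit describes, assuming simplified stand-in types: the cached header is embedded in the neighbour rather than allocated and refcounted separately, and a non-zero hh_len marks it as initialized. The struct names, the HH_DATA_MAX_SKETCH size, and the helpers are hypothetical, and the neighbour's write lock that the commit requires around initialization is left out.

```c
#include <stdbool.h>
#include <string.h>

#define HH_DATA_MAX_SKETCH 32   /* hypothetical buffer size */

/* Simplified stand-in for the cached hardware header. */
struct hh_cache_sketch {
    unsigned short hh_len;                      /* 0 means "not initialized" */
    unsigned char  hh_data[HH_DATA_MAX_SKETCH];
};

/* Simplified stand-in for a neighbour: the cache entry is a member,
 * so there is no separate allocation, no dst->hh pointer, no refcount. */
struct neighbour_sketch {
    struct hh_cache_sketch hh;
};

static bool hh_is_initialized(const struct neighbour_sketch *n)
{
    return n->hh.hh_len != 0;
}

/* Fill in the cached header. In the kernel this would happen with the
 * neighbour's lock held as a writer; locking is not modelled here. */
static bool hh_init(struct neighbour_sketch *n,
                    const unsigned char *hdr, unsigned short len)
{
    if (len == 0 || len > HH_DATA_MAX_SKETCH)
        return false;
    memcpy(n->hh.hh_data, hdr, len);
    n->hh.hh_len = len;         /* set last: non-zero marks the entry valid */
    return true;
}
```

Because the entry lives and dies with its neighbour, its lifetime needs no tracking of its own, which is why the three items listed in the commit message become unnecessary.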
     

06 Jul, 2011

1 commit


02 Jul, 2011

1 commit

  • We might call ip_ufo_append_data() for packets that will be IPsec
    transformed later. This function should be used just for real
    udp packets. So we check for rt->dst.header_len which is only
    nonzero on IPsec handling and call ip_ufo_append_data() just
    if rt->dst.header_len is zero.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
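
The check described in this commit can be sketched as a single predicate. This is an illustration only: route_sketch and may_use_ufo are hypothetical names, and the conditions other than header_len (UDP protocol, device support, length exceeding the mtu) are assumptions about what else gates the UFO path.

```c
#include <stdbool.h>

/* Simplified stand-in for the route fields the decision looks at. */
struct route_sketch {
    unsigned int header_len;       /* non-zero only when IPsec headers follow */
    bool         dev_supports_ufo; /* assumed device capability flag */
};

/* Decide whether the UFO path may be used for this send.
 * The header_len test is the one the commit adds; the remaining
 * conditions are assumptions made for the sake of the example. */
static bool may_use_ufo(const struct route_sketch *rt, bool is_udp,
                        unsigned int length, unsigned int mtu)
{
    return is_udp &&
           rt->dev_supports_ufo &&
           length > mtu &&
           rt->header_len == 0;
}
```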
     

28 Jun, 2011

2 commits

  • ip_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
    On IPsec the effective mtu is lower because we need to add the protocol
    headers and trailers later when we do the IPsec transformations. So after
    the IPsec transformations the packet might be too big, which then leads to
    slowpath fragmentation. This patch fixes this by building the packets
    based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
    handling to this.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • Git commit 59104f06 (ip: take care of last fragment in ip_append_data)
    added a check to see if we exceed the mtu when we add trailer_len.
    However, the trailer length is already subtracted from the mtu when the
    xfrm transformation bundles are set up. So IPsec packets of mtu
    size get fragmented, or, if the DF bit is set, the packets are not
    sent even though they fit the mtu exactly. This patch
    reverts commit 59104f06.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

22 Jun, 2011

1 commit


10 Jun, 2011

1 commit

  • We assume that transhdrlen is positive on the first fragment
    which is wrong for raw packets. So we don't add exthdrlen to the
    packet size for raw packets. This leads to a reallocation on IPsec
    because we do not have enough headroom on the skb to place the IPsec
    headers. This patch fixes this by adding exthdrlen to the packet
    size whenever the send queue of the socket is empty. This issue was
    introduced with git commit 1470ddf7 (inet: Remove explicit write
    references to sk/inet in ip_append_data).

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

14 May, 2011

1 commit


11 May, 2011

1 commit


09 May, 2011

5 commits


07 May, 2011

3 commits


05 May, 2011

1 commit


04 May, 2011

1 commit


29 Apr, 2011

1 commit

  • We lack proper synchronization to manipulate inet->opt ip_options.

    Problem is ip_make_skb() calls ip_setup_cork() and
    ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
    without any protection against another thread manipulating inet->opt.

    Another thread can change inet->opt pointer and free old one under us.

    Use RCU to protect inet->opt (changed to inet->inet_opt).

    Instead of handling atomic refcounts, just copy ip_options when
    necessary, to avoid cache line dirtying.

    We can't insert an rcu_head in struct ip_options since it's included in
    skb->cb[], so this patch is large because I had to introduce a new
    ip_options_rcu structure.

    Signed-off-by: Eric Dumazet
    Cc: Herbert Xu
    Signed-off-by: David S. Miller

    Eric Dumazet
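
The wrapper structure and the copy-instead-of-refcount approach from this commit can be sketched in plain C. All type and function names below are hypothetical stand-ins with a `_sketch` suffix; the real RCU read-side protection and deferred freeing are not modelled, only the data layout and the copy.

```c
#include <stddef.h>

/* Stand-in for the kernel's rcu_head (hypothetical layout). */
struct rcu_head_sketch {
    struct rcu_head_sketch *next;
    void (*func)(struct rcu_head_sketch *);
};

/* Stand-in for struct ip_options. It carries no RCU bookkeeping,
 * so it can still be stored in a small fixed-size control block. */
struct ip_options_sketch {
    unsigned char optlen;
    unsigned char data[40];
};

/* Wrapper that adds the reclamation head next to the options, so the
 * options struct itself does not have to grow. */
struct ip_options_rcu_sketch {
    struct rcu_head_sketch   rcu;
    struct ip_options_sketch opt;
};

/* Copy the options out of the shared wrapper into caller-owned storage
 * and return the option length. In the kernel the copy would be made
 * under RCU read-side protection; that is not modelled here. */
static size_t copy_options(const struct ip_options_rcu_sketch *shared,
                           struct ip_options_sketch *out)
{
    if (shared == NULL) {
        out->optlen = 0;
        return 0;
    }
    *out = shared->opt;     /* private copy: no refcount is touched */
    return out->optlen;
}
```

After the copy, the caller works only on its private struct, so a concurrent replacement of the shared pointer cannot affect it, and no shared counter has to be written on the read path.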
     

12 Apr, 2011

1 commit


31 Mar, 2011

2 commits


13 Mar, 2011

5 commits


03 Mar, 2011

1 commit


02 Mar, 2011

5 commits

  • The patch to replace inet->cork with cork left out two spots in
    __ip_append_data that can result in bogus packet construction.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This boolean state is now available in the flow flags.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Since that is what the current vague "flags" argument means.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch adds the helper ip_make_skb which is like ip_append_data
    and ip_push_pending_frames all rolled into one, except that it does
    not send the skb produced. The sending part is carried out by
    ip_send_skb, which the transport protocol can call after it has
    tweaked the skb.

    It is meant to be called in cases where corking is not used, and should
    have a one-to-one correspondence to sendmsg.

    This patch also adds the helper ip_finish_skb which is meant to
    replace ip_push_pending_frames when corking is required.
    Previously the protocol stack would peek at the socket write
    queue and add its header to the first packet. With ip_finish_skb,
    the protocol stack can directly operate on the final skb instead,
    just like the non-corking case with ip_make_skb.

    Signed-off-by: Herbert Xu
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • In order to allow simultaneous calls to ip_append_data on the same
    socket, it must not modify any shared state in sk or inet (other
    than those that are designed to allow that such as atomic counters).

    This patch abstracts out write references to sk and inet_sk in
    ip_append_data and its friends so that we may use the underlying
    code in parallel.

    Signed-off-by: Herbert Xu
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Herbert Xu