15 Mar, 2013

1 commit

  • When the neighbour table is full, dst_neigh_lookup()/dst_neigh_lookup_skb() return
    -ENOBUFS (as an error pointer), which is non-NULL, while all the code in the kernel
    that uses these functions assumes failure only on a zero (NULL) return, which can
    cause a panic (for example: https://bugzilla.kernel.org/show_bug.cgi?id=54731).

    This patch corrects the above error with the smallest possible change to the kernel
    source, and also fixes two missing return-value checks in drivers/infiniband/hw/cxgb4/cm.c.

    Tested on my x86_64 SMP machine

    Reported-by: Zhouyi Zhou
    Tested-by: Zhouyi Zhou
    Signed-off-by: Zhouyi Zhou
    Signed-off-by: David S. Miller

    Zhouyi Zhou
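    A minimal user-space sketch of the fix described above, assuming simplified stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers and a hypothetical raw_neigh_lookup() that fails when a simulated table is full. The wrapper turns an error pointer into NULL, so callers that only test for NULL stay correct. It is an illustration, not the patched dst.h code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for ERR_PTR()/IS_ERR(): a small negative errno is
 * encoded in the topmost MAX_ERRNO addresses. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct neighbour { int id; };

static struct neighbour the_neigh = { 42 };
static int table_full;   /* hypothetical switch simulating a full table */

/* Hypothetical low-level lookup: fails with an error pointer, not NULL. */
static struct neighbour *raw_neigh_lookup(void)
{
	if (table_full)
		return ERR_PTR(-ENOBUFS);
	return &the_neigh;
}

/* Wrapper in the spirit of the fix: map error pointers to NULL so that
 * NULL-checking callers never dereference an encoded errno. */
static struct neighbour *neigh_lookup_or_null(void)
{
	struct neighbour *n = raw_neigh_lookup();

	return IS_ERR(n) ? NULL : n;
}
```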
     

21 Feb, 2013

1 commit

  • Eric Dumazet wrote:
    | Some strange crashes happen in rt6_check_expired(), with access
    | to random addresses.
    |
    | At first glance, it looks like the RTF_EXPIRES and
    | stuff added in commit 1716a96101c49186b
    | (ipv6: fix problem with expired dst cache)
    | are racy : same dst could be manipulated at the same time
    | on different cpus.
    |
    | At some point, our stack believes rt->dst.from contains a dst pointer,
    | while its really a jiffie value (as rt->dst.expires shares the same area
    | of memory)
    |
    | rt6_update_expires() should be fixed, or am I missing something ?
    |
    | CC Neil because of https://bugzilla.redhat.com/show_bug.cgi?id=892060

    Because we do not have any locks for dst_entry, we cannot change the
    essential structure of an entry; e.g., we cannot change a reference
    to another entity.

    To fix this issue, split the 'from' and 'expires' fields in dst_entry
    out of the union. Once 'from' is assigned in the constructor,
    keep the reference until the very last stage of the object's
    lifetime.

    Of course, it is unsafe to change 'from', so make rt6_set_from() simple
    and use it only for fresh entries.

    Reported-by: Eric Dumazet
    Reported-by: Neil Horman
    CC: Gao Feng
    Signed-off-by: YOSHIFUJI Hideaki
    Reviewed-by: Eric Dumazet
    Reported-by: Steinar H. Gunderson
    Reviewed-by: Neil Horman
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki / 吉藤英明
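    A simplified sketch of the layout after this fix, assuming a hypothetical struct route in place of rt6_info/dst_entry and an illustrative RTF_EXPIRES value. 'from' and 'expires' are separate fields, 'from' is set only when the entry is created, and the expiry check falls back to the parent entry when the entry has no expiry of its own. Jiffies wraparound (time_after()) is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RTF_EXPIRES 0x00400000u   /* value chosen for illustration */

/* Hypothetical, heavily simplified route entry with separate fields, so
 * setting an expiry can never overwrite the 'from' pointer. */
struct route {
	unsigned int flags;
	unsigned long expires;   /* meaningful only when RTF_EXPIRES is set */
	struct route *from;      /* set once at creation, never changed */
};

/* Assign 'from' only on a freshly created entry. */
static void route_init(struct route *rt, struct route *from)
{
	rt->flags = 0;
	rt->expires = 0;
	rt->from = from;
}

/* Setting an expiry leaves 'from' untouched. */
static void route_set_expires(struct route *rt, unsigned long expires)
{
	rt->expires = expires;
	rt->flags |= RTF_EXPIRES;
}

/* Own expiry wins; otherwise inherit the parent's expiry state. */
static bool route_expired(const struct route *rt, unsigned long now)
{
	if (rt->flags & RTF_EXPIRES)
		return now > rt->expires;
	if (rt->from)
		return route_expired(rt->from, now);
	return false;
}
```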
     

06 Feb, 2013

1 commit

  • By default, we blackhole packets until the key manager resolves
    the states. This patch implements a packet queue where IPsec packets
    are queued until the states are resolved. We generate a dummy xfrm
    bundle; the output routine of the returned route enqueues the packet
    to a per-policy queue and arms a timer that checks for state resolution
    when dst_output() is called. Once the states are resolved, the packets
    are sent out of the queue. If the states are not resolved after some
    time, the queue is flushed.

    This patch keeps the default behaviour of blackholing packets as long
    as we have no states. To enable the packet queue, the sysctl
    xfrm_larval_drop must be switched off.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
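    The queue-then-resolve behaviour described above can be sketched as follows. All names are hypothetical and the timer is simulated by calling queue_timer() with the current time; a real implementation would hook into the dummy bundle's output path and kernel timers, and would transmit each queued packet rather than just count them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 8   /* hypothetical per-policy queue limit */

struct pkt { int id; };

/* Hypothetical per-policy hold queue: packets wait here while the key
 * manager negotiates states, instead of being blackholed. */
struct policy_queue {
	struct pkt *pkts[QUEUE_MAX];
	size_t len;
	unsigned long deadline;   /* after this time the queue is flushed */
	bool armed;               /* is the (simulated) timer running? */
};

/* Output routine of the dummy bundle: queue the packet, arm the timer. */
static bool queue_output(struct policy_queue *q, struct pkt *p,
			 unsigned long now, unsigned long timeout)
{
	if (q->len == QUEUE_MAX)
		return false;             /* queue full: drop this packet */
	q->pkts[q->len++] = p;
	if (!q->armed) {
		q->armed = true;
		q->deadline = now + timeout;
	}
	return true;
}

/* Timer handler: once states resolve, send everything; if the deadline
 * passes first, drop everything. Returns the number of packets sent. */
static size_t queue_timer(struct policy_queue *q, bool states_resolved,
			  unsigned long now)
{
	size_t sent = 0;

	if (!q->armed)
		return 0;
	if (states_resolved)
		sent = q->len;            /* a real stack would transmit each one */
	else if (now < q->deadline)
		return 0;                 /* keep waiting; timer stays armed */
	q->len = 0;
	q->armed = false;
	return sent;
}
```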
     

23 Aug, 2012

1 commit


09 Aug, 2012

1 commit

  • While investigating network performance problems, I found this little
    gem:

    $ nm -v vmlinux | grep -1 dst_default_metrics
    ffffffff82736540 b busy.46605
    ffffffff82736560 B dst_default_metrics
    ffffffff82736598 b dst_busy_list

    Apparently, declaring a const array without an initializer puts it in
    the (writable) bss section, in the middle of possibly often-dirtied cache
    lines.

    Since we really want dst_default_metrics to be const, to avoid any possible
    false sharing and to catch any buggy writes, I force a null initializer.
    With it, nm reports the symbol as read-only data:

    ffffffff818a4c20 R dst_default_metrics

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
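    A small illustration of the declaration change, assuming a GCC-style toolchain. The array name and size are hypothetical; whether the object actually lands in read-only data should be confirmed with nm, as in the commit message above.

```c
#include <assert.h>

#define METRICS_MAX 16   /* hypothetical size */

/* Before: no initializer. This is a tentative definition, and the
 * toolchain may place it in the writable .bss section:
 *
 *     const unsigned int default_metrics[METRICS_MAX];
 *
 * After: an explicit all-zero initializer makes it a full definition,
 * which GCC-style toolchains can place in read-only data (.rodata).
 * Check with nm: 'R'/'r' = read-only data, 'B'/'b' = bss. */
const unsigned int default_metrics[METRICS_MAX] = { 0 };

/* Read-only accessor; out-of-range indexes return 0. */
static unsigned int default_metric(unsigned int idx)
{
	return idx < METRICS_MAX ? default_metrics[idx] : 0;
}
```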
     

08 Aug, 2012

1 commit

  • 1) Avoid dirtying the neighbour's confirmed field.

    TCP workloads hit this cache line for each incoming ACK.
    Let's write n->confirmed only if jiffies has changed.

    2) Optimize neigh_hh_output() for the common Ethernet case, where
    hh_len is less than 16 bytes. Replace the memcpy() call
    with two inlined 64-bit loads/stores on x86_64.

    Bench results using the udpflood test, with the -C option (MSG_CONFIRM flag
    added to sendto(), to reproduce the n->confirmed dirtying on UDP):

    24 threads doing 1,000,000 UDP sendto() calls on a dummy device, 4 runs.

    before : 2.247s, 2.235s, 2.247s, 2.318s
    after : 1.884s, 1.905s, 1.891s, 1.895s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
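    Both optimizations described above can be sketched in user space. The struct and function names are hypothetical simplifications of struct neighbour and neigh_hh_output(); the writes counter exists only to show that repeated confirmations within one tick store nothing, and the sketch copies the whole 16-byte slot regardless of hh_len.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified neighbour with a cached link-layer header. */
struct neigh {
	unsigned long confirmed;
	unsigned long writes;         /* instrumentation: stores to confirmed */
	unsigned int hh_len;          /* e.g. 14 for Ethernet */
	unsigned char hh_data[16];
};

/* 1) Store only when the value changes, so repeated confirmations within
 *    the same tick do not dirty the cache line. */
static void neigh_confirm(struct neigh *n, unsigned long now)
{
	if (n->confirmed != now) {
		n->confirmed = now;
		n->writes++;
	}
}

/* 2) Copy a header slot of at most 16 bytes with two fixed-size 8-byte
 *    copies; compilers usually lower fixed-size memcpy() calls like these
 *    to plain 64-bit loads and stores on x86_64. */
static void hh_copy16(unsigned char *dst, const unsigned char *src)
{
	uint64_t a, b;

	memcpy(&a, src, 8);
	memcpy(&b, src + 8, 8);
	memcpy(dst, &a, 8);
	memcpy(dst + 8, &b, 8);
}
```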
     

21 Jul, 2012

2 commits


11 Jul, 2012

1 commit


05 Jul, 2012

3 commits


17 Jun, 2012

1 commit


27 May, 2012

1 commit

  • Since commit ad0081e43a
    ("ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed"),
    packets are fragmented incorrectly, because tunnel mode needs IPsec
    headers and a trailer for every fragment, while in transport mode it is
    sufficient to add the headers to the first fragment and the trailer to
    the last.

    So modify mtu and maxfraglen based on the IPsec mode and on whether the
    fragment is the first or the last.

    In my tests it works well (every fragment's size is the MTU)
    and does not trigger the slow fragmentation path.

    Changes from v1:
    Through optimization, mtu_prev and maxfraglen_prev can be deleted.
    Replace the xfrm mode code with dst_entry's new flag DST_XFRM_TUNNEL.
    Add the function ip6_append_data_mtu() to make the code clearer.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
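    The MTU adjustment can be sketched with a hypothetical helper loosely modelled on the description of ip6_append_data_mtu() above; it is not the kernel function. It assumes transport mode reserves header_len only in the first fragment and lets later fragments use the path MTU, leaves tunnel-mode values untouched, and ignores trailer space.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAG_HDR_LEN 8   /* size of the IPv6 fragment header */

/* Hypothetical helper (not the kernel function). In transport mode only
 * the first fragment reserves room for the IPsec header; later fragments
 * may use the full path MTU. In tunnel mode every fragment carries the
 * full IPsec overhead, so the caller's precomputed values are kept. */
static void append_data_mtu(int *mtu, int *maxfraglen,
			    unsigned int fragheaderlen, bool tunnel,
			    bool first, int header_len, int path_mtu)
{
	if (tunnel)
		return;
	if (first)
		*mtu -= header_len;
	else
		*mtu = path_mtu;
	/* Non-final fragment payloads must be a multiple of 8 bytes. */
	*maxfraglen = ((*mtu - (int)fragheaderlen) & ~7) + (int)fragheaderlen
		      - FRAG_HDR_LEN;
}
```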
     

24 Apr, 2012

1 commit

  • bridge: set fake_rtable's dst to NULL to avoid kernel Oops

    When a bridge is deleted before the tap/vif device is deleted, the kernel
    may oops because of a NULL pointer dereference through fake_rtable's dst.
    Setting fake_rtable's dst to NULL before sending packets out solves
    this problem.

    v4: reformat; change br_drop_fake_rtable(skb) to {}

    v3: enrich commit header

    v2: introduce new flag DST_FAKE_RTABLE in struct dst_entry.

    [ Use "do { } while (0)" for nop br_drop_fake_rtable()
    implementation -DaveM ]

    Acked-by: Eric Dumazet
    Signed-off-by: Peter Huang
    Signed-off-by: David S. Miller

    Peter Huang (Peng)
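    A user-space sketch of the helper and its no-op variant described above, assuming simplified sk_buff/dst_entry structures, an illustrative DST_FAKE_RTABLE value, and a hypothetical BRIDGE_NETFILTER switch standing in for the kernel config option.

```c
#include <assert.h>
#include <stddef.h>

#define DST_FAKE_RTABLE 0x0040   /* value chosen for illustration */
#define BRIDGE_NETFILTER 1       /* hypothetical switch; 0 selects the no-op */

/* Hypothetical, simplified structures. */
struct dst_entry { unsigned short flags; };
struct sk_buff { struct dst_entry *dst; };

#if BRIDGE_NETFILTER
/* Detach the bridge's shared fake route before the packet leaves, so
 * nothing downstream follows a dst that may not outlive the bridge. */
static void br_drop_fake_rtable(struct sk_buff *skb)
{
	struct dst_entry *dst = skb->dst;

	if (dst && (dst->flags & DST_FAKE_RTABLE))
		skb->dst = NULL;
}
#else
/* No-op when bridge netfilter is compiled out; do { } while (0) keeps
 * the macro usable as a single statement. */
#define br_drop_fake_rtable(skb) do { } while (0)
#endif
```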
     

14 Apr, 2012

1 commit

  • If an IPv6 dst cache entry is copied from a dst generated by an ICMPv6 RA packet,
    the copy never checks for expiry, because it has no RTF_EXPIRES flag.
    So this dst cache entry will always be used until the dst GC runs.

    Change struct dst_entry to add a union containing a new pointer, from, and expires.
    When rt6_info.rt6i_flags has no RTF_EXPIRES flag, dst.expires is unused,
    so we can use this field to point to the entry the dst cache was copied from.
    dst.from is only used in IPv6.

    rt6_check_expired() checks whether rt6_info.dst.from is expired.

    ip6_rt_copy() only sets dst.from when the ort has the RTF_ADDRCONF
    and RTF_DEFAULT flags, and then holds the ort.

    ip6_dst_destroy() releases the ort.

    Add some functions that operate on the RTF_EXPIRES flag and expires (from) together,
    and change the code to use these new functions.

    Changes from v5:
    Modify ip6_route_add() and ndisc_router_discovery() to use the new functions.

    Only set dst.from when the ort has the RTF_ADDRCONF
    and RTF_DEFAULT flags, and then hold the ort.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
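    The union-based design described above can be sketched with hypothetical simplified types and an illustrative RTF_EXPIRES value. The helpers update the flag and the union member together, and a reference on the origin entry is held while 'from' is live. As the 21 Feb, 2013 entry explains, this shared storage later turned out to be racy without locking, so the sketch assumes single-threaded use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RTF_EXPIRES 0x00400000u   /* value chosen for illustration */

/* Simplified union layout: RTF_EXPIRES says which member is live. */
struct rt {
	unsigned int flags;
	int refcnt;
	union {
		struct rt *from;        /* live when RTF_EXPIRES is clear */
		unsigned long expires;  /* live when RTF_EXPIRES is set */
	} u;
};

static void rt_release(struct rt *rt) { rt->refcnt--; }

/* Switch to "has own expiry": drop the reference held via 'from' first,
 * because the union member is about to be overwritten. */
static void rt_set_expires(struct rt *rt, unsigned long expires)
{
	if (!(rt->flags & RTF_EXPIRES) && rt->u.from)
		rt_release(rt->u.from);
	rt->flags |= RTF_EXPIRES;
	rt->u.expires = expires;
}

/* Record the origin entry and hold it; intended for fresh entries only. */
static void rt_set_from(struct rt *rt, struct rt *from)
{
	from->refcnt++;
	rt->flags &= ~RTF_EXPIRES;
	rt->u.from = from;
}

/* Own expiry if set, otherwise the origin entry's expiry state. */
static bool rt_expired(const struct rt *rt, unsigned long now)
{
	if (rt->flags & RTF_EXPIRES)
		return now > rt->u.expires;
	if (rt->u.from)
		return rt_expired(rt->u.from, now);
	return false;
}
```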
     

05 Mar, 2012

1 commit

  • If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
    other BUG variant in a static inline (i.e. not in a #define) then
    that header really should be including <linux/bug.h> and not just
    expecting it to be implicitly present.

    We can make this change risk-free, since if the files using these
    headers didn't have exposure to linux/bug.h already, they would have
    been causing compile failures/warnings.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

24 Dec, 2011

1 commit


23 Dec, 2011

1 commit

  • Chris Boot reported crashes occurring in ipv6_select_ident().

    [ 461.457562] RIP: 0010:[] [] ipv6_select_ident+0x31/0xa7

    [ 461.578229] Call Trace:
    [ 461.580742]
    [ 461.582870] [] ? udp6_ufo_fragment+0x124/0x1a2
    [ 461.589054] [] ? ipv6_gso_segment+0xc0/0x155
    [ 461.595140] [] ? skb_gso_segment+0x208/0x28b
    [ 461.601198] [] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6]
    [ 461.608786] [] ? nf_iterate+0x41/0x77
    [ 461.614227] [] ? dev_hard_start_xmit+0x357/0x543
    [ 461.620659] [] ? nf_hook_slow+0x73/0x111
    [ 461.626440] [] ? br_parse_ip_options+0x19a/0x19a [bridge]
    [ 461.633581] [] ? dev_queue_xmit+0x3af/0x459
    [ 461.639577] [] ? br_dev_queue_push_xmit+0x72/0x76 [bridge]
    [ 461.646887] [] ? br_nf_post_routing+0x17d/0x18f [bridge]
    [ 461.653997] [] ? nf_iterate+0x41/0x77
    [ 461.659473] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.665485] [] ? nf_hook_slow+0x73/0x111
    [ 461.671234] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.677299] [] ? nf_bridge_update_protocol+0x20/0x20 [bridge]
    [ 461.684891] [] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
    [ 461.691520] [] ? br_flood+0xfa/0xfa [bridge]
    [ 461.697572] [] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge]
    [ 461.704616] [] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge]
    [ 461.712329] [] ? br_nf_forward_finish+0x8a/0x95 [bridge]
    [ 461.719490] [] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
    [ 461.727223] [] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
    [ 461.734292] [] ? nf_iterate+0x41/0x77
    [ 461.739758] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.746203] [] ? nf_hook_slow+0x73/0x111
    [ 461.751950] [] ? __br_deliver+0xa0/0xa0 [bridge]
    [ 461.758378] [] ? NF_HOOK.constprop.4+0x56/0x56 [bridge]

    This is caused by bridge netfilter special dst_entry (fake_rtable), a
    special shared entry, where attaching an inetpeer makes no sense.

    The problem has been present since commit 87c48fa3b46 (ipv6: make fragment
    identifications less predictable).

    Introduce the DST_NOPEER dst flag and make sure ipv6_select_ident() and
    __ip_select_ident() fall back to the 'no peer attached' handling.

    Reported-by: Chris Boot
    Tested-by: Chris Boot
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
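    The fallback described above can be sketched as follows. The structures, the DST_NOPEER value, and the plain counter used as the no-peer generator are all assumptions for illustration; the kernel's actual identifier generation differs.

```c
#include <assert.h>
#include <stddef.h>

#define DST_NOPEER 0x0040   /* value chosen for illustration */

/* Hypothetical, simplified per-destination peer and dst entry. */
struct peer { unsigned int next_id; };
struct dst { unsigned short flags; struct peer *peer; };

static unsigned int fallback_counter;   /* stand-in no-peer generator */

/* Shared special entries (such as a bridge fake_rtable) carry DST_NOPEER,
 * so ident selection never attaches or uses a peer for them. */
static unsigned int select_ident(struct dst *dst, struct peer *candidate)
{
	if (!(dst->flags & DST_NOPEER)) {
		if (!dst->peer)
			dst->peer = candidate;      /* attach a peer */
		if (dst->peer)
			return dst->peer->next_id++;
	}
	return fallback_counter++;            /* 'no peer attached' path */
}
```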
     

06 Dec, 2011

1 commit


27 Nov, 2011

2 commits


18 Aug, 2011

1 commit

  • The l4_rxhash flag was added to the skb structure to indicate
    that the rxhash value was computed over the 4 tuple for the
    packet which includes the port information in the encapsulated
    transport packet. This is used by the stack to preserve the
    rxhash value in __skb_rx_tunnel.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
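    A sketch of how such a flag can be consulted on tunnel receive, using a hypothetical simplified skb: the hash is cleared so it can be recomputed for the inner packet, unless it was already computed over the 4-tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified skb fields. */
struct skb {
	uint32_t rxhash;
	bool l4_rxhash;   /* rxhash was computed over the L4 4-tuple */
};

/* On decapsulation, keep a 4-tuple hash; otherwise clear it so it is
 * recomputed for the encapsulated packet. */
static void tunnel_rx_hash(struct skb *skb)
{
	if (!skb->l4_rxhash)
		skb->rxhash = 0;
}
```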
     

03 Aug, 2011

1 commit

  • Gergely Kalman reported crashes in check_peer_redir().

    It appears commit f39925dbde778 (ipv4: Cache learned redirect
    information in inetpeer.) added a race, leading to possible NULL ptr
    dereference.

    Since we can now change dst neighbour, we should make sure a reader can
    safely use a neighbour.

    Add RCU protection to dst neighbour, and make sure check_peer_redir()
    can be called safely by different cpus in parallel.

    As neighbours are already freed after one RCU grace period, this patch
    should not add typical RCU penalty (cache cold effects)

    Many thanks to Gergely for providing a pretty report pointing to the
    bug.

    Reported-by: Gergely Kalman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
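    The publish/read pattern described above can be approximated in user space with C11 atomics: a release store stands in for rcu_assign_pointer() and an acquire load for rcu_dereference(). This is an analogy under stated assumptions, not RCU; it does not model read-side critical sections or grace periods, and the structures are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct neighbour { int state; };

/* Hypothetical dst whose neighbour pointer may be replaced at runtime. */
struct dst {
	_Atomic(struct neighbour *) neighbour;
};

/* Writer side (cf. rcu_assign_pointer()): release ordering makes the
 * neighbour's contents visible before the pointer itself. */
static void dst_set_neigh(struct dst *dst, struct neighbour *n)
{
	atomic_store_explicit(&dst->neighbour, n, memory_order_release);
}

/* Reader side (cf. rcu_dereference()): load once, test for NULL, then
 * use only the local copy. Real RCU also needs rcu_read_lock() and a
 * grace period before an old neighbour is freed. */
static int dst_neigh_state(struct dst *dst)
{
	struct neighbour *n =
		atomic_load_explicit(&dst->neighbour, memory_order_acquire);

	return n ? n->state : -1;
}
```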
     

18 Jul, 2011

2 commits


14 Jul, 2011

1 commit

  • Now that there is a one-to-one correspondence between neighbour
    and hh_cache entries, we no longer need:

    1) dynamic allocation
    2) attachment to dst->hh
    3) refcounting

    Initialization of the hh_cache entry is indicated by hh_len
    being non-zero, and such initialization is always done with
    the neighbour's lock held as a writer.

    Signed-off-by: David S. Miller

    David S. Miller
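    A simplified sketch of an hh_cache embedded in the neighbour, as described above, with hypothetical sizes and names. hh_len doubles as the "initialized" marker; the neighbour write lock is left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <string.h>

#define HH_DATA_MAX 16   /* room for the cached header (size assumed) */

/* Embedded in the neighbour: no allocation, no dst->hh pointer, and no
 * refcount, since its lifetime is the neighbour's. */
struct hh_cache {
	unsigned int hh_len;                /* 0 means "not initialized" */
	unsigned char hh_data[HH_DATA_MAX];
};

struct neighbour {
	struct hh_cache hh;
	/* ... other neighbour state ... */
};

/* Initialize once; the kernel does this with the neighbour's lock held
 * as a writer, which this single-threaded sketch omits. */
static void neigh_hh_init(struct neighbour *n, const unsigned char *hdr,
			  unsigned int len)
{
	if (n->hh.hh_len != 0 || len == 0 || len > HH_DATA_MAX)
		return;
	memcpy(n->hh.hh_data, hdr, len);
	n->hh.hh_len = len;                 /* set last: marks entry valid */
}
```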
     

02 Jul, 2011

1 commit

  • IPv6, unlike IPv4, doesn't have a routing cache.

    Routing table entries, as well as clones made in response
    to route lookup requests, all live in the same table. And
    all of these things are together collected in the destination
    cache table for ipv6.

    This means that routing table entries count against the garbage
    collection limits, even though such entries cannot ever be reclaimed
    and are added explicitly by the administrator (rather than being
    created in response to lookups).

    Therefore it makes no sense to count ipv6 routing table entries
    against the GC limits.

    Add a DST_NOCOUNT destination cache entry flag, and skip the counting
    if it is set. Use this flag bit in ipv6 when adding routing table
    entries.

    Signed-off-by: David S. Miller

    David S. Miller
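    A sketch of the counting rule described above, assuming a hypothetical global counter and an illustrative DST_NOCOUNT value: entries allocated with DST_NOCOUNT (administrator-added routes) never touch the counter that drives garbage collection.

```c
#include <assert.h>
#include <stdlib.h>

#define DST_NOCOUNT 0x0020   /* value chosen for illustration */

struct dst { unsigned short flags; };

/* Hypothetical counter compared against the GC threshold. */
static long dst_entries;

static struct dst *dst_alloc_sketch(unsigned short flags)
{
	struct dst *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->flags = flags;
	if (!(flags & DST_NOCOUNT))
		dst_entries++;   /* only counted entries count toward GC */
	return d;
}

static void dst_free_sketch(struct dst *d)
{
	if (!(d->flags & DST_NOCOUNT))
		dst_entries--;
	free(d);
}
```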
     

25 May, 2011

1 commit


19 May, 2011

1 commit

  • It's way past its usefulness. And this gets rid of a bunch
    of stray ->rt_{dst,src} references.

    Even the comment documenting the macro was inaccurate (stated
    default was 1 when it's 0).

    If reintroduced, it should be done properly, with dynamic debug
    facilities.

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Apr, 2011

1 commit


25 Apr, 2011

1 commit

  • These header files are never installed for user consumption, so any
    __KERNEL__ cpp checks are superfluous.

    Projects should also not copy these files into their userland utility
    sources and try to use them there. If they insist on doing so, the
    onus is on them to sanitize the headers as needed.

    Signed-off-by: David S. Miller

    David S. Miller
     

28 Mar, 2011

1 commit


03 Mar, 2011

1 commit


02 Mar, 2011

2 commits


23 Feb, 2011

1 commit


18 Feb, 2011

1 commit


09 Feb, 2011

1 commit


05 Feb, 2011

1 commit