Doug / smarc-fsl-linux-kernel | Embedian Git Server

15 Mar, 2013

1 commit

aaa0c23cb Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug

When the neighbour table is full, dst_neigh_lookup/dst_neigh_lookup_skb returns
-ENOBUFS, which is non-zero, while all kernel code that uses these functions
assumes failure only on a zero return. This causes a panic (for example:
https://bugzilla.kernel.org/show_bug.cgi?id=54731).

This patch corrects the above error with the smallest possible change to the kernel
source and also fixes two missing return-value checks in drivers/infiniband/hw/cxgb4/cm.c
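
The mismatch described above can be sketched in userspace C. This is only an illustration under stated assumptions: err_ptr() and is_err() are stand-ins modelled on the kernel's ERR_PTR()/IS_ERR() helpers, and struct neighbour, raw_neigh_lookup() and neigh_lookup_or_null() are hypothetical names, not the code from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
    /* Encode a negative errno in the top of the address space. */
    return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily reduced neighbour entry. */
struct neighbour {
    int id;
};

/* Hypothetical low-level lookup: a full table yields an encoded -ENOBUFS,
 * which is non-NULL and therefore passes a plain "if (!n)" check. */
static struct neighbour *raw_neigh_lookup(struct neighbour *entry, int table_full)
{
    if (table_full)
        return err_ptr(-ENOBUFS);
    return entry;
}

/* Wrapper that folds every error pointer into NULL, so callers that only
 * test for NULL never dereference an encoded errno. */
static struct neighbour *neigh_lookup_or_null(struct neighbour *entry, int table_full)
{
    struct neighbour *n = raw_neigh_lookup(entry, table_full);

    return is_err(n) ? NULL : n;
}
```

With a wrapper of this shape, existing callers that test only for NULL stay correct without being touched, which matches the "smallest change" goal stated in the message.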

Tested on my x86_64 SMP machine

Reported-by: Zhouyi Zhou
Tested-by: Zhouyi Zhou
Signed-off-by: Zhouyi Zhou
Signed-off-by: David S. Miller

Zhouyi Zhou
2013-03-15 21:06:58 +0800

21 Feb, 2013

1 commit

ecd988372 ipv6: fix race condition regarding dst->expires and dst->from.

Eric Dumazet wrote:
| Some strange crashes happen in rt6_check_expired(), with access
| to random addresses.
|
| At first glance, it looks like the RTF_EXPIRES and
| stuff added in commit 1716a96101c49186b
| (ipv6: fix problem with expired dst cache)
| are racy : same dst could be manipulated at the same time
| on different cpus.
|
| At some point, our stack believes rt->dst.from contains a dst pointer,
| while it's really a jiffies value (as rt->dst.expires shares the same area
| of memory)
|
| rt6_update_expires() should be fixed, or am I missing something ?
|
| CC Neil because of https://bugzilla.redhat.com/show_bug.cgi?id=892060

Because we do not have any locks for dst_entry, we cannot change
essential structure in the entry; e.g., we cannot change a reference
to another entity.

To fix this issue, split the 'from' and 'expires' fields in dst_entry
out of the union. Once 'from' is assigned in the constructor,
keep the reference until the very last stage of the lifetime of
the object.

Of course, it is unsafe to change 'from', so simplify rt6_set_from so
that it handles only fresh entries.
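
The aliasing problem and the fix can be pictured with two reduced layouts. This is a sketch under assumptions: both structs are hypothetical and keep only the fields discussed here, and the helper functions exist only to make the layout difference checkable.

```c
#include <stddef.h>

/* Hypothetical reduced layout before the fix: 'expires' and 'from'
 * share storage, so a write to one overwrites the other. */
struct dst_entry_union {
    int flags;
    union {
        unsigned long expires;
        struct dst_entry_union *from;
    };
};

/* Hypothetical reduced layout after the fix: each field has its own
 * storage, so 'from' can be set once and left alone. */
struct dst_entry_split {
    int flags;
    unsigned long expires;
    struct dst_entry_split *from;
};

/* Returns 1 when both fields start at the same offset (they alias). */
static int union_fields_alias(void)
{
    return offsetof(struct dst_entry_union, expires) ==
           offsetof(struct dst_entry_union, from);
}

static int split_fields_alias(void)
{
    return offsetof(struct dst_entry_split, expires) ==
           offsetof(struct dst_entry_split, from);
}
```

In the union layout, a CPU that stores a jiffies value into 'expires' while another CPU reads 'from' hands that reader a timestamp where it expects a pointer. The split layout costs one extra word per entry and removes that possibility.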

Reported-by: Eric Dumazet
Reported-by: Neil Horman
CC: Gao Feng
Signed-off-by: YOSHIFUJI Hideaki
Reviewed-by: Eric Dumazet
Reported-by: Steinar H. Gunderson
Reviewed-by: Neil Horman
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki / 吉藤英明
2013-02-21 04:11:45 +0800

06 Feb, 2013

1 commit

a0073fe18 xfrm: Add a state resolution packet queue

By default, we blackhole packets until the key manager resolves
the states. This patch implements a packet queue where IPsec packets
are queued until the states are resolved. We generate a dummy xfrm
bundle; the output routine of the returned route enqueues the packet
to a per-policy queue and arms a timer that checks for state resolution
when dst_output() is called. Once the states are resolved, the packets
are sent out of the queue. If the states are not resolved after some
time, the queue is flushed.

This patch keeps the default behaviour of blackholing packets as long
as we have no states. To enable the packet queue, the sysctl
xfrm_larval_drop must be switched off.

Signed-off-by: Steffen Klassert

Steffen Klassert
2013-02-06 15:31:10 +0800

23 Aug, 2012

1 commit

1304a7343 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2012-08-23 05:21:38 +0800

09 Aug, 2012

1 commit

a37e6e344 net: force dst_default_metrics to const section

While investigating network performance problems, I found this little
gem:

$ nm -v vmlinux | grep -1 dst_default_metrics
ffffffff82736540 b busy.46605
ffffffff82736560 B dst_default_metrics
ffffffff82736598 b dst_busy_list

Apparently, declaring a const array without an initializer puts it in
the (writeable) bss section, in the middle of possibly often dirtied cache
lines.

Since we really want dst_default_metrics to be const, to avoid any possible
false sharing and to catch any buggy writes, I force a null initializer.

ffffffff818a4c20 R dst_default_metrics
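
The difference between the two declarations can be sketched as follows. The array names and METRICS_LEN are hypothetical, and where each object actually lands depends on the compiler, its version and its flags, so the section comments restate the commit's observation rather than a guarantee.

```c
#define METRICS_LEN 16 /* hypothetical length, for illustration only */

/* No initializer: in C this is a tentative definition. The commit
 * observed that such an object ended up in writable .bss ('B' in nm). */
const unsigned int metrics_no_init[METRICS_LEN];

/* Explicit initializer: a full definition of a const object, which the
 * toolchain can place in read-only data ('R' in nm). */
const unsigned int metrics_with_init[METRICS_LEN] = {
    [METRICS_LEN - 1] = 0,
};

/* Both arrays read back as zeros; only their placement differs. */
static unsigned int metrics_sum(const unsigned int *m)
{
    unsigned int sum = 0;

    for (int i = 0; i < METRICS_LEN; i++)
        sum += m[i];
    return sum;
}
```

Running nm on the resulting object file, as in the message above, is the way to confirm which section a given toolchain chose.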

Signed-off-by: Eric Dumazet
Cc: Ben Hutchings
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-09 07:00:28 +0800

08 Aug, 2012

1 commit

425f09ab7 net: output path optimizations

1) Avoid dirtying neighbour's confirmed field.

TCP workloads hit this cache line for each incoming ACK.
Let's write n->confirmed only if there is a jiffies change.

2) Optimize neigh_hh_output() for the common Ethernet case, where
hh_len is less than 16 bytes. Replace the memcpy() call
by two inlined 64bit load/stores on x86_64.
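
Both ideas can be sketched in portable C. This is an illustration, not the kernel code: struct neigh_sketch, neigh_confirm_sketch() and copy_hh16() are hypothetical names, and the fixed-size memcpy() calls stand in for the inlined 64-bit loads and stores that a compiler typically emits for them.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced neighbour entry. */
struct neigh_sketch {
    unsigned long confirmed; /* timestamp of the last confirmation */
};

/* Store only when the value changes, so an unchanged timestamp does not
 * dirty the cache line. Returns 1 if a store was performed. */
static int neigh_confirm_sketch(struct neigh_sketch *n, unsigned long now)
{
    if (n->confirmed == now)
        return 0;
    n->confirmed = now;
    return 1;
}

/* Copy a 16-byte cached header block as two 64-bit words. The loads
 * complete before the stores; memcpy() keeps the accesses alignment-safe. */
static void copy_hh16(void *dst, const void *src)
{
    uint64_t lo, hi;

    memcpy(&lo, src, 8);
    memcpy(&hi, (const unsigned char *)src + 8, 8);
    memcpy(dst, &lo, 8);
    memcpy((unsigned char *)dst + 8, &hi, 8);
}
```

The conditional store trades one predictable branch for avoiding a write to a shared cache line on every packet, which is where the benchmark gain below is claimed to come from.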

Bench results using udpflood test, with -C option (MSG_CONFIRM flag
added to sendto(), to reproduce the n->confirmed dirtying on UDP)

24 threads doing 1.000.000 UDP sendto() on dummy device, 4 runs.

before : 2.247s, 2.235s, 2.247s, 2.318s
after : 1.884s, 1.905s, 1.891s, 1.895s

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-08 07:24:55 +0800

21 Jul, 2012

2 commits

ceb332061 ipv4: Kill routes during PMTU/redirect updates.

Mark them obsolete so there will be a re-lookup to fetch the
FIB nexthop exception info.

Signed-off-by: David S. Miller

David S. Miller
2012-07-21 04:31:22 +0800
f5b0a8743 net: Document dst->obsolete better.

Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.

Suggested by Joe Perches.

Signed-off-by: David S. Miller

David S. Miller
2012-07-21 04:31:21 +0800

11 Jul, 2012

1 commit

94334d5ed net: Kill set_dst_metric_rtt().

No longer used.

Signed-off-by: David S. Miller

David S. Miller
2012-07-11 13:40:07 +0800

05 Jul, 2012

3 commits

36bdbcae2 net: Kill dst->_neighbour, accessors, and final uses.

No longer used.

Signed-off-by: David S. Miller

David S. Miller
2012-07-05 17:42:00 +0800
f894cbf84 net: Add optional SKB arg to dst_ops->neigh_lookup().

Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller

David S. Miller
2012-07-05 16:04:01 +0800
5110effee net: Do delayed neigh confirmation.

When a dst_confirm() happens, mark the confirmation as pending in the
dst. Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.
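
The deferral described above can be sketched like this. All names here (struct pc_dst, struct pc_neigh, pc_dst_confirm(), pc_output()) are hypothetical, and the sketch ignores locking and the actual transmit step.

```c
/* Hypothetical reduced types. */
struct pc_neigh {
    unsigned long confirmed; /* timestamp of the last confirmation */
};

struct pc_dst {
    int pending_confirm; /* set by confirm, consumed on next output */
};

/* Confirmation no longer needs a neighbour: it only marks the dst. */
static void pc_dst_confirm(struct pc_dst *dst)
{
    dst->pending_confirm = 1;
}

/* On the next packet out the neighbour is in hand, so apply the
 * pending confirmation there and clear the mark. */
static void pc_output(struct pc_dst *dst, struct pc_neigh *n, unsigned long now)
{
    if (dst->pending_confirm) {
        dst->pending_confirm = 0;
        n->confirmed = now;
    }
    /* ... the packet would be transmitted here ... */
}
```

The point of the design is that pc_dst_confirm() never touches a neighbour, so a dst without an attached neighbour can still be confirmed.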

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL. So just fix those cases up.

Signed-off-by: David S. Miller

David S. Miller
2012-07-05 16:03:06 +0800

17 Jun, 2012

1 commit

7f95e1880 include/net/dst.h: neaten asterisk placement

Fix code style - place the asterisk where it belongs.

Signed-off-by: Eldad Zack
Signed-off-by: David S. Miller

Eldad Zack
2012-06-17 06:20:35 +0800

27 May, 2012

1 commit

0c1833797 ipv6: fix incorrect ipsec fragment

Since commit ad0081e43a
"ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed",
packets are fragmented incorrectly, because tunnel mode needs
IPsec headers and trailer for all fragments,
while in transport mode it is sufficient to add the headers to the
first fragment and the trailer to the last.

So modify mtu and maxfraglen based on the IPsec mode and on whether the
fragment is the first or the last.

In my tests it works well (every fragment's size is the mtu)
and does not trigger the slow fragment path.

Changes from v1:
through optimization, mtu_prev and maxfraglen_prev can be deleted.
replace xfrm mode codes with dst_entry's new flag DST_XFRM_TUNNEL.
add function ip6_append_data_mtu to make the code clearer.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2012-05-27 13:11:22 +0800

24 Apr, 2012

1 commit

a881e963c set fake_rtable's dst to NULL to avoid kernel Oops

bridge: set fake_rtable's dst to NULL to avoid kernel Oops

When a bridge is deleted before its tap/vif device, the kernel may
encounter an oops because of a NULL reference to fake_rtable's dst.
Setting fake_rtable's dst to NULL before sending packets out solves
this problem.

v4 reformat, change br_drop_fake_rtable(skb) to {}

v3 enrich commit header

v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct.

[ Use "do { } while (0)" for nop br_drop_fake_rtable()
implementation -DaveM ]

Acked-by: Eric Dumazet
Signed-off-by: Peter Huang
Signed-off-by: David S. Miller

Peter Huang (Peng)
2012-04-24 12:16:24 +0800

14 Apr, 2012

1 commit

1716a9610 ipv6: fix problem with expired dst cache

If an ipv6 dst cache entry is copied from a dst generated by an ICMPv6 RA packet,
its expiry is never checked because it has no RTF_EXPIRES flag.
So this dst cache entry will always be used until the dst gc runs.

Change struct dst_entry: add a union that contains the new pointer 'from' and 'expires'.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag, dst.expires is unused,
so we can use this field to point to the entry the dst cache was copied from.
dst.from is only used in IPv6.

rt6_check_expired checks whether rt6_info.dst.from is expired.

ip6_rt_copy only sets dst.from when the ort has the RTF_ADDRCONF
and RTF_DEFAULT flags, and then holds the ort.

ip6_dst_destroy releases the ort.

Add some functions to operate on the RTF_EXPIRES flag and expires (from) together,
and change the code to use these new functions.

Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use the new functions.

Only set dst.from when the ort has the RTF_ADDRCONF
and RTF_DEFAULT flags, then hold the ort.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2012-04-14 00:58:29 +0800

05 Mar, 2012

1 commit

187f1882b BUG: headers with BUG/BUG_ON etc. need linux/bug.h

If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
other BUG variant in a static inline (i.e. not in a #define) then
that header really should be including <linux/bug.h> and not just
expecting it to be implicitly present.

We can make this change risk-free, since if the files using these
headers didn't have exposure to linux/bug.h already, they would have
been causing compile failures/warnings.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-03-05 06:54:34 +0800

24 Dec, 2011

1 commit

abb434cb0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller

David S. Miller
2011-12-24 06:13:56 +0800

23 Dec, 2011

1 commit

e688a6048 net: introduce DST_NOPEER dst flag

Chris Boot reported crashes occurring in ipv6_select_ident().

[ 461.457562] RIP: 0010:ipv6_select_ident+0x31/0xa7

[ 461.578229] Call Trace:
[ 461.580742]
[ 461.582870] ? udp6_ufo_fragment+0x124/0x1a2
[ 461.589054] ? ipv6_gso_segment+0xc0/0x155
[ 461.595140] ? skb_gso_segment+0x208/0x28b
[ 461.601198] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6]
[ 461.608786] ? nf_iterate+0x41/0x77
[ 461.614227] ? dev_hard_start_xmit+0x357/0x543
[ 461.620659] ? nf_hook_slow+0x73/0x111
[ 461.626440] ? br_parse_ip_options+0x19a/0x19a [bridge]
[ 461.633581] ? dev_queue_xmit+0x3af/0x459
[ 461.639577] ? br_dev_queue_push_xmit+0x72/0x76 [bridge]
[ 461.646887] ? br_nf_post_routing+0x17d/0x18f [bridge]
[ 461.653997] ? nf_iterate+0x41/0x77
[ 461.659473] ? br_flood+0xfa/0xfa [bridge]
[ 461.665485] ? nf_hook_slow+0x73/0x111
[ 461.671234] ? br_flood+0xfa/0xfa [bridge]
[ 461.677299] ? nf_bridge_update_protocol+0x20/0x20 [bridge]
[ 461.684891] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
[ 461.691520] ? br_flood+0xfa/0xfa [bridge]
[ 461.697572] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge]
[ 461.704616] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge]
[ 461.712329] ? br_nf_forward_finish+0x8a/0x95 [bridge]
[ 461.719490] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
[ 461.727223] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
[ 461.734292] ? nf_iterate+0x41/0x77
[ 461.739758] ? __br_deliver+0xa0/0xa0 [bridge]
[ 461.746203] ? nf_hook_slow+0x73/0x111
[ 461.751950] ? __br_deliver+0xa0/0xa0 [bridge]
[ 461.758378] ? NF_HOOK.constprop.4+0x56/0x56 [bridge]

This is caused by bridge netfilter special dst_entry (fake_rtable), a
special shared entry, where attaching an inetpeer makes no sense.

Problem is present since commit 87c48fa3b46 (ipv6: make fragment
identifications less predictable)

Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and
__ip_select_ident() fall back to the 'no peer attached' handling.
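
The fallback can be sketched as a flag check ahead of any use of the peer. Everything below is hypothetical: the flag value, the reduced structs and select_ident_sketch() illustrate the control flow only and do not reproduce how the kernel generates identifiers.

```c
#define DST_NOPEER_SKETCH 0x0040 /* hypothetical bit value */

/* Hypothetical reduced peer and dst entries. */
struct peer_sketch {
    unsigned int next_id;
};

struct np_dst {
    unsigned short flags;
    struct peer_sketch *peer;
};

/* Shared counter used when no peer may be attached. */
static unsigned int fallback_id;

/* Use the per-peer counter only when the dst allows a peer; a dst
 * flagged NOPEER (such as a shared fake entry) takes the fallback. */
static unsigned int select_ident_sketch(struct np_dst *dst)
{
    if (dst && !(dst->flags & DST_NOPEER_SKETCH) && dst->peer)
        return dst->peer->next_id++;
    return fallback_id++;
}
```

The flag is tested before the peer pointer is read, so a shared entry never has its peer state consulted or modified.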

Reported-by: Chris Boot
Tested-by: Chris Boot
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-23 11:34:56 +0800

06 Dec, 2011

1 commit

272174550 net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.

To reflect the fact that a reference is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller
Acked-by: Roland Dreier

David Miller
2011-12-06 04:20:19 +0800

27 Nov, 2011

2 commits

618f9bc74 net: Move mtu handling down to the protocol depended handlers

We move all mtu handling from dst_mtu() down to the protocol
layer. So each protocol can implement the mtu handling in
a different manner.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2011-11-27 03:29:51 +0800
ebb762f27 net: Rename the dst_opt default_mtu method to mtu

We plan to invoke the dst_opt->default_mtu() method unconditionally
from dst_mtu(). So rename the method to dst_opt->mtu() to match
the name with the new meaning.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2011-11-27 03:29:50 +0800

18 Aug, 2011

1 commit

bdeab9919 rps: Add flag to skb to indicate rxhash is based on L4 tuple

The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet. This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2011-08-18 11:06:03 +0800

03 Aug, 2011

1 commit

f2c31e32b net: fix NULL dereferences in check_peer_redir()

Gergely Kalman reported crashes in check_peer_redir().

It appears commit f39925dbde778 (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.

Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.

Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.

As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)

Many thanks to Gergely for providing a pretty report pointing to the
bug.

Reported-by: Gergely Kalman
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-08-03 18:34:12 +0800

18 Jul, 2011

2 commits

d3aaeb38c net: Add ->neigh_lookup() operation to dst_ops

In the future dst entries will be neigh-less. In that environment we
need to have an easy transition point for current users of
dst->neighbour outside of the packet output fast path.

Signed-off-by: David S. Miller

David S. Miller
2011-07-18 15:40:17 +0800
69cce1d14 net: Abstract dst->neighbour accesses behind helpers.

dst_{get,set}_neighbour()

Signed-off-by: David S. Miller

David S. Miller
2011-07-18 14:11:35 +0800

14 Jul, 2011

1 commit

f6b72b621 net: Embed hh_cache inside of struct neighbour.

Now that there is a one-to-one correspondence between neighbour
and hh_cache entries, we no longer need:

1) dynamic allocation
2) attachment to dst->hh
3) refcounting

Initialization of the hh_cache entry is indicated by hh_len
being non-zero, and such initialization is always done with
the neighbour's lock held as a writer.

Signed-off-by: David S. Miller

David S. Miller
2011-07-14 22:53:20 +0800

02 Jul, 2011

1 commit

957c665f3 ipv6: Don't put artificial limit on routing table size.

IPV6, unlike IPV4, doesn't have a routing cache.

Routing table entries, as well as clones made in response
to route lookup requests, all live in the same table. And
all of these things are together collected in the destination
cache table for ipv6.

This means that routing table entries count against the garbage
collection limits, even though such entries cannot ever be reclaimed
and are added explicitly by the administrator (rather than being
created in response to lookups).

Therefore it makes no sense to count ipv6 routing table entries
against the GC limits.

Add a DST_NOCOUNT destination cache entry flag, and skip the counting
if it is set. Use this flag bit in ipv6 when adding routing table
entries.
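
The accounting rule can be sketched as a flag check around the entry counter. The flag value, struct nc_ops, struct nc_dst and both functions are hypothetical; the sketch leaves out allocation and the garbage collector itself.

```c
#define DST_NOCOUNT_SKETCH 0x0020 /* hypothetical bit value */

/* Hypothetical per-protocol ops with the counter the GC limit is
 * compared against. */
struct nc_ops {
    int entries;
};

struct nc_dst {
    struct nc_ops *ops;
    unsigned short flags;
};

/* Count the entry against the GC limit unless it is flagged NOCOUNT,
 * as an administrator-added routing table entry would be. */
static void nc_dst_init(struct nc_dst *dst, struct nc_ops *ops,
                        unsigned short flags)
{
    dst->ops = ops;
    dst->flags = flags;
    if (!(flags & DST_NOCOUNT_SKETCH))
        ops->entries++;
}

/* Mirror the same test on teardown so the counter stays balanced. */
static void nc_dst_destroy(struct nc_dst *dst)
{
    if (!(dst->flags & DST_NOCOUNT_SKETCH))
        dst->ops->entries--;
}
```

Checking the flag on both paths matters: an entry that was never counted must not decrement the counter when it goes away.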

Signed-off-by: David S. Miller

David S. Miller
2011-07-02 08:30:43 +0800

25 May, 2011

1 commit

1f37070d3 dst: catch uninitialized metrics

Catch cases where dst_metric_set() and other functions are called
but _metrics is NULL.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller

Stephen Hemminger
2011-05-25 01:50:52 +0800

19 May, 2011

1 commit

6882f933c ipv4: Kill RT_CACHE_DEBUG

It's way past its usefulness. And this gets rid of a bunch
of stray ->rt_{dst,src} references.

Even the comment documenting the macro was inaccurate (stated
default was 1 when it's 0).

If reintroduced, it should be done properly, with dynamic debug
facilities.

Signed-off-by: David S. Miller

David S. Miller
2011-05-19 06:23:21 +0800

29 Apr, 2011

1 commit

5c1e6aa30 net: Make dst_alloc() take more explicit initializations.

Now the dst->dev, dst->obsolete, and dst->flags values can
be specified as well.

Signed-off-by: David S. Miller

David S. Miller
2011-04-29 13:25:59 +0800

25 Apr, 2011

1 commit

2a9e95070 net: Remove __KERNEL__ cpp checks from include/net

These header files are never installed for user consumption, so any
__KERNEL__ cpp checks are superfluous.

Projects should also not copy these files into their userland utility
sources and try to use them there. If they insist on doing so, the
onus is on them to sanitize the headers as needed.

Signed-off-by: David S. Miller

David S. Miller
2011-04-25 01:54:56 +0800

28 Mar, 2011

1 commit

e433430a0 dst: Clone child entry in skb_dst_pop

We clone the child entry in skb_dst_pop before we call
skb_dst_drop(). Otherwise we might kill the child right
before we return it to the caller.
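
The ordering problem can be sketched with a toy reference count. The types and helpers below (struct rc_dst, rc_hold(), rc_release(), struct rc_skb, rc_skb_dst_pop()) are hypothetical, and the release model is simplified so that dropping the last reference to a parent immediately drops the parent's reference to its child.

```c
#include <stddef.h>

/* Hypothetical reference-counted entry with an optional child. */
struct rc_dst {
    int refcnt;
    struct rc_dst *child;
};

static struct rc_dst *rc_hold(struct rc_dst *d)
{
    d->refcnt++;
    return d;
}

/* Dropping the last reference also drops the reference this entry
 * holds on its child. */
static void rc_release(struct rc_dst *d)
{
    if (--d->refcnt == 0 && d->child)
        rc_release(d->child);
}

/* Hypothetical packet that owns one reference on its dst. */
struct rc_skb {
    struct rc_dst *dst;
};

/* Take a reference on the child first, then drop the parent. Doing it
 * in the opposite order could free the child before it is returned. */
static struct rc_dst *rc_skb_dst_pop(struct rc_skb *skb)
{
    struct rc_dst *child = rc_hold(skb->dst->child);

    rc_release(skb->dst);
    skb->dst = NULL;
    return child;
}
```

In this model, when the packet holds the parent's only reference, releasing the parent first would take the child's count to zero before the caller ever receives it.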

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2011-03-28 08:55:01 +0800

03 Mar, 2011

1 commit

452edd598 xfrm: Return dst directly from xfrm_lookup()

Instead of on the stack.

Signed-off-by: David S. Miller

David S. Miller
2011-03-03 05:27:41 +0800

02 Mar, 2011

2 commits

2774c131b xfrm: Handle blackhole route creation via afinfo.

That way we don't have to potentially do this in every xfrm_lookup()
caller.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:59:04 +0800
80c0bc9e3 xfrm: Kill XFRM_LOOKUP_WAIT flag.

This can be determined from the flow flags instead.

Signed-off-by: David S. Miller

David S. Miller
2011-03-02 06:36:37 +0800

23 Feb, 2011

1 commit

dee9f4bce net: Make flow cache paths use a const struct flowi.

Signed-off-by: David S. Miller

David S. Miller
2011-02-23 10:44:31 +0800

18 Feb, 2011

1 commit

3c7bd1a14 net: Add initial_ref arg to dst_alloc().

This allows avoiding multiple writes to the initial __refcnt.

The simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted, the rest have been left
alone and kept at the existing "0".

Signed-off-by: David S. Miller

David S. Miller
2011-02-18 07:44:00 +0800

09 Feb, 2011

1 commit

e7b66bdc0 net: Remove bogus barrier() in dst_allfrag().

I simply missed this one when modifying the other dst
metric interfaces earlier.

Signed-off-by: David S. Miller

David S. Miller
2011-02-09 07:33:22 +0800

05 Feb, 2011

1 commit

92d868292 inetpeer: Move ICMP rate limiting state into inet_peer entries.

Like metrics, the ICMP rate limiting bits are cached state about
a destination. So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.

Signed-off-by: David S. Miller

David S. Miller
2011-02-05 07:59:53 +0800