24 Oct, 2011

1 commit


22 Jun, 2011

1 commit


11 May, 2011

1 commit


04 May, 2011

1 commit


23 Apr, 2011

1 commit


08 Apr, 2011

1 commit

  • Commit 1018b5c01636c7c6bda31a719bda34fc631db29a ("Set rt->rt_iif more
    sanely on output routes.") breaks rt_is_{output,input}_route.

    As a result, IP_PKTINFO's ->ipi_ifindex is reported as 0.

    To fix it, this does:

    1) Add "int rt_route_iif;" to struct rtable

    2) For input routes, always set rt_route_iif to the same value as rt_iif

    3) For output routes, always set rt_route_iif to zero. Set rt_iif
    as it is done currently.

    4) Change rt_is_{output,input}_route() to test rt_route_iif
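A minimal user-space sketch of steps 1-4, assuming a simplified stand-in for the kernel's struct rtable (only the two fields the commit discusses are modeled):

```c
#include <assert.h>

/* Illustrative stand-in, not the kernel's real struct rtable. */
struct rtable {
    int rt_iif;        /* may now be set on output routes too */
    int rt_route_iif;  /* equals rt_iif on input routes, 0 on output */
};

/* The predicates now test rt_route_iif, which output routes never set. */
static int rt_is_input_route(const struct rtable *rt)
{
    return rt->rt_route_iif != 0;
}

static int rt_is_output_route(const struct rtable *rt)
{
    return rt->rt_route_iif == 0;
}
```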

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: David S. Miller


13 Mar, 2011

5 commits


05 Mar, 2011

1 commit


03 Mar, 2011

1 commit


02 Mar, 2011

1 commit


24 Feb, 2011

1 commit


23 Feb, 2011

2 commits


27 Jan, 2011

1 commit

  • Routing metrics are now copy-on-write.

    Initially a route entry points its metrics at a read-only location.
    If a routing table entry exists, it will point there. Otherwise it
    will point at the all-zero metric place-holder called
    'dst_default_metrics'.

    The writability state of the metrics is stored in the low bits of
    the metrics pointer; two bits remain spare if we want to store more
    states.

    For the initial implementation, COW is implemented simply via kmalloc.
    However future enhancements will change this to place the writable
    metrics somewhere else, in order to increase sharing. Very likely
    this "somewhere else" will be the inetpeer cache.

    Note also that this means that metrics updates may transiently fail
    if we cannot COW the metrics successfully.

    But even by itself, this patch should decrease memory usage and
    increase cache locality especially for routing workloads. In those
    cases the read-only metric copies stay in place and never get written
    to.

    TCP workloads where metrics get updated, and those rare cases where
    PMTU triggers occur, will take a very slight performance hit. But
    that hit will be alleviated when the long-term writable metrics
    move to a more sharable location.

    Since the metrics storage went from a u32 array of RTAX_MAX entries to
    what is essentially a pointer, some retooling of the dst_entry layout
    was necessary.

    Most importantly, we need to preserve the alignment of the reference
    count so that it doesn't share cache lines with the read-mostly state,
    as per Eric Dumazet's alignment assertion checks.

    The only non-trivial bit here is the move of the 'flags' member
    into the writable cache line. This is OK since we always access the
    flags around the same moment we modify the reference count.
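A user-space sketch of the low-bit pointer tagging and the kmalloc-based COW described above; the names echo the commit (dst_default_metrics, a read-only flag in the pointer's low bits), but the code is illustrative, with malloc standing in for kmalloc:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RTAX_MAX 16
#define DST_METRICS_READ_ONLY 0x1UL /* kept in the pointer's low bits */

/* All-zero read-only placeholder, as described in the commit text. */
static const uint32_t dst_default_metrics[RTAX_MAX];

struct dst_entry {
    unsigned long _metrics; /* metrics pointer | writability flag */
};

static uint32_t *dst_metrics_ptr(struct dst_entry *dst)
{
    return (uint32_t *)(dst->_metrics & ~3UL); /* mask the two flag bits */
}

static int dst_metrics_read_only(const struct dst_entry *dst)
{
    return dst->_metrics & DST_METRICS_READ_ONLY;
}

/* COW: on first write, copy the read-only metrics into heap storage.
 * May transiently fail under memory pressure, as the commit notes. */
static uint32_t *dst_metrics_write_ptr(struct dst_entry *dst)
{
    uint32_t *p;

    if (!dst_metrics_read_only(dst))
        return dst_metrics_ptr(dst);

    p = malloc(RTAX_MAX * sizeof(*p));
    if (!p)
        return NULL; /* caller must tolerate a failed metrics update */
    memcpy(p, dst_metrics_ptr(dst), RTAX_MAX * sizeof(*p));
    dst->_metrics = (unsigned long)p; /* flag bit now clear: writable */
    return p;
}
```

malloc's alignment guarantees the low bits of the returned pointer are zero, which is what makes the tag bits safe to steal.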

    Signed-off-by: David S. Miller


18 Nov, 2010

1 commit


16 Nov, 2010

1 commit

  • The GRE Key field is intended to identify an individual traffic
    flow within a tunnel. It is useful for XFRM policy selectors to be
    able to match on it, so that different GRE tunnels can have
    different policies.

    Signed-off-by: Timo Teräs
    Signed-off-by: David S. Miller


12 Nov, 2010

1 commit

  • It seems the idev field in struct rtable serves no special
    purpose, but adds extra atomic ops.

    We hold refcounts on the device itself (using percpu data, so pretty
    cheap in current kernel).

    The infiniband case is solved by using dst.dev instead of idev->dev.

    Removal of this field means routing without the route cache now
    uses shared data and percpu data; the only potential contention is
    a pair of atomic ops on struct neighbour per forwarded packet.

    About 5% speedup on routing test.

    Signed-off-by: Eric Dumazet
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Signed-off-by: David S. Miller


12 Oct, 2010

1 commit

  • struct dst_ops tracks the number of allocated dst entries in an
    atomic_t field, subject to high cache line contention under stress
    workloads.

    Switch to a percpu_counter to reduce the number of times we need
    to dirty a central location. Place it on a separate cache line to
    avoid dirtying read-only fields.
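The idea behind the percpu_counter switch can be sketched in user space like this (a single-threaded toy model; the batch size and layout are assumptions, and the real kernel API lives in <linux/percpu_counter.h>):

```c
#include <assert.h>

/* Toy model of a percpu counter: each CPU batches updates locally and
 * only folds them into the shared total when the local delta exceeds
 * a batch threshold, so the shared cache line is dirtied far less
 * often. Single-threaded sketch; no locking or preemption shown. */
#define NR_CPUS   4
#define PCP_BATCH 32

struct percpu_counter {
    long count;             /* shared, approximately up to date */
    long counters[NR_CPUS]; /* per-cpu local deltas */
};

static void percpu_counter_add(struct percpu_counter *fbc, int cpu, long amount)
{
    long c = fbc->counters[cpu] + amount;

    if (c >= PCP_BATCH || c <= -PCP_BATCH) {
        fbc->count += c;        /* rare: dirty the shared line */
        fbc->counters[cpu] = 0;
    } else {
        fbc->counters[cpu] = c; /* common: stay CPU-local */
    }
}

static long percpu_counter_sum(const struct percpu_counter *fbc)
{
    long sum = fbc->count;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += fbc->counters[cpu];
    return sum;
}
```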

    Stress test :

    (Sending 160,000,000 UDP frames,
    IP route cache disabled, dual E5540 @ 2.53GHz,
    32-bit kernel, FIB_TRIE, SLUB/NUMA)

    Before:

    real 0m51.179s
    user 0m15.329s
    sys 10m15.942s

    After:

    real 0m45.570s
    user 0m15.525s
    sys 9m56.669s

    With a small reordering of struct neighbour fields (the subject of
    a following patch, to separate refcnt from other read-mostly
    fields):

    real 0m41.841s
    user 0m15.261s
    sys 8m45.949s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller


23 Sep, 2010

1 commit


08 Jul, 2010

1 commit


05 Jul, 2010

1 commit

  • While using xfrm's MARK feature on 2.6.34 - 2.6.35 kernels, the
    mark is always cleared in the flowi structure via memset in
    _decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup
    fails. IPv6 code is affected by this bug too.

    Signed-off-by: Peter Kosyh
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller


11 Jun, 2010

1 commit


07 Apr, 2010

1 commit

  • __xfrm_lookup() is called for each packet transmitted out of the
    system. xfrm_find_bundle() does a linear search, which can kill
    system performance depending on how many bundles are required per
    policy.

    This modifies __xfrm_lookup() to store bundles directly in the
    flow cache. If we do not get a hit, we just create a new bundle
    instead of doing a slow search. This means that we can now get
    multiple xfrm_dst's for the same flow (on a per-cpu basis).

    Signed-off-by: Timo Teras
    Signed-off-by: David S. Miller


03 Mar, 2010

1 commit

  • When I merged the bundle creation code, I introduced a bogus flowi
    value in the bundle. Instead of getting it from the caller, it was
    set to the flow in the route object, which is totally different.

    The end result is that the bundles we created never match, and
    we instead end up with an ever growing bundle list.

    Thanks to Jamal for finding this problem.

    Reported-by: Jamal Hadi Salim
    Signed-off-by: Herbert Xu
    Acked-by: Steffen Klassert
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller


25 Jan, 2010

1 commit

  • GC is non-existent in netns, so after you hit GC threshold, no new
    dst entries will be created until someone triggers cleanup in init_net.

    Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
    This is not done in a generic way, because it would waste
    (AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.

    Reorder GC threshold initialization so it'd be done before registering
    XFRM policies.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller


12 Nov, 2009

1 commit

  • Now that sys_sysctl is a compatibility wrapper around /proc/sys,
    all sysctl strategy routines, and all ctl_name and strategy
    entries in the sysctl tables, are unused and can be removed.

    In addition, neigh_sysctl_register has been modified to no longer
    take a strategy argument, and its callers have been modified not
    to pass one.

    Cc: "David Miller"
    Cc: Hideaki YOSHIFUJI
    Cc: netdev@vger.kernel.org
    Signed-off-by: Eric W. Biederman


05 Aug, 2009

1 commit

  • Fix build errors when SYSCTLs are not enabled:
    (.init.text+0x5154): undefined reference to `net_ipv4_ctl_path'
    (.init.text+0x5176): undefined reference to `register_net_sysctl_table'
    xfrm4_policy.c:(.exit.text+0x573): undefined reference to `unregister_net_sysctl_table'

    Signed-off-by: Randy Dunlap
    Signed-off-by: David S. Miller


31 Jul, 2009

1 commit

  • Choose saner defaults for xfrm[4|6] gc_thresh values on init

    Currently, the xfrm[4|6] code has hard-coded initial gc_thresh
    values (set to 1024). Given that the ipv4 and ipv6 routing caches
    are sized dynamically at boot time, the static selections can be
    nonsensical. This patch dynamically selects an appropriate gc
    threshold based on the corresponding main routing table size, using
    the assumption that we should in the worst case be able to handle
    as many connections as the routing table can hold.

    For ipv4, the maximum route cache size is 16 * the number of hash
    buckets in the route cache. Given that xfrm4 starts garbage
    collection at the gc_thresh and prevents new allocations at 2 *
    gc_thresh, we set gc_thresh to half the maximum route cache size.

    For ipv6, it's a bit trickier. There is no maximum route cache
    size, but the ipv6 dst_ops gc_thresh is statically set to 1024. It
    seems sane to select a similar gc_thresh for the xfrm6 code that is
    half the number of hash buckets in the v6 route cache times 16
    (like the v4 code does).
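The v4 sizing rule above works out to simple arithmetic (a sketch only; the function name and parameter are hypothetical, not the kernel's):

```c
#include <assert.h>

/* gc_thresh sizing sketch: the v4 route cache holds at most 16
 * entries per hash bucket, and xfrm refuses new allocations at
 * 2 * gc_thresh, so setting gc_thresh to half the route cache
 * maximum lets xfrm hold as many entries as the route cache can
 * in the worst case. */
static unsigned int xfrm4_gc_thresh_guess(unsigned int rt_hash_buckets)
{
    unsigned int rt_max_size = 16 * rt_hash_buckets;
    return rt_max_size / 2;
}
```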

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller


28 Jul, 2009

1 commit

  • Export garbage collector thresholds for xfrm[4|6]_dst_ops

    Had a problem reported to me recently in which a system with a
    high volume of ipsec connections eventually began returning
    ENOBUFS for new connections.

    It seemed that after about 2000 connections we started being unable to
    create more. A quick look revealed that the xfrm code used a dst_ops
    structure that limited the gc_thresh value to 1024, and always
    dropped route cache entries after 2x the gc_thresh.

    It seems the most direct solution is to export the gc_thresh values in
    the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so
    that higher volumes of connections can be supported. This patch has
    been tested and allows the reporter to increase their ipsec connection
    volume successfully.

    Reported-by: Joe Nall
    Signed-off-by: Neil Horman

    ipv4/xfrm4_policy.c | 18 ++++++++++++++++++
    ipv6/xfrm6_policy.c | 18 ++++++++++++++++++
    2 files changed, 36 insertions(+)
    Signed-off-by: David S. Miller


04 Jul, 2009

1 commit

  • The SCTP code pushed the skb data above the sctp chunk header, so
    the check of pskb_may_pull(skb, xprth + 4 - skb->data) in
    _decode_session4() will never return 0 because
    xprth + 4 - skb->data < 0, and the sctp ports decode will always
    fail.
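The underlying hazard is a signed pointer difference turning into an unsigned length. A toy model of that step (the helper name is illustrative, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

/* If the transport-header pointer xprth sits *before* the current
 * data pointer (because SCTP already pulled the headers), then
 * xprth + 4 - data is negative; converted to pskb_may_pull()'s
 * unsigned length parameter it becomes enormous. A fixed caller has
 * to handle the negative case explicitly, as sketched here. */
static int pull_len_is_valid(const unsigned char *data,
                             const unsigned char *xprth)
{
    ptrdiff_t need = xprth + 4 - data;
    return need >= 0;
}
```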

    Signed-off-by: Wei Yongjun
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller


01 Feb, 2009

1 commit


26 Nov, 2008

3 commits


12 Nov, 2008

1 commit


03 Nov, 2008

1 commit