Doug / smarc-fsl-linux-kernel | Embedian Git Server

01 Aug, 2012

2 commits

99a1dec70 net: introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket

Introduce sk_gfp_atomic(), a function that allows socket-specific flags
to be injected into each socket-related allocation. It is only used on
allocation paths that may be required for writing pages back to network
storage.

[davem@davemloft.net: Use sk_gfp_atomic only when necessary]
Signed-off-by: Peter Zijlstra
Signed-off-by: Mel Gorman
Acked-by: David S. Miller
Cc: Neil Brown
Cc: Mike Christie
Cc: Eric B Munson
Cc: Eric Dumazet
Cc: Sebastian Andrzej Siewior
Cc: Mel Gorman
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
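
As a rough userspace sketch of what sk_gfp_atomic() is described as doing, the helper below starts from an atomic allocation mask and adds only a whitelisted per-socket bit. The flag values, the demo_sock structure and the demo_ names are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration; the kernel's real gfp_t bits differ. */
typedef uint32_t gfp_t;
#define DEMO_GFP_ATOMIC   0x01u  /* stand-in for GFP_ATOMIC */
#define DEMO_GFP_MEMALLOC 0x02u  /* stand-in for a "may use reserves" flag */

/* Minimal stand-in for struct sock: only the per-socket allocation mask. */
struct demo_sock {
    gfp_t sk_allocation;
};

/* Keep the caller's mask and add only the socket-specific bit we allow through. */
static gfp_t demo_sk_gfp_atomic(const struct demo_sock *sk, gfp_t gfp_mask)
{
    return gfp_mask | (sk->sk_allocation & DEMO_GFP_MEMALLOC);
}
```

A socket that never set the extra bit gets the plain atomic mask back, so ordinary sockets are unaffected.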

Mel Gorman
2012-08-01 09:42:46 +0800
c255a4580 memcg: rename config variables

Sanity:

CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Cc: Tejun Heo
Cc: Aneesh Kumar K.V
Cc: David Rientjes
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2012-08-01 09:42:43 +0800

27 Jul, 2012

1 commit

c7109986d ipv6: Early TCP socket demux

These are the missing IPv6 bits for the infrastructure added in commit
41063e9dd1195 (ipv4: Early TCP socket demux.)

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-07-27 06:50:39 +0800

23 Jul, 2012

1 commit

563d34d05 tcp: dont drop MTU reduction indications

ICMP messages generated in the output path when the frame length is
bigger than the MTU are actually lost, because the socket is owned by
the user (doing the xmit).

One example is the ipgre_tunnel_xmit() calling
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));

We had a similar case fixed in commit a34a101e1e6 (ipv6: disable GSO on
sockets hitting dst_allfrag).

The problem with that fix is that it relied on retransmit timers, so
short TCP sessions paid too high a latency price.

This patch uses the tcp_release_cb() infrastructure so that MTU
reduction messages (ICMP messages) are not lost, and no extra delay
is added in TCP transmits.

Reported-by: Maciej Żenczykowski
Diagnosed-by: Neal Cardwell
Signed-off-by: Eric Dumazet
Cc: Nandita Dukkipati
Cc: Tom Herbert
Cc: Tore Anderson
Signed-off-by: David S. Miller
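
The deferral pattern this message describes can be sketched in userspace roughly as follows: if the socket is owned by the user when the ICMP indication arrives, the event is recorded in a flag and applied when the socket is released. All names, the flag bit and the structure layout are hypothetical; the real tcp_release_cb() machinery is more involved.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MTU_REDUCED_DEFERRED 0x1u  /* hypothetical flag bit */

struct demo_sock {
    bool     owned_by_user;  /* true while a process holds the socket */
    uint32_t deferred;       /* events recorded while the socket was owned */
    uint32_t mtu_info;       /* MTU carried by the last ICMP indication */
    uint32_t pmtu;           /* path MTU currently used for transmits */
};

/* Apply the recorded MTU if it is smaller than the current one. */
static void demo_apply_mtu(struct demo_sock *sk)
{
    if (sk->mtu_info < sk->pmtu)
        sk->pmtu = sk->mtu_info;
}

/* ICMP "fragmentation needed" handler: defer instead of dropping. */
static void demo_icmp_frag_needed(struct demo_sock *sk, uint32_t mtu)
{
    sk->mtu_info = mtu;
    if (sk->owned_by_user)
        sk->deferred |= DEMO_MTU_REDUCED_DEFERRED;
    else
        demo_apply_mtu(sk);
}

/* Release path: process whatever was deferred while the socket was owned. */
static void demo_release_sock(struct demo_sock *sk)
{
    sk->owned_by_user = false;
    if (sk->deferred & DEMO_MTU_REDUCED_DEFERRED) {
        sk->deferred &= ~DEMO_MTU_REDUCED_DEFERRED;
        demo_apply_mtu(sk);
    }
}
```

In this model the MTU reduction takes effect as soon as the owner releases the socket, without waiting for a retransmit timer.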

Eric Dumazet
2012-07-23 15:58:46 +0800

21 Jul, 2012

1 commit

f5b0a8743 net: Document dst->obsolete better.

Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.

Suggested by Joe Perches.

Signed-off-by: David S. Miller

David S. Miller
2012-07-21 04:31:21 +0800

20 Jul, 2012

1 commit

2100c8d2d net-tcp: Fast Open base

This patch implements the common code for both the client and server.

1. TCP Fast Open option processing. Since Fast Open does not have an
option number assigned by IANA yet, it shares the experimental option
code 254 by implementing draft-ietf-tcpm-experimental-options
with a 16-bit magic number 0xF989. This enables global experiments
without clashing over the scarce (2) experimental options available for TCP.

When the draft status becomes standard (maybe), the client should
switch to the new option number assigned while the server supports
both numbers for the transition.

2. The new sysctl tcp_fastopen

3. A place holder init function

Signed-off-by: Yuchung Cheng
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
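
A minimal sketch of the option layout described in item 1, assuming the usual shared-experimental-option format of kind, length, 16-bit magic in network byte order, then payload. The function names and the cookie handling are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TCPOPT_EXP     254     /* shared experimental option kind */
#define DEMO_FASTOPEN_MAGIC 0xF989  /* 16-bit magic from the commit message */
#define DEMO_EXP_BASE_LEN   4       /* kind + length + 2 magic bytes */

/* Write the option into buf; returns bytes written, or 0 if it does not fit. */
static size_t demo_write_fastopen_opt(uint8_t *buf, size_t buflen,
                                      const uint8_t *cookie, size_t cookie_len)
{
    size_t total = DEMO_EXP_BASE_LEN + cookie_len;

    if (total > buflen || total > 255)
        return 0;
    buf[0] = DEMO_TCPOPT_EXP;
    buf[1] = (uint8_t)total;                      /* length covers the whole option */
    buf[2] = (uint8_t)(DEMO_FASTOPEN_MAGIC >> 8); /* magic, most significant byte first */
    buf[3] = (uint8_t)(DEMO_FASTOPEN_MAGIC & 0xff);
    if (cookie_len)
        memcpy(buf + DEMO_EXP_BASE_LEN, cookie, cookie_len);
    return total;
}

/* Check whether an option starting at opt is a Fast Open experimental option. */
static int demo_is_fastopen_opt(const uint8_t *opt, size_t len)
{
    return len >= DEMO_EXP_BASE_LEN &&
           opt[0] == DEMO_TCPOPT_EXP &&
           opt[1] >= DEMO_EXP_BASE_LEN && opt[1] <= len &&
           opt[2] == (DEMO_FASTOPEN_MAGIC >> 8) &&
           opt[3] == (DEMO_FASTOPEN_MAGIC & 0xff);
}
```

Checking the magic as well as the kind is what lets several experiments share option code 254 without misreading each other's options.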

Yuchung Cheng
2012-07-20 01:55:36 +0800

19 Jul, 2012

1 commit

ddbe50320 ipv6: add ipv6_addr_hash() helper

Introduce ipv6_addr_hash() helper doing an XOR on all bits
of an IPv6 address, with an optimized x86_64 version.

Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector)

Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.

Use it in sunrpc and use more bit shuffling, using hash_32().

Use it in net/ipv6/addrconf.c, using hash_32() as well.

As a cleanup, use it in net/ipv4/tcp_metrics.c

Signed-off-by: Eric Dumazet
Reported-by: Andrew McGregor
Cc: Dave Taht
Cc: Tom Herbert
Cc: David Laight
Cc: Joe Perches
Signed-off-by: David S. Miller
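
The XOR fold that ipv6_addr_hash() is described as performing can be sketched in portable C as below, together with a variant that works on two 64-bit halves in the spirit of the optimized version. The demo_ names and the address structure are stand-ins, and the 64-bit variant is only an assumption about how such an optimization might look.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for struct in6_addr: 16 raw address bytes. */
struct demo_in6_addr {
    uint8_t s6_addr[16];
};

/* Portable version: XOR the four 32-bit words of the address. */
static uint32_t demo_ipv6_addr_hash(const struct demo_in6_addr *a)
{
    uint32_t w[4];

    memcpy(w, a->s6_addr, sizeof(w));
    return w[0] ^ w[1] ^ w[2] ^ w[3];
}

/* 64-bit variant: XOR the two halves, then fold the result to 32 bits. */
static uint32_t demo_ipv6_addr_hash64(const struct demo_in6_addr *a)
{
    uint64_t q[2];
    uint64_t x;

    memcpy(q, a->s6_addr, sizeof(q));
    x = q[0] ^ q[1];
    return (uint32_t)(x ^ (x >> 32));
}
```

Both variants XOR the same four words, so they return the same value regardless of byte order; callers that need better bit mixing can feed the result into a further hash, as the message notes for hash_32().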

Eric Dumazet
2012-07-19 02:28:46 +0800

18 Jul, 2012

2 commits

d3818c92a ipv6: fix inet6_csk_xmit()

We should pass a struct flowi6 pointer to inet6_csk_route_socket(),
so that inet6_csk_xmit() works correctly instead of sending garbage.

Also add some consts

Signed-off-by: Eric Dumazet
Reported-by: Yuchung Cheng
Cc: Neal Cardwell
Signed-off-by: David S. Miller

Eric Dumazet
2012-07-18 23:59:58 +0800
a6ff1a2f1 Merge branch 'nexthop_exceptions'

These patches implement the final mechanism necessary to really allow
us to go without the route cache in ipv4.

We need a place to have long-term storage of PMTU/redirect information
which is independent of the routes themselves, yet does not get us
back into a situation where we have to write to metrics or anything
like that.

For this we use a "next-hop exception" table in the FIB nexthops.

The one thing I desperately want to avoid is having to create clone
routes in the FIB trie for this purpose, because that is very
expensive. However, I'm willing to entertain such an idea later
if this current scheme proves to have downsides that the FIB trie
variant would not have.

In order to accommodate any such scheme, we need to be able to
produce a full flow key at PMTU/redirect time. That required an
adjustment of the interface call-sites used to propagate these events.

For a PMTU/redirect with a fully specified socket, we pass that socket
and use it to produce the flow key.

Otherwise we use a passed in SKB to formulate the key. There are two
cases that need to be distinguished, ICMP message processing (in which
case the IP header is at skb->data) and output packet processing
(mostly tunnels, and in all such cases the IP header is at ip_hdr(skb)).

We also have to make the code able to handle the case where the dst
itself passed into the dst_ops->{update_pmtu,redirect} method is
invalidated. This matters for calls from sockets that have cached
that route. We provide an inet{,6} helper function for this purpose,
and edit SCTP specially since it caches routes at the transport rather
than socket level.

Signed-off-by: David S. Miller
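
To illustrate the idea of keeping PMTU information in a per-nexthop exception table rather than in the routes themselves, here is a deliberately simplified sketch. The fixed-size array, the use of the destination address alone as the key, and every demo_ name are assumptions for illustration; the merged code builds a full flow key and uses its own data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_EXCEPTIONS 4  /* hypothetical table size */

/* One learned exception: a destination and the PMTU learned for it. */
struct demo_nh_exception {
    int      in_use;
    uint32_t daddr;
    uint32_t pmtu;
};

/* Stand-in for a FIB nexthop carrying its own exception table. */
struct demo_nexthop {
    struct demo_nh_exception exc[DEMO_MAX_EXCEPTIONS];
};

static struct demo_nh_exception *demo_find_exception(struct demo_nexthop *nh,
                                                     uint32_t daddr)
{
    for (size_t i = 0; i < DEMO_MAX_EXCEPTIONS; i++)
        if (nh->exc[i].in_use && nh->exc[i].daddr == daddr)
            return &nh->exc[i];
    return NULL;
}

/* Create or update the exception for daddr; returns -1 if the table is full. */
static int demo_update_pmtu(struct demo_nexthop *nh, uint32_t daddr, uint32_t pmtu)
{
    struct demo_nh_exception *e = demo_find_exception(nh, daddr);

    if (!e) {
        for (size_t i = 0; i < DEMO_MAX_EXCEPTIONS && !e; i++)
            if (!nh->exc[i].in_use)
                e = &nh->exc[i];
        if (!e)
            return -1;
        e->in_use = 1;
        e->daddr = daddr;
    }
    e->pmtu = pmtu;
    return 0;
}

/* The route's MTU applies unless a smaller exception exists for daddr. */
static uint32_t demo_effective_mtu(struct demo_nexthop *nh, uint32_t daddr,
                                   uint32_t route_mtu)
{
    struct demo_nh_exception *e = demo_find_exception(nh, daddr);

    return (e && e->pmtu < route_mtu) ? e->pmtu : route_mtu;
}
```

The shared route is never written to in this model; only the exception entry changes, which is the property the message asks for.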

David S. Miller
2012-07-18 01:48:26 +0800

17 Jul, 2012

3 commits

6700c2709 net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()

This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller

David S. Miller
2012-07-17 18:29:28 +0800
a858d64b7 ipv6: fix unappropriate errno returned for non-multicast address

We need to check the passed-in multicast address and return the
appropriate errno (EINVAL) if it is not valid. There is also no need
to walk through the ipv6_mc_list in this situation.

Signed-off-by: Li Wei
Signed-off-by: David S. Miller
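
The early check described here can be sketched as follows: reject a non-multicast address with EINVAL before walking any membership list. The list representation, the demo_ names and the "not found" error code are assumptions for illustration only.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct in6_addr. */
struct demo_in6_addr {
    uint8_t s6_addr[16];
};

/* IPv6 multicast addresses are those in ff00::/8. */
static int demo_addr_is_multicast(const struct demo_in6_addr *a)
{
    return a->s6_addr[0] == 0xff;
}

/* Hypothetical membership lookup: validate first, then search the list. */
static int demo_mc_lookup(const struct demo_in6_addr *addr,
                          const struct demo_in6_addr *list, int count)
{
    if (!demo_addr_is_multicast(addr))
        return -EINVAL;         /* invalid argument: no need to walk the list */

    for (int i = 0; i < count; i++)
        if (memcmp(&list[i], addr, sizeof(*addr)) == 0)
            return 0;
    return -EADDRNOTAVAIL;      /* valid multicast address, but not a member */
}
```

Returning early gives the caller a more accurate error and skips a list walk that could never succeed.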

Li Wei
2012-07-17 16:35:03 +0800
f0396f60d ipv6: fix RTPROT_RA markup of RA routes w/nexthops

Userspace implementations of network routing protocols sometimes need to
tell RA-originated IPv6 routes from other kernel routes to make proper
routing decisions. This makes most sense for RA routes with nexthops,
namely, default routes and Route Information routes.

The intended means of preserving RA route origin in a netlink message is
through indicating RTPROT_RA as the protocol code. Function rt6_fill_node()
tried to do that for default routes, but its test condition was
wrong. This change is modeled after the original mailing list posting
by Jeff Haran. It fixes the test condition for default route case and
sets the same behaviour for Route Information case (both types use
nexthops). Handling of the 3rd RA route type, Prefix Information, is
left unchanged, as it stands for interface connected routes (without
nexthops).

Signed-off-by: Denis Ovsienko
Signed-off-by: David S. Miller

Denis Ovsienko
2012-07-17 13:55:54 +0800

16 Jul, 2012

1 commit

35ad9b9cf ipv6: Add helper inet6_csk_update_pmtu().

This is the ipv6 version of inet_csk_update_pmtu().

Signed-off-by: David S. Miller

David S. Miller
2012-07-16 18:44:56 +0800

14 Jul, 2012

1 commit

8104891b8 ipv6: Initialize the struct rt6_info behind the dst_enty field

We start initializing the struct rt6_info at the first field
behind the struct dst_entry. This is error prone because it
might leave a new field uninitialized. So start initializing
the struct rt6_info right behind the dst_entry.

Suggested-by: Eric Dumazet
Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller
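
A small sketch of the initialization approach described above: instead of naming the first field that follows the embedded dst_entry, zero everything located after the embedded structure by size, so a field added later cannot be missed. The structures and field names below are simplified stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the real structures. */
struct demo_dst_entry {
    int   refcnt;
    void *ops;
};

struct demo_rt6_info {
    struct demo_dst_entry dst;  /* must stay the first member */
    void *neighbour;
    int   flags;
    int   metric;
};

/* Zero every byte that follows the embedded dst, whatever fields exist there. */
static void demo_rt6_init(struct demo_rt6_info *rt)
{
    memset((char *)rt + sizeof(rt->dst), 0, sizeof(*rt) - sizeof(rt->dst));
}
```

The dst member itself is left untouched, since in this model it is set up separately by whoever allocates the entry.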

Steffen Klassert
2012-07-14 15:29:12 +0800

12 Jul, 2012

9 commits

1ed5c48f2 net: Remove checks for dst_ops->redirect being NULL.

No longer necessary.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 15:41:25 +0800
b587ee3ba net: Add dummy dst_ops->redirect method where needed.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 15:39:24 +0800
b94f1c090 ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().

And delete rt6_redirect(), since it is no longer used.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 15:33:37 +0800
ec18d9a26 ipv6: Add redirect support to all protocol icmp error handlers.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 15:25:15 +0800
3a5ad2ee5 ipv6: Add ip6_redirect() and ip6_sk_redirect() helper functions.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 15:08:07 +0800
6e157b6ac ipv6: Pull main logic of rt6_redirect() into rt6_do_redirect().

Hook it into dst_ops->redirect as well.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 15:05:02 +0800
e8599ff4b ipv6: Move bulk of redirect handling into rt6_redirect().

This sets things up so that we can have the protocol error handlers
call down into the ipv6 route code for redirects just as ipv4 already
does.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 14:43:53 +0800
30f2a5f37 ipv6: Export ndisc option parsing from ndisc.c

This is going to be used internally by the rt6 redirect code.

Signed-off-by: David S. Miller

David S. Miller
2012-07-12 14:39:11 +0800
46d3ceabd tcp: TCP Small Queues

This introduces TSQ (TCP Small Queues).

The goal of TSQ is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), to reduce RTT and cwnd bias, part of the
bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2: it can help to reduce
latencies of high prio packets, having smaller TSO packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and socket autotuning on both sides no longer uses 4 MBytes.

As the skb destructor cannot restart xmit itself (as the qdisc lock might
be taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.

If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.

Signed-off-by: Eric Dumazet
Cc: Dave Taht
Cc: Tom Herbert
Cc: Matt Mathis
Cc: Yuchung Cheng
Cc: Nandita Dukkipati
Signed-off-by: David S. Miller
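
The accounting that TSQ is described as doing can be modelled in a few lines: a per-socket byte counter for data sitting in the qdisc/device layers, a limit that stops further transmits, a TSO size capped to half the limit, and a completion hook that restarts a throttled sender. Every name below is hypothetical, and the tasklet and locking details from the message are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_tsq_sock {
    uint32_t wmem_alloc;  /* bytes currently queued in qdisc/device layers */
    uint32_t limit;       /* models the tcp_limit_output_bytes tunable */
    bool     throttled;   /* set when a transmit was refused */
};

/* TSO packets are capped to half the limit, and never exceed gso_max. */
static uint32_t demo_tso_cap(const struct demo_tsq_sock *sk, uint32_t gso_max)
{
    uint32_t half = sk->limit / 2;

    return half < gso_max ? half : gso_max;
}

/* Try to queue len bytes; refuse and mark throttled once the limit is reached. */
static bool demo_try_xmit(struct demo_tsq_sock *sk, uint32_t len)
{
    if (sk->wmem_alloc >= sk->limit) {
        sk->throttled = true;
        return false;
    }
    sk->wmem_alloc += len;
    return true;
}

/* Called when len bytes leave the queues; true means "restart the sender". */
static bool demo_tx_complete(struct demo_tsq_sock *sk, uint32_t len)
{
    sk->wmem_alloc -= len;
    if (sk->throttled && sk->wmem_alloc < sk->limit) {
        sk->throttled = false;
        return true;
    }
    return false;
}
```

With a limit of 40000 the cap comes out at 20000 bytes, which matches the 40000/2 figure in the message, and two such packets fill the budget.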

Eric Dumazet
2012-07-12 09:12:59 +0800

11 Jul, 2012

7 commits

2c53040f0 net: Fix (nearly-)kernel-doc comments for various functions

Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller

Ben Hutchings
2012-07-11 14:13:45 +0800
87a50699c rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo().

Nobody provides non-zero values any longer.

Signed-off-by: David S. Miller

David S. Miller
2012-07-11 13:40:13 +0800
3e12939a2 inet: Kill FLOWI_FLAG_PRECOW_METRICS.

No longer needed. TCP writes metrics, but now in its own special
cache that does not dirty the route metrics. Therefore there is no
longer any reason to pre-cow metrics in this way.

Signed-off-by: David S. Miller

David S. Miller
2012-07-11 13:40:12 +0800
1d861aa4b inet: Minimize use of cached route inetpeer.

Only use it in the absolutely required cases:

1) COW'ing metrics

2) ipv4 PMTU

3) ipv4 redirects

Signed-off-by: David S. Miller

David S. Miller
2012-07-11 13:40:11 +0800
16d183990 inet: Remove ->get_peer() method.

No longer used.

Signed-off-by: David S. Miller

David S. Miller
2012-07-11 13:40:10 +0800
81166dd6f tcp: Move timestamps from inetpeer to metrics cache.

With help from Lin Ming.

Signed-off-by: David S. Miller

David S. Miller
2012-07-11 13:40:08 +0800
ab92bb2f6 tcp: Abstract back handling peer aliveness test into helper function.

Signed-off-by: David S. Miller

David S. Miller
2012-07-11 11:33:49 +0800

08 Jul, 2012

1 commit

d3a5ea6e2 Merge branch 'master' of git://1984.lsi.us.es/nf-next

David S. Miller
2012-07-08 07:18:50 +0800

06 Jul, 2012

3 commits

c56bf6fe7 ipv6: fix a bad cast in ip6_dst_lookup_tail()

Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is
"struct dst_entry **", not "struct dst_entry *"

Reported-by: Fengguang Wu
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-07-06 15:23:41 +0800
883dd4fb5 ipv6: remove redundant declarations

Remove redundant declarations; they belong in include/net/tcp.h

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-07-06 14:40:28 +0800
a2de86f63 ipv6: Initialize the neighbour pointer of rt6_info on allocation

git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct)
added a neighbour pointer to rt6_info. Currently we don't initialize
this pointer at allocation time. We assume this pointer to be valid
if it is not a null pointer, so initialize it on allocation.

Signed-off-by: Steffen Klassert
Signed-off-by: David S. Miller

Steffen Klassert
2012-07-06 05:20:07 +0800

05 Jul, 2012

6 commits

43264e0bd ipv6: remove unnecessary codes in tcp_ipv6.c

opt always equals np->opts, so it is meaningless to define opt, and
check if opt does not equal np->opts and then try to free opt.

Signed-off-by: RongQing.Li
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

RongQing.Li
2012-07-05 18:11:15 +0800
97cac0821 ipv6: Store route neighbour in rt6_info struct.

This makes for a simplified conversion away from dst_get_neighbour*().

All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().

Signed-off-by: David S. Miller

David S. Miller
2012-07-05 17:41:58 +0800
1d248b1cf net: Pass neighbours and dest address into NETEVENT_REDIRECT events.

Signed-off-by: David S. Miller

David S. Miller
2012-07-05 17:21:55 +0800
f894cbf84 net: Add optional SKB arg to dst_ops->neigh_lookup().

Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller

David S. Miller
2012-07-05 16:04:01 +0800
5110effee net: Do delayed neigh confirmation.

When a dst_confirm() happens, mark the confirmation as pending in the
dst. Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check; all except 2
or 3 call sites ensure it's not NULL. So just fix those cases up.

Signed-off-by: David S. Miller
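
The delayed confirmation described in this message can be sketched as a pending flag on the dst that is consumed on the next output, when the neighbour entry is in hand. The structures and demo_ names are simplified assumptions, and the timestamp is passed in rather than read from a kernel clock.

```c
#include <stdbool.h>

/* Stand-in neighbour entry: only the "last confirmed" timestamp. */
struct demo_neigh {
    unsigned long confirmed;
};

/* Stand-in dst: no neighbour pointer is needed to record a confirmation. */
struct demo_dst {
    bool pending_confirm;
};

/* Confirmation arrives: just remember it, no neighbour required. */
static void demo_dst_confirm(struct demo_dst *dst)
{
    dst->pending_confirm = true;
}

/* Next packet out: the neighbour is known here, so apply the pending update. */
static void demo_neigh_output(struct demo_dst *dst, struct demo_neigh *n,
                              unsigned long now)
{
    if (dst->pending_confirm) {
        dst->pending_confirm = false;
        n->confirmed = now;
    }
    /* actual transmission would follow here */
}
```

Because the confirm step only touches the dst, it works even when no neighbour is attached to the dst at that moment.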

David S. Miller
2012-07-05 16:03:06 +0800
08911475d netfilter: nf_conntrack: generalize nf_ct_l4proto_net

This patch generalizes nf_ct_l4proto_net by splitting it into chunks and
moving the corresponding protocol part to where it really belongs.

To clarify, note that we follow two different approaches to support per-net
depending on whether it's a built-in or run-time loadable protocol tracker.

Signed-off-by: Pablo Neira Ayuso
Acked-by: Gao feng

Pablo Neira Ayuso
2012-07-05 01:37:22 +0800