12 Dec, 2011
1 commit
-
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
06 Dec, 2011
1 commit
-
To reflect the fact that a reference is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller
Acked-by: Roland Dreier
19 Nov, 2011
1 commit
-
ip_gre: Set needed_headroom dynamically again
Now that all needed_headroom users have been fixed up so that
we can safely increase needed_headroom, this patch restores the
dynamic update of needed_headroom.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
09 Nov, 2011
1 commit
-
Tunnels can force an alignment of their percpu data to reduce the number of
cache lines used in the fast path, or read in .ndo_get_stats().

percpu_alloc() is a very fine grained allocator, so any small hole will
be used anyway.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
21 Oct, 2011
1 commit
-
It seems ip_gre is able to change dev->needed_headroom on the fly.
This is unfortunately not legal, and it triggers a BUG in raw_sendmsg():

skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev)
< another cpu changes dev->needed_headroom (making it bigger)
...
skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev));

We end up with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE()
-> we crash later because the skb head is exhausted.

Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route
header_len in max_headroom calculation)

Reported-by: Elmar Vonlanthen
Signed-off-by: Eric Dumazet
CC: Timo Teräs
CC: Herbert Xu
Signed-off-by: David S. Miller
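The race described in this commit can be reproduced in a small userspace sketch. Everything below is hypothetical scaffolding (`struct fake_dev`, `plan_racy()`, `plan_snapshot()`, the `grow_by` parameter); it only models the two reads of needed_headroom. The snapshot variant is shown for contrast and is not necessarily the fix this commit applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: only the racing field. */
struct fake_dev {
    unsigned int needed_headroom;
};

/* Outcome of one allocate-then-reserve sequence. */
struct skb_plan {
    size_t allocated; /* bytes requested from the allocator */
    size_t reserved;  /* bytes later set aside for headers */
};

/*
 * Racy pattern: headroom is read once for the allocation and again for
 * the reserve. grow_by models another CPU enlarging needed_headroom
 * between the two reads.
 */
struct skb_plan plan_racy(struct fake_dev *dev, size_t payload,
                          unsigned int grow_by)
{
    struct skb_plan p;

    p.allocated = payload + dev->needed_headroom; /* first read */
    dev->needed_headroom += grow_by;              /* concurrent update */
    p.reserved = dev->needed_headroom;            /* second read */
    return p;
}

/* Contrast: take one snapshot and use it for both steps. */
struct skb_plan plan_snapshot(struct fake_dev *dev, size_t payload,
                              unsigned int grow_by)
{
    unsigned int headroom = dev->needed_headroom;
    struct skb_plan p;

    p.allocated = payload + headroom;
    dev->needed_headroom += grow_by; /* update cannot affect this plan */
    p.reserved = headroom;
    assert(p.reserved <= p.allocated);
    return p;
}

/* Bytes left for the payload after reserving; negative means overrun. */
long room_for_payload(struct skb_plan p)
{
    return (long)p.allocated - (long)p.reserved;
}
```

With a device whose headroom grows by 8 bytes between the two reads, `plan_racy()` reserves more than was allocated, which corresponds to LL_RESERVED_SPACE() exceeding LL_ALLOCATED_SPACE() above.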
18 Jul, 2011
1 commit
-
dst_{get,set}_neighbour()
Signed-off-by: David S. Miller
06 May, 2011
1 commit
-
Force dev_alloc_name() to be called from register_netdevice() by
dev_get_valid_name(). That allows removing multiple explicit
dev_alloc_name() calls.

The possibility to call dev_alloc_name() in advance remains.
This also fixes the veth creation regression caused by
84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743

Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
05 May, 2011
1 commit
-
First, make callers pass on-stack flowi4 to ip_route_output_gre()
so they can get at the fully resolved flow key.

Next, use that in ipgre_tunnel_xmit() to avoid the need to use
rt->rt_{dst,src}.

Signed-off-by: David S. Miller
23 Apr, 2011
1 commit
-
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Mar, 2011
1 commit
-
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller
11 Mar, 2011
1 commit
-
Conflicts:
drivers/net/bnx2x/bnx2x_cmn.c
10 Mar, 2011
1 commit
-
Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't
allow anybody to load any module not related to networking.

This patch restricts the ability to autoload modules to netdev modules
with explicit aliases. This fixes CVE-2011-1019.

Arnd Bergmann suggested leaving untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE to maintain compatibility with network scripts
that use autoloading netdev modules by aliases like "eth0", "wlan0".

Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.

root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
root@albatros:~# grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: fffffff800001000
CapEff: fffffff800001000
CapBnd: fffffff800001000
root@albatros:~# modprobe xfs
FATAL: Error inserting xfs
(/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
root@albatros:~# lsmod | grep xfs
root@albatros:~# ifconfig xfs
xfs: error fetching interface information: Device not found
root@albatros:~# lsmod | grep xfs
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit
sit: error fetching interface information: Device not found
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit0
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1

root@albatros:~# lsmod | grep sit
sit 10457 0
tunnel4 2957 1 sit

For CAP_SYS_MODULE module loading is still relaxed:
root@albatros:~# grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
root@albatros:~# ifconfig xfs
xfs: error fetching interface information: Device not found
root@albatros:~# lsmod | grep xfs
xfs 745319 0

Reference: https://lkml.org/lkml/2011/2/24/203
Signed-off-by: Vasiliy Kulikov
Signed-off-by: Michael Tokarev
Acked-by: David S. Miller
Acked-by: Kees Cook
Signed-off-by: James Morris
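The policy described in this commit can be sketched as a small decision function. This is a userspace illustration only: `autoload_aliases()` and `ALIAS_MAX` are hypothetical names, the boolean flags stand in for the kernel's capability checks, and the "netdev-" prefix is my understanding of the alias form used in mainline rather than something the message above states. The function lists candidate aliases in order and does not model "try the second only if the first fails".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ALIAS_MAX 64 /* hypothetical buffer size for this sketch */

/*
 * Fill out[] with the module aliases that may be requested, in order,
 * for an interface name that does not exist yet. Returns how many
 * aliases were written (0, 1 or 2).
 */
int autoload_aliases(const char *ifname, bool cap_net_admin,
                     bool cap_sys_module, char out[2][ALIAS_MAX])
{
    int n = 0;

    assert(ifname != NULL);

    /* CAP_NET_ADMIN may only trigger explicitly aliased netdev modules. */
    if (cap_net_admin)
        snprintf(out[n++], ALIAS_MAX, "netdev-%s", ifname);

    /* CAP_SYS_MODULE keeps the old behaviour: load by bare name. */
    if (cap_sys_module)
        snprintf(out[n++], ALIAS_MAX, "%s", ifname);

    return n;
}
```

This matches the transcript above: with only CAP_NET_ADMIN, `ifconfig xfs` can at most request an alias that no filesystem module provides, while `ifconfig sit0` resolves because sit declares an alias for that device name.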
03 Mar, 2011
1 commit
-
Instead of on the stack.
Signed-off-by: David S. Miller
12 Feb, 2011
1 commit
-
Commit 5811662b15db018c740c57d037523683fd3e6123 ("net: use the macros
defined for the members of flowi") accidentally removed the setting of
IPPROTO_GRE from the struct flowi in ipgre_tunnel_xmit. This patch
restores it.

Signed-off-by: Steffen Klassert
Acked-by: Changli Gao
Signed-off-by: David S. Miller
13 Dec, 2010
2 commits
-
Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
This allowed several simplifications:

1) The interim dst_metric_hoplimit() can go as it's no longer
used.

2) The sysctl_ip_default_ttl entry no longer needs to use
ipv4_doint_and_flush, since the sysctl is not cached in
routing cache metrics any longer.

3) ipv4_doint_and_flush no longer needs to be exported and
therefore can be marked static.

When ipv4_doint_and_flush_strategy was removed some time ago,
the external declaration in ip.h was mistakenly left around
so kill that off too.

We have to move the sysctl_ip_default_ttl declaration into
ipv4's route cache definition header net/route.h, because
currently net/ip.h (where the declaration lives now) has
a back dependency on net/route.h

Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
10 Dec, 2010
1 commit
-
Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.

This will allow us to:
1) More easily change how the metrics are stored.
2) Implement COW for metrics.
In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing. We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.

Signed-off-by: David S. Miller
Acked-by: Eric Dumazet
02 Dec, 2010
2 commits
-
If gre is built as a module the 'ip tunnel add' command would fail because
the ip_gre module was not being autoloaded. Adding an alias for
the gre0 device name causes dev_load() to autoload it when needed.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
-
Use strcpy() rather than sprintf() for the case where name is getting
generated. Fix indentation.

Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
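The strcpy() change above is behaviour-preserving, which a short userspace sketch can show. The helper names are hypothetical; the point is only that sprintf() needs "%%" to emit a literal '%', while strcpy() copies the "gre%d" template as-is.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16 /* same value as the kernel's interface name limit */

/* Old style: the format string must escape '%' as "%%". */
void name_template_sprintf(char name[IFNAMSIZ])
{
    sprintf(name, "gre%%d");
}

/* New style: a plain copy, no format parsing involved. */
void name_template_strcpy(char name[IFNAMSIZ])
{
    strcpy(name, "gre%d");
}

/* Both helpers must leave the same name template behind. */
void check_templates(void)
{
    char a[IFNAMSIZ], b[IFNAMSIZ];

    name_template_sprintf(a);
    name_template_strcpy(b);
    assert(strcmp(a, b) == 0);
}
```

Either way the buffer holds the template "gre%d"; the "%d" is expanded later, when the device name is allocated.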
18 Nov, 2010
1 commit
-
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
16 Nov, 2010
1 commit
-
The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to match it in
XFRM policy selectors, so that different GRE tunnels can have
different policies.

Signed-off-by: Timo Teräs
Signed-off-by: David S. Miller
12 Nov, 2010
1 commit
-
When we test rt->fl.iif against zero, we're seeing if it's
an output or an input route.

Make that explicit with some helper functions.
Signed-off-by: David S. Miller
31 Oct, 2010
1 commit
-
Before making the fallback tunnel visible to lookups, we should make
sure it is completely set up, once ipgre_tunnel_init() has been called
and the tstats per_cpu pointer allocated.

Move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from
ipgre_fb_tunnel_init() to ipgre_init_net().

Based on a patch from Pavel Emelyanov
Reported-by: Pavel Emelyanov
Signed-off-by: Eric Dumazet
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller
28 Oct, 2010
1 commit
-
After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug
was introduced into the SIOCCHGTUNNEL code.

The tunnel is first unlinked, then addresses change, then it is linked
back probably into another bucket. But while changing the parms, the
hash table is unlocked to readers and they can look up the improper tunnel.

Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
(gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
94767632 (ip6tnl: get rid of ip6_tnl_lock).

The quick fix is to wait for quiescent state to pass after unlinking,
but if it is inappropriate I can invent something better, just let me
know.

Signed-off-by: Pavel Emelyanov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
19 Oct, 2010
1 commit
-
Convert inetdev_by_index() to not increment in_dev refcount.
Callers hold RCU or RTNL, and should not decrement in_dev refcount.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
06 Oct, 2010
1 commit
-
In various situations, a device provides a packet to our stack and we
drop it before it enters the protocol stack:
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)

We can handle a per-device counter of such dropped frames at core level,
and automatically add it to the device provided stats (rx_dropped), so
that standard tools can be used (ifconfig, ip link, cat /proc/net/dev).

This is a generalization of commit 8990f468a (net: rx_dropped
accounting), thus reverting it.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
30 Sep, 2010
2 commits
-
HARD_TX_LOCK no longer protects tunnels from dead loops;
the xmit_recursion percpu counter does.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Note: If tunnels are created with the "oseq" option, LLTX is not
enabled:

Even using an atomic_t o_seq, we would increase the chance of packets being
out of order at the receiver.

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size: 200 bytes per frame)

Before patch:
real 3m0.094s
user 0m9.365s
sys 47m50.103s

After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s

Last problem to solve is the contention on dst:
38660.00 21.4% __ip_route_output_key vmlinux
20786.00 11.5% dst_release vmlinux
14191.00 7.8% __xfrm_lookup vmlinux
12410.00 6.9% ip_finish_output vmlinux
4540.00 2.5% ip_push_pending_frames vmlinux
4427.00 2.4% ip_append_data vmlinux
4265.00 2.4% __alloc_skb vmlinux
4140.00 2.3% __ip_local_out vmlinux
3991.00 2.2% dev_queue_xmit vmlinux

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
28 Sep, 2010
1 commit
-
On Monday, 27 September 2010 at 14:29 +0100, Ben Hutchings wrote:
> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index 5d6ddcb..de39b22 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> [...]
> > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net,
> > if (parms->name[0])
> > strlcpy(name, parms->name, IFNAMSIZ);
> > else
> > - sprintf(name, "gre%%d");
> > + strcpy(name, "gre%d");
> >
> > dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup);
> > if (!dev)
> [...]
>
> This is a valid fix, but doesn't belong in this patch!
>

Sorry? It was not a fix, but at most a cleanup ;)
Anyway I forgot the gretap case...
[PATCH 2/4 v2] ip_gre: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is preliminary work to support a lockless transmit path, and
correct RX stats, which are already unsafe.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
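The per-CPU accounting idea in this commit can be sketched in userspace as one counter slot per CPU, summed only when statistics are read. The names below (`NR_CPUS_DEMO`, `struct tunnel_stats`, `tstats_tx()`, `tstats_sum()`) are hypothetical, and the sketch deliberately ignores the synchronization a real 64-bit counter read needs on 32-bit machines.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DEMO 4 /* hypothetical CPU count for the sketch */

/* The four counters the commit keeps per CPU. */
struct pcpu_tstats {
    uint64_t rx_packets, rx_bytes, tx_packets, tx_bytes;
};

/* One slot per CPU; each CPU only ever writes its own slot. */
struct tunnel_stats {
    struct pcpu_tstats cpu[NR_CPUS_DEMO];
};

/* Fast path: account one transmitted packet on the given CPU. */
void tstats_tx(struct tunnel_stats *s, unsigned int cpu, uint64_t len)
{
    assert(cpu < NR_CPUS_DEMO);
    s->cpu[cpu].tx_packets++;
    s->cpu[cpu].tx_bytes += len;
}

/* Slow path: fold all per-CPU slots into one total when stats are read. */
struct pcpu_tstats tstats_sum(const struct tunnel_stats *s)
{
    struct pcpu_tstats sum = {0};
    unsigned int i;

    for (i = 0; i < NR_CPUS_DEMO; i++) {
        sum.rx_packets += s->cpu[i].rx_packets;
        sum.rx_bytes   += s->cpu[i].rx_bytes;
        sum.tx_packets += s->cpu[i].tx_packets;
        sum.tx_bytes   += s->cpu[i].tx_bytes;
    }
    return sum;
}
```

Because writers never share a slot, the transmit path needs no lock for these four counters; the cost moves to the reader, which walks every slot.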
27 Sep, 2010
1 commit
-
Conflicts:
drivers/net/qlcnic/qlcnic_init.c
net/ipv4/ip_output.c
24 Sep, 2010
1 commit
-
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
21 Sep, 2010
2 commits
-
Under load, netif_rx() can drop incoming packets but administrators don't
have a chance to spot which device needs some tuning (RPS activation for
example).

This patch adds rx_dropped accounting in vlans and tunnels.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
ipv6 can be a module, we should test CONFIG_IPV6 and CONFIG_IPV6_MODULE
to enable ipv6 bits in ip_gre.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
16 Sep, 2010
1 commit
-
As RTNL is held while doing tunnel inserts and deletes, we can remove
the ipgre_lock spinlock. My initial RCU conversion was conservative and
converted the rwlock to a spinlock, with no RTNL requirement.

Use appropriate rcu annotations and modern lockdep checks as well.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
22 Aug, 2010
1 commit
-
PPP: introduce "pptp" module which implements point-to-point tunneling protocol using pppox framework
NET: introduce the "gre" module for demultiplexing GRE packets on version criteria
(required so that pptp and ip_gre may coexist)
NET: ip_gre: update to use the "gre" module

This patch thus introduces pptp support to the linux kernel, which
dramatically speeds up pptp vpn connections and decreases cpu usage in
comparison with the existing user-space implementation
(poptop/pptpclient). There is an accel-pptp project
(https://sourceforge.net/projects/accel-pptp/) to utilize this module;
it contains a plugin for pppd to use pptp in client mode and a modified
pptpd (poptop) to build a high-performance pptp NAS.

There were many changes from the initially submitted patch; the most important are:
1. using rcu instead of read-write locks
2. using static bitmap instead of dynamically allocated
3. using vmalloc for memory allocation instead of BITS_PER_LONG + __get_free_pages
4. fixed many coding style issues
Thanks to Eric Dumazet.

Signed-off-by: Dmitry Kozlov
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Jul, 2010
1 commit
-
This patch makes an IPv6 over IPv4 GRE tunnel propagate the traffic
class field from the underlying IPv6 header to the IPv4 Type Of Service
field. Without the patch, all IPv6 packets in the tunnel look the same to QoS.

This assumes that the IPv6 traffic class is exactly the same
as IPv4 TOS. Not sure if that is always the case? Maybe need
to mask off some bits.

The mask and shift to get tclass is copied from ipv6/datagram.c
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
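The "mask and shift" mentioned in this commit follows from the IPv6 header layout: the first 32-bit word holds a 4-bit version, an 8-bit traffic class and a 20-bit flow label. A minimal userspace sketch, with the hypothetical helper name `ipv6_tclass()` and a raw byte buffer standing in for the header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Extract the 8-bit traffic class from the start of an IPv6 header.
 * hdr points at the first byte of the header, in network byte order.
 */
uint8_t ipv6_tclass(const uint8_t *hdr, size_t len)
{
    uint32_t word;

    assert(hdr != NULL && len >= 4);

    /* Assemble the first 32-bit word in host order (big-endian input). */
    word = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];

    /* Drop the 20-bit flow label, then keep the next 8 bits. */
    return (uint8_t)((word >> 20) & 0xff);
}
```

The returned byte is what would be copied into the outer IPv4 TOS field under the assumption stated in the commit, namely that the two fields are directly interchangeable.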
11 Jun, 2010
1 commit
-
remove useless union keyword in rtable, rt6_info and dn_route.
Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
18 May, 2010
2 commits
-
This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
-
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.

There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>