19 Mar, 2011
1 commit
-
Whenever we enter the IP stack proper from bridge netfilter we
need to ensure that the skb is in a form the IP stack expects
it to be in.The entry point on NF_FORWARD did not meet the requirements of
the IP stack, therefore leading to potential crashes/panics.This patch fixes the problem.
Signed-off-by: Herbert Xu
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
13 Mar, 2011
1 commit
-
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.Signed-off-by: David S. Miller
03 Mar, 2011
1 commit
-
Instead of on the stack.
Signed-off-by: David S. Miller
11 Dec, 2010
1 commit
-
The nf_pre_routing functions in bridging have collected two
distinct ways of returning NF_DROP over the years, inline and
via goto. There is no reason for preferring either one.So this patch arbitrarily picks the inline variant and converts
the all the gotos.Also removes a redundant comment.
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
10 Dec, 2010
1 commit
-
Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.This will allow us to:
1) More easily change how the metrics are stored.
2) Implement COW for metrics.
In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing. We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.Signed-off-by: David S. Miller
Acked-by: Eric Dumazet
18 Nov, 2010
1 commit
-
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
16 Nov, 2010
1 commit
-
The macro br_port_exists() is not enough protection when only
RCU is being used. There is a tiny race where other CPU has cleared port
handler hook, but is bridge port flag might still be set.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
21 Oct, 2010
2 commits
-
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
An upcoming commit will allow packets with hardware vlan acceleration
information to be passed though more parts of the network stack, including
packets trunked through the bridge. This adds support for matching and
filtering those packets through ebtables.Signed-off-by: Jesse Gross
Signed-off-by: David S. Miller
12 Oct, 2010
1 commit
-
struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.Switch to a percpu_counter, to reduce number of time we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.Stress test :
(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)Before:
real 0m51.179s
user 0m15.329s
sys 10m15.942sAfter:
real 0m45.570s
user 0m15.525s
sys 9m56.669sWith a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)real 0m41.841s
user 0m15.261s
sys 8m45.949sSigned-off-by: Eric Dumazet
Signed-off-by: David S. Miller
20 Sep, 2010
1 commit
-
Related dicussion here : http://lkml.org/lkml/2010/9/3/16
Introduce a function br_parse_ip_options that will audit the
skb and possibly refill IP options before a packet enters the
IP stack. If no options are present, the function will zero out
the skb cb area so that it is not misinterpreted as options by some
unsuspecting IP layer routine. If packet consistency fails, drop it.Signed-off-by: Bandan Das
Signed-off-by: David S. Miller
02 Sep, 2010
1 commit
-
In a similar vain to commit 17762060c25590bfddd68cc1131f28ec720f405f
("bridge: Clear IPCB before possible entry into IP stack")Any time we call into the IP stack we have to make sure the state
there is as expected by the ipv4 code.With help from Eric Dumazet and Herbert Xu.
Reported-by: Bandan Das
Signed-off-by: David S. Miller
24 Aug, 2010
1 commit
-
nf_bridge_alloc() always reset the skb->nf_bridge, so we should always
put the old one.Signed-off-by: Changli Gao
Signed-off-by: Bart De Schuymer
Signed-off-by: David S. Miller
08 Jul, 2010
2 commits
-
The bridge protocol lives dangerously by having incestuous relations
with the IP stack. In this instance an abomination has been created
where a bogus IPCB area from a bridged packet leads to a crash in
the IP stack because it's interpreted as IP options.This patch papers over the problem by clearing the IPCB area in that
particular spot. To fix this properly we'd also need to parse any
IP options if present but I'm way too lazy for that.Signed-off-by: Herbert Xu
Cheers,
Signed-off-by: David S. Miller
03 Jul, 2010
1 commit
02 Jul, 2010
1 commit
-
Support more fine grained control of bridge netfilter iptables invocation
by adding seperate brnf_call_*tables parameters for each device using the
sysfs interface. Packets are passed to layer 3 netfilter when either the
global parameter or the per bridge parameter is enabled.Acked-by: Stephen Hemminger
Acked-by: David S. Miller
Signed-off-by: Patrick McHardy
16 Jun, 2010
2 commits
-
Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
15 Jun, 2010
1 commit
-
Conflicts:
include/net/netfilter/xt_rateest.h
net/bridge/br_netfilter.c
net/netfilter/nf_conntrack_core.cSigned-off-by: Patrick McHardy
11 Jun, 2010
1 commit
-
remove useless union keyword in rtable, rt6_info and dn_route.
Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Jun, 2010
1 commit
-
Avoid dirtying bridge_parent_rtable refcount, using new dst noref
infrastructure.Signed-off-by: Eric Dumazet
Signed-off-by: Patrick McHardy
13 May, 2010
1 commit
-
[ 4593.956206] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 4593.956219] IP: [] br_nf_forward_finish+0x154/0x170 [bridge]
[ 4593.956232] PGD 195ece067 PUD 1ba005067 PMD 0
[ 4593.956241] Oops: 0000 [#1] SMP
[ 4593.956248] last sysfs file:
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08/ATK0110:00/hwmon/hwmon0/temp2_label
[ 4593.956253] CPU 3
...
[ 4593.956380] Pid: 29512, comm: kvm Not tainted 2.6.34-rc7-net #195 P6T DELUXE/System Product Name
[ 4593.956384] RIP: 0010:[] [] br_nf_forward_finish+0x154/0x170 [bridge]
[ 4593.956395] RSP: 0018:ffff880001e63b78 EFLAGS: 00010246
[ 4593.956399] RAX: 0000000000000608 RBX: ffff880057181700 RCX: ffff8801b813d000
[ 4593.956402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880057181700
[ 4593.956406] RBP: ffff880001e63ba8 R08: ffff8801b9d97000 R09: ffffffffa0335650
[ 4593.956410] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b813d000
[ 4593.956413] R13: ffffffff81ab3940 R14: ffff880057181700 R15: 0000000000000002
[ 4593.956418] FS: 00007fc40d380710(0000) GS:ffff880001e60000(0000) knlGS:0000000000000000
[ 4593.956422] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 4593.956426] CR2: 0000000000000018 CR3: 00000001ba1d7000 CR4: 00000000000026e0
[ 4593.956429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4593.956433] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4593.956437] Process kvm (pid: 29512, threadinfo ffff8801ba566000, task ffff8801b8003870)
[ 4593.956441] Stack:
[ 4593.956443] 0000000100000020 ffff880001e63ba0 ffff880001e63ba0 ffff880057181700
[ 4593.956451] ffffffffa0335650 ffffffff81ab3940 ffff880001e63bd8 ffffffffa03350e6
[ 4593.956462] ffff880001e63c40 000000000000024d ffff880057181700 0000000080000000
[ 4593.956474] Call Trace:
[ 4593.956478]
[ 4593.956488] [] ? br_nf_forward_finish+0x0/0x170 [bridge]
[ 4593.956496] [] NF_HOOK_THRESH+0x56/0x60 [bridge]
[ 4593.956504] [] br_nf_forward_arp+0x112/0x120 [bridge]
[ 4593.956511] [] nf_iterate+0x64/0xa0
[ 4593.956519] [] ? br_forward_finish+0x0/0x60 [bridge]
[ 4593.956524] [] nf_hook_slow+0x6c/0x100
[ 4593.956531] [] ? br_forward_finish+0x0/0x60 [bridge]
[ 4593.956538] [] ? __br_forward+0x0/0xc0 [bridge]
[ 4593.956545] [] __br_forward+0x6d/0xc0 [bridge]
[ 4593.956550] [] ? skb_clone+0x3e/0x70
[ 4593.956557] [] deliver_clone+0x32/0x60 [bridge]
[ 4593.956564] [] br_flood+0xa6/0xe0 [bridge]
[ 4593.956571] [] ? __br_forward+0x0/0xc0 [bridge]Don't call nf_bridge_update_protocol() for ARP traffic as skb->nf_bridge isn't
used in the ARP case.Reported-by: Stephen Hemminger
Signed-off-by: Bart De Schuymer
Signed-off-by: Patrick McHardy
20 Apr, 2010
2 commits
-
The MTU for IP traffic encapsulated inside PPPoE traffic is smaller
than the MTU of the Ethernet device (1500). Connection tracking
gathers all IP packets and sometimes will refragment them in
ip_fragment(). We then need to subtract the length of the
encapsulating header from the mtu used in ip_fragment(). The check in
br_nf_dev_queue_xmit() which determines if ip_fragment() has to be
called is also updated for the PPPoE-encapsulated packets.
nf_bridge_copy_header() is also updated to make sure the PPPoE data
length field has the correct value.Signed-off-by: Bart De Schuymer
Signed-off-by: Patrick McHardy -
Conflicts:
Documentation/feature-removal-schedule.txt
net/ipv6/netfilter/ip6t_REJECT.c
net/netfilter/xt_limit.cSigned-off-by: Patrick McHardy
15 Apr, 2010
2 commits
-
- fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
neigh_hh_output() or dst->neighbour->output() overwrite the complete
Ethernet header, although we only need the destination MAC address.
For encapsulated packets, they ended up overwriting the encapsulating
header. The new code copies the Ethernet source MAC address and
protocol number before calling dst->neighbour->output(). The Ethernet
source MAC and protocol number are copied back in place in
br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
more transparent because in the old scheme the source MAC of the
bridge was copied into the source address in the Ethernet header. We
also let skb->protocol equal ETH_P_IP resp. ETH_P_IPV6 during the
execution of the PF_INET resp. PF_INET6 hooks.- Speed up IP DNAT by calling neigh_hh_bridge() instead of
neigh_hh_output(): if dst->hh is available, we already know the MAC
address so we can just copy it.Signed-off-by: Bart De Schuymer
Signed-off-by: Patrick McHardy -
Remove br_netfilter.c::br_nf_local_out(). The function
br_nf_local_out() was needed because the PF_BRIDGE::LOCAL_OUT hook
could be called when IP DNAT happens on to-be-bridged traffic. The
new scheme eliminates this mess.Signed-off-by: Bart De Schuymer
Signed-off-by: Patrick McHardy
13 Apr, 2010
1 commit
-
bridge-netfilter: cleanup br_netfilter.c
- remove some of the graffiti at the head of br_netfilter.c
- remove __br_dnat_complain()
- remove KERN_INFO messages when CONFIG_NETFILTER_DEBUG is definedSigned-off-by: Bart De Schuymer
Signed-off-by: Patrick McHardy
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
25 Mar, 2010
1 commit
-
The first argument to NF_HOOK* is an nfproto since quite some time.
Commit v2.6.27-2457-gfdc9314 was the first to practically start using
the new names. Do that now for the remaining NF_HOOK calls.The semantic patch used was:
//
@@
@@
(NF_HOOK
|NF_HOOK_THRESH
)(
-PF_BRIDGE,
+NFPROTO_BRIDGE,
...)@@
@@
NF_HOOK(
-PF_INET6,
+NFPROTO_IPV6,
...)@@
@@
NF_HOOK(
-PF_INET,
+NFPROTO_IPV4,
...)
//Signed-off-by: Jan Engelhardt
12 Nov, 2009
1 commit
-
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.Cc: "David Miller"
Cc: Hideaki YOSHIFUJI
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman
24 Sep, 2009
1 commit
-
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan
Cc: David Howells
Cc: "Eric W. Biederman"
Cc: Al Viro
Cc: Ralf Baechle
Cc: Martin Schwidefsky
Cc: Ingo Molnar
Cc: "David S. Miller"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Aug, 2009
1 commit
-
commit f216f082b2b37c4943f1e7c393e2786648d48f6f
([NETFILTER]: bridge netfilter: deal with martians correctly)
added a refcount leak on in_dev.Instead of using in_dev_get(), we can use __in_dev_get_rcu(),
as netfilter hooks are running under rcu_read_lock(), as pointed
by Patrick.Signed-off-by: Eric Dumazet
Signed-off-by: Patrick McHardy
06 Jul, 2009
1 commit
-
No functional change -- just for easier reading.
Signed-off-by: Cyrill Gorcunov
Signed-off-by: David S. Miller
03 Jun, 2009
2 commits
-
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;Delete skb->dst field
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
20 Apr, 2009
1 commit
-
br_nf_dev_queue_xmit only checks for ETH_P_IP packets for fragmenting but not
VLAN packets. This results in dropping of large VLAN packets. This can be
observed when connection tracking is enabled. Connection tracking re-assembles
fragmented packets, and these have to re-fragmented when transmitting out. Also,
make sure only refragmented packets are defragmented as per suggestion from
Patrick McHardy.Signed-off-by: Saikiran Madugula
Signed-off-by: Patrick McHardy
01 Feb, 2009
1 commit
-
Base versions handle constant folding now.
Signed-off-by: Harvey Harrison
Signed-off-by: David S. Miller
13 Jan, 2009
2 commits
-
The PPPOE/VLAN processing code in the bridge netfilter is broken
by design. The VLAN tag and the PPPOE session ID are an integral
part of the packet flow information, yet they're completely
ignored by the bridge netfilter. This is potentially a security
hole as it treats all VLANs and PPPOE sessions as the same.What's more, it's actually broken for PPPOE as the bridge netfilter
tries to trim the packets to the IP length without adjusting the
PPPOE header (and adjusting the PPPOE header isn't much better
since the PPPOE peer may require the padding to be present).Therefore we should disable this by default.
It does mean that people relying on this feature may lose networking
depending on how their bridge netfilter rules are configured.
However, IMHO the problems this code causes are serious enough to
warrant this.Signed-off-by: Herbert Xu
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller -
Currently the bridge FORWARD/POST_ROUTING chains treats all
non-IPv4 packets as IPv6. This packet fixes that by returning
NF_ACCEPT on non-IP packets instead, just as is done in PRE_ROUTING.Signed-off-by: Herbert Xu
Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller