19 Mar, 2011

1 commit

  • Whenever we enter the IP stack proper from bridge netfilter we
    need to ensure that the skb is in a form the IP stack expects
    it to be in.

    The entry point on NF_FORWARD did not meet the requirements of
    the IP stack, therefore leading to potential crashes/panics.

    This patch fixes the problem.

    Signed-off-by: Herbert Xu
    Acked-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Herbert Xu
     

13 Mar, 2011

1 commit


03 Mar, 2011

1 commit


11 Dec, 2010

1 commit

  • The nf_pre_routing functions in bridging have collected two
    distinct ways of returning NF_DROP over the years, inline and
    via goto. There is no reason for preferring either one.

    So this patch arbitrarily picks the inline variant and converts
    all the gotos.

    Also removes a redundant comment.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

10 Dec, 2010

1 commit

  • Use helper functions to hide all direct accesses, especially writes,
    to dst_entry metrics values.

    This will allow us to:

    1) More easily change how the metrics are stored.

    2) Implement COW for metrics.

    In particular this will help us put metrics into the inetpeer
    cache if that is what we end up doing. We can make the _metrics
    member a pointer instead of an array, initially have it point
    at the read-only metrics in the FIB, and then on the first set
    grab an inetpeer entry and point the _metrics member there.

    Signed-off-by: David S. Miller
    Acked-by: Eric Dumazet

    David S. Miller
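A minimal C sketch of the copy-on-write idea described in this commit, assuming simplified stand-ins: the struct, `SKETCH_RTAX_MAX` and the `sketch_dst_metric*()` helpers are hypothetical and are not the kernel's dst_entry API, and a plain malloc() copy takes the place of the inetpeer entry. Readers always go through one pointer; the first write switches that pointer from the shared read-only array to a private copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_RTAX_MAX 4 /* hypothetical number of metric slots */

/* Hypothetical stand-in for the metrics part of a dst_entry. */
struct sketch_dst {
    const unsigned int *_metrics; /* what every reader dereferences */
    unsigned int *priv;           /* private writable copy, NULL until first write */
};

/* Read helper: callers never touch _metrics directly. */
static unsigned int sketch_dst_metric(const struct sketch_dst *dst, int idx)
{
    return dst->_metrics[idx];
}

/* Write helper: copy-on-write on the first set, then write the private copy. */
static int sketch_dst_metric_set(struct sketch_dst *dst, int idx, unsigned int val)
{
    if (!dst->priv) {
        /* malloc() stands in for "grab an inetpeer entry". */
        unsigned int *copy = malloc(SKETCH_RTAX_MAX * sizeof(*copy));

        if (!copy)
            return -1;
        memcpy(copy, dst->_metrics, SKETCH_RTAX_MAX * sizeof(*copy));
        dst->priv = copy;
        dst->_metrics = copy;
    }
    dst->priv[idx] = val;
    return 0;
}
```

The shared read-only array is never written, which is what makes it safe to share between entries until one of them needs its own values.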
     

18 Nov, 2010

1 commit


16 Nov, 2010

1 commit

  • The macro br_port_exists() is not enough protection when only
    RCU is being used. There is a tiny race where another CPU has cleared the
    port handler hook, but the bridge port flag might still be set.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
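The race can be illustrated in userspace C11 with a hypothetical device that publishes its bridge port through an atomic pointer. All `sketch_*` names are invented for this example, and the acquire load is only a rough analogue of rcu_dereference(). Testing a separate flag and then reading the pointer leaves a window in which the pointer has already been cleared; loading the pointer once and testing that loaded value closes the window. The sketch does not model the RCU grace period that keeps the port memory alive for readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_port {
    int port_no;
};

struct sketch_dev {
    /* Published by the CPU that adds the port, cleared by the one removing it. */
    _Atomic(struct sketch_port *) br_port;
};

/* Load the pointer once and use only the loaded value: a NULL result
 * means "not a bridge port", regardless of any separate flag. */
static int sketch_port_no(struct sketch_dev *dev)
{
    struct sketch_port *p =
        atomic_load_explicit(&dev->br_port, memory_order_acquire);

    return p ? p->port_no : -1;
}
```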
     

21 Oct, 2010

2 commits


12 Oct, 2010

1 commit

  • struct dst_ops tracks the number of allocated dst entries in an atomic_t
    field, which is subject to high cache line contention under stress
    workloads.

    Switch to a percpu_counter to reduce the number of times we need to dirty
    a central location. Place it on a separate cache line to avoid dirtying
    read-only fields.

    Stress test:

    (Sending 160,000,000 UDP frames,
    IP route cache disabled, dual E5540 @2.53GHz,
    32bit kernel, FIB_TRIE, SLUB/NUMA)

    Before:

    real 0m51.179s
    user 0m15.329s
    sys 10m15.942s

    After:

    real 0m45.570s
    user 0m15.525s
    sys 9m56.669s

    With a small reordering of struct neighbour fields (to separate refcnt
    from other read-mostly fields), the subject of a following patch:

    real 0m41.841s
    user 0m15.261s
    sys 8m45.949s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
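A single-threaded C sketch of the batching idea behind a percpu_counter, assuming a fixed number of CPUs and an explicit `cpu` argument. The `sketch_*` names and the batch value are illustrative, not the kernel's percpu_counter API, and the locking the real implementation needs when folding into the shared total is omitted. Each CPU accumulates a private delta and only writes the shared count when that delta reaches the batch size, which is what reduces how often the central cache line gets dirtied.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4 /* assumed fixed CPU count for the example */

/* Hypothetical counter: one shared total plus one private delta per CPU.
 * In the kernel the deltas live in per-CPU memory; an array keeps this
 * sketch self-contained. */
struct sketch_pcpu_counter {
    long count;                 /* shared total, written only when a batch fills */
    long local[SKETCH_NR_CPUS]; /* per-CPU deltas, each touched only by its CPU */
    long batch;                 /* fold threshold */
};

static void sketch_counter_add(struct sketch_pcpu_counter *c, int cpu, long amount)
{
    long v;

    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    v = c->local[cpu] + amount;
    if (labs(v) >= c->batch) {
        c->count += v; /* only write to the shared location (needs a lock in real code) */
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Cheap but approximate: may lag by up to (batch - 1) per CPU. */
static long sketch_counter_read(const struct sketch_pcpu_counter *c)
{
    return c->count;
}

/* Exact, but has to visit every CPU's delta. */
static long sketch_counter_sum(const struct sketch_pcpu_counter *c)
{
    long sum = c->count;

    for (int i = 0; i < SKETCH_NR_CPUS; i++)
        sum += c->local[i];
    return sum;
}
```

The trade-off is visible in the two readers: the cheap one can drift, and the exact one pays for walking every CPU.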
     

20 Sep, 2010

1 commit

  • Related discussion here: http://lkml.org/lkml/2010/9/3/16

    Introduce a function br_parse_ip_options that will audit the
    skb and possibly refill IP options before a packet enters the
    IP stack. If no options are present, the function will zero out
    the skb cb area so that it is not misinterpreted as options by some
    unsuspecting IP layer routine. If packet consistency fails, drop it.

    Signed-off-by: Bandan Das
    Signed-off-by: David S. Miller

    Bandan Das
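A minimal C sketch of the kind of consistency checks such an audit performs on an IPv4 header before it reaches the IP layer, assuming a flat byte buffer rather than an skb. `sketch_audit_ipv4()` and `sketch_ip_csum()` are hypothetical names; the real br_parse_ip_options() also compiles the options themselves, trims the skb and clears the control block. The checks follow RFC 791 and RFC 1071: version 4, a header length of at least 20 bytes, a valid header checksum, and a total length that fits in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement sum over the header. Over a header whose
 * checksum field is correct, the result is 0. */
static uint16_t sketch_ip_csum(const uint8_t *hdr, size_t hlen)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hlen; i += 2)
        sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 0 if the header is sane enough to hand to the IP layer, -1 to drop.
 * *has_options tells the caller whether options follow the fixed header;
 * if not, the caller would zero its per-packet control block. */
static int sketch_audit_ipv4(const uint8_t *pkt, size_t len, int *has_options)
{
    size_t ihl, tot_len;

    if (len < 20)
        return -1;
    ihl = (size_t)(pkt[0] & 0x0f) * 4;
    tot_len = (size_t)pkt[2] << 8 | pkt[3];
    if ((pkt[0] >> 4) != 4 || ihl < 20 || ihl > len)
        return -1;
    if (sketch_ip_csum(pkt, ihl) != 0)
        return -1;
    if (tot_len < ihl || tot_len > len)
        return -1;
    *has_options = ihl > 20;
    return 0;
}
```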
     

02 Sep, 2010

1 commit


24 Aug, 2010

1 commit


08 Jul, 2010

2 commits


03 Jul, 2010

1 commit


02 Jul, 2010

1 commit

  • Support more fine-grained control of bridge netfilter iptables invocation
    by adding separate brnf_call_*tables parameters for each device using the
    sysfs interface. Packets are passed to layer 3 netfilter when either the
    global parameter or the per-bridge parameter is enabled.

    Acked-by: Stephen Hemminger
    Acked-by: David S. Miller
    Signed-off-by: Patrick McHardy

    Patrick McHardy
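The rule in the last sentence of this commit is a plain logical OR. A minimal sketch, assuming hypothetical names: `sketch_global_call_iptables` stands in for the global brnf_call_iptables setting and `struct sketch_bridge` for the per-bridge sysfs attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the existing global brnf_call_iptables knob. */
static bool sketch_global_call_iptables;

/* Hypothetical per-bridge setting, as exposed through sysfs. */
struct sketch_bridge {
    bool call_iptables;
};

/* Layer 3 netfilter runs if either the global or the per-bridge knob is on. */
static bool sketch_should_call_iptables(const struct sketch_bridge *br)
{
    return sketch_global_call_iptables || br->call_iptables;
}
```

One consequence of the OR: a per-bridge setting can only turn the call on for its own bridge; it cannot switch it off while the global setting is enabled.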
     

16 Jun, 2010

2 commits


15 Jun, 2010

1 commit


11 Jun, 2010

1 commit


01 Jun, 2010

1 commit


13 May, 2010

1 commit

  • [ 4593.956206] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    [ 4593.956219] IP: [] br_nf_forward_finish+0x154/0x170 [bridge]
    [ 4593.956232] PGD 195ece067 PUD 1ba005067 PMD 0
    [ 4593.956241] Oops: 0000 [#1] SMP
    [ 4593.956248] last sysfs file:
    /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:08/ATK0110:00/hwmon/hwmon0/temp2_label
    [ 4593.956253] CPU 3
    ...
    [ 4593.956380] Pid: 29512, comm: kvm Not tainted 2.6.34-rc7-net #195 P6T DELUXE/System Product Name
    [ 4593.956384] RIP: 0010:[] [] br_nf_forward_finish+0x154/0x170 [bridge]
    [ 4593.956395] RSP: 0018:ffff880001e63b78 EFLAGS: 00010246
    [ 4593.956399] RAX: 0000000000000608 RBX: ffff880057181700 RCX: ffff8801b813d000
    [ 4593.956402] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880057181700
    [ 4593.956406] RBP: ffff880001e63ba8 R08: ffff8801b9d97000 R09: ffffffffa0335650
    [ 4593.956410] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b813d000
    [ 4593.956413] R13: ffffffff81ab3940 R14: ffff880057181700 R15: 0000000000000002
    [ 4593.956418] FS: 00007fc40d380710(0000) GS:ffff880001e60000(0000) knlGS:0000000000000000
    [ 4593.956422] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
    [ 4593.956426] CR2: 0000000000000018 CR3: 00000001ba1d7000 CR4: 00000000000026e0
    [ 4593.956429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 4593.956433] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [ 4593.956437] Process kvm (pid: 29512, threadinfo ffff8801ba566000, task ffff8801b8003870)
    [ 4593.956441] Stack:
    [ 4593.956443] 0000000100000020 ffff880001e63ba0 ffff880001e63ba0 ffff880057181700
    [ 4593.956451] ffffffffa0335650 ffffffff81ab3940 ffff880001e63bd8 ffffffffa03350e6
    [ 4593.956462] ffff880001e63c40 000000000000024d ffff880057181700 0000000080000000
    [ 4593.956474] Call Trace:
    [ 4593.956478]
    [ 4593.956488] [] ? br_nf_forward_finish+0x0/0x170 [bridge]
    [ 4593.956496] [] NF_HOOK_THRESH+0x56/0x60 [bridge]
    [ 4593.956504] [] br_nf_forward_arp+0x112/0x120 [bridge]
    [ 4593.956511] [] nf_iterate+0x64/0xa0
    [ 4593.956519] [] ? br_forward_finish+0x0/0x60 [bridge]
    [ 4593.956524] [] nf_hook_slow+0x6c/0x100
    [ 4593.956531] [] ? br_forward_finish+0x0/0x60 [bridge]
    [ 4593.956538] [] ? __br_forward+0x0/0xc0 [bridge]
    [ 4593.956545] [] __br_forward+0x6d/0xc0 [bridge]
    [ 4593.956550] [] ? skb_clone+0x3e/0x70
    [ 4593.956557] [] deliver_clone+0x32/0x60 [bridge]
    [ 4593.956564] [] br_flood+0xa6/0xe0 [bridge]
    [ 4593.956571] [] ? __br_forward+0x0/0xc0 [bridge]

    Don't call nf_bridge_update_protocol() for ARP traffic as skb->nf_bridge isn't
    used in the ARP case.

    Reported-by: Stephen Hemminger
    Signed-off-by: Bart De Schuymer
    Signed-off-by: Patrick McHardy

    Bart De Schuymer
     

20 Apr, 2010

2 commits

  • The MTU for IP traffic encapsulated inside PPPoE traffic is smaller
    than the MTU of the Ethernet device (1500). Connection tracking
    gathers all IP packets and sometimes will refragment them in
    ip_fragment(). We then need to subtract the length of the
    encapsulating header from the MTU used in ip_fragment(). The check in
    br_nf_dev_queue_xmit() which determines if ip_fragment() has to be
    called is also updated for the PPPoE-encapsulated packets.
    nf_bridge_copy_header() is also updated to make sure the PPPoE data
    length field has the correct value.

    Signed-off-by: Bart De Schuymer
    Signed-off-by: Patrick McHardy

    Bart De Schuymer
     
  • Conflicts:
    Documentation/feature-removal-schedule.txt
    net/ipv6/netfilter/ip6t_REJECT.c
    net/netfilter/xt_limit.c

    Signed-off-by: Patrick McHardy

    Patrick McHardy
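For the PPPoE MTU adjustment in the first 20 Apr, 2010 commit, the arithmetic is small enough to show directly. This sketch assumes RFC 2516 session framing: a 6-byte PPPoE header followed by the 2-byte PPP protocol field, so IP packets on a 1500-byte Ethernet link must fit in 1492 bytes, and the PPPoE length field counts the PPP protocol field plus the IP packet. The `sketch_*` names are hypothetical, not the helpers the patch adds.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PPPOE_HDR_LEN  6 /* ver/type, code, session id, length */
#define SKETCH_PPP_PROTO_LEN  2 /* PPP protocol field, 0x0021 for IPv4 */
#define SKETCH_PPPOE_SES_HLEN (SKETCH_PPPOE_HDR_LEN + SKETCH_PPP_PROTO_LEN)

/* MTU left for the IP packet once the encapsulation is accounted for. */
static unsigned int sketch_ip_mtu(unsigned int dev_mtu, bool pppoe)
{
    return pppoe ? dev_mtu - SKETCH_PPPOE_SES_HLEN : dev_mtu;
}

/* Transmit-side check: refragment only if the IP packet does not fit. */
static bool sketch_needs_fragment(unsigned int ip_len, unsigned int dev_mtu, bool pppoe)
{
    return ip_len > sketch_ip_mtu(dev_mtu, pppoe);
}

/* PPPoE length field: the PPP protocol field plus the IP packet. */
static unsigned int sketch_pppoe_length_field(unsigned int ip_len)
{
    return ip_len + SKETCH_PPP_PROTO_LEN;
}
```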
     

15 Apr, 2010

2 commits

  • - fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
    neigh_hh_output() or dst->neighbour->output() overwrite the complete
    Ethernet header, although we only need the destination MAC address.
    For encapsulated packets, they ended up overwriting the encapsulating
    header. The new code copies the Ethernet source MAC address and
    protocol number before calling dst->neighbour->output(). The Ethernet
    source MAC and protocol number are copied back in place in
    br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
    more transparent because in the old scheme the source MAC of the
    bridge was copied into the source address in the Ethernet header. We
    also let skb->protocol equal ETH_P_IP or ETH_P_IPV6 during the
    execution of the PF_INET or PF_INET6 hooks, respectively.

    - Speed up IP DNAT by calling neigh_hh_bridge() instead of
    neigh_hh_output(): if dst->hh is available, we already know the MAC
    address so we can just copy it.

    Signed-off-by: Bart De Schuymer
    Signed-off-by: Patrick McHardy

    Bart De Schuymer
     
  • Remove br_netfilter.c::br_nf_local_out(). The function
    br_nf_local_out() was needed because the PF_BRIDGE::LOCAL_OUT hook
    could be called when IP DNAT happens on to-be-bridged traffic. The
    new scheme eliminates this mess.

    Signed-off-by: Bart De Schuymer
    Signed-off-by: Patrick McHardy

    Bart De Schuymer
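The save/restore step from the IP DNAT fix in the first 15 Apr, 2010 commit can be sketched on a bare Ethernet header. Everything here is hypothetical: `sketch_neigh_output()` only imitates an output routine that rewrites the whole header, and `struct sketch_saved` stands in for wherever the bytes are kept between the two steps. The source MAC and EtherType (here an 802.1Q TPID) survive, while the destination MAC chosen by the neighbour layer is kept.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Minimal Ethernet header layout: destination MAC, source MAC, EtherType. */
struct sketch_ethhdr {
    uint8_t dest[SKETCH_ETH_ALEN];
    uint8_t source[SKETCH_ETH_ALEN];
    uint8_t proto[2];
};

/* Hypothetical scratch space for the bytes that must survive. */
struct sketch_saved {
    uint8_t bytes[SKETCH_ETH_ALEN + 2];
};

/* Before output: remember the source MAC and the EtherType. */
static void sketch_save(const struct sketch_ethhdr *eth, struct sketch_saved *s)
{
    memcpy(s->bytes, eth->source, SKETCH_ETH_ALEN);
    memcpy(s->bytes + SKETCH_ETH_ALEN, eth->proto, 2);
}

/* Stand-in for an output routine that rewrites the complete header. */
static void sketch_neigh_output(struct sketch_ethhdr *eth,
                                const uint8_t dst[SKETCH_ETH_ALEN],
                                const uint8_t br_mac[SKETCH_ETH_ALEN])
{
    memcpy(eth->dest, dst, SKETCH_ETH_ALEN);
    memcpy(eth->source, br_mac, SKETCH_ETH_ALEN);
    eth->proto[0] = 0x08; /* ETH_P_IP, clobbering any VLAN/PPPoE type */
    eth->proto[1] = 0x00;
}

/* After output: put the original source MAC and EtherType back. */
static void sketch_restore(struct sketch_ethhdr *eth, const struct sketch_saved *s)
{
    memcpy(eth->source, s->bytes, SKETCH_ETH_ALEN);
    memcpy(eth->proto, s->bytes + SKETCH_ETH_ALEN, 2);
}
```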
     

13 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Mar, 2010

1 commit

  • The first argument to NF_HOOK* has been an nfproto for quite some time.
    Commit v2.6.27-2457-gfdc9314 was the first to practically start using
    the new names. Do that now for the remaining NF_HOOK calls.

    The semantic patch used was:
    //
    @@
    @@
    (NF_HOOK
    |NF_HOOK_THRESH
    )(
    -PF_BRIDGE,
    +NFPROTO_BRIDGE,
    ...)

    @@
    @@
    NF_HOOK(
    -PF_INET6,
    +NFPROTO_IPV6,
    ...)

    @@
    @@
    NF_HOOK(
    -PF_INET,
    +NFPROTO_IPV4,
    ...)
    //

    Signed-off-by: Jan Engelhardt

    Jan Engelhardt
     

12 Nov, 2009

1 commit

  • Now that sys_sysctl is a compatibility wrapper around /proc/sys,
    all sysctl strategy routines, and all ctl_name and strategy
    entries in the sysctl tables, are unused and can be
    removed.

    In addition, neigh_sysctl_register has been modified to no longer
    take a strategy argument, and its callers have been modified not
    to pass one.

    Cc: "David Miller"
    Cc: Hideaki YOSHIFUJI
    Cc: netdev@vger.kernel.org
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

25 Aug, 2009

1 commit

  • commit f216f082b2b37c4943f1e7c393e2786648d48f6f
    ([NETFILTER]: bridge netfilter: deal with martians correctly)
    added a refcount leak on in_dev.

    Instead of using in_dev_get(), we can use __in_dev_get_rcu(),
    as netfilter hooks are running under rcu_read_lock(), as pointed
    out by Patrick.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Patrick McHardy

    Eric Dumazet
     

06 Jul, 2009

1 commit


03 Jun, 2009

2 commits

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)

    This one should replace occurrences of:

    dst_release(skb->dst);
    skb->dst = NULL;

    Delete skb->dst field

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

    Delete skb->rtable field

    Setting rtable is not allowed; just set dst instead, as rtable is an alias.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
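The two 03 Jun, 2009 commits replace direct field access with accessors. A minimal C sketch over simplified stand-in types: the `sketch_*` structs keep only the fields needed here and the reference count is a plain int, so this mirrors the shape of skb_dst(), skb_dst_set(), skb_dst_drop() and skb_rtable() rather than their kernel implementations. The rtable is assumed to embed its dst_entry as the first member, which is what lets skb_rtable() be a cast and makes setting the dst sufficient.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for dst_entry, rtable and sk_buff. */
struct sketch_dst_entry {
    int refcnt;
};

struct sketch_rtable {
    struct sketch_dst_entry dst; /* first member: rtable and dst pointers coincide */
    int rt_flags;
};

struct sketch_sk_buff {
    struct sketch_dst_entry *_skb_dst; /* private: only the accessors touch it */
};

static void sketch_dst_release(struct sketch_dst_entry *dst)
{
    if (dst)
        dst->refcnt--;
}

static struct sketch_dst_entry *sketch_skb_dst(const struct sketch_sk_buff *skb)
{
    return skb->_skb_dst;
}

/* Takes over the caller's reference; does not add one. */
static void sketch_skb_dst_set(struct sketch_sk_buff *skb, struct sketch_dst_entry *dst)
{
    skb->_skb_dst = dst;
}

/* Replaces the old two-step "dst_release(skb->dst); skb->dst = NULL;". */
static void sketch_skb_dst_drop(struct sketch_sk_buff *skb)
{
    sketch_dst_release(skb->_skb_dst);
    skb->_skb_dst = NULL;
}

/* Read-only alias: there is deliberately no setter for rtable. */
static struct sketch_rtable *sketch_skb_rtable(const struct sketch_sk_buff *skb)
{
    return (struct sketch_rtable *)sketch_skb_dst(skb);
}
```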
     

20 Apr, 2009

1 commit

  • br_nf_dev_queue_xmit only checks ETH_P_IP packets for fragmenting, not
    VLAN packets. This results in large VLAN packets being dropped. This can be
    observed when connection tracking is enabled: connection tracking reassembles
    fragmented packets, and these have to be re-fragmented when transmitted. Also,
    make sure that only packets that were defragmented are refragmented, as
    suggested by Patrick McHardy.

    Signed-off-by: Saikiran Madugula
    Signed-off-by: Patrick McHardy

    hummerbliss@gmail.com
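A minimal sketch of the transmit-side decision this commit describes, with hypothetical names throughout: `struct sketch_pkt` carries the outer and (for 802.1Q) inner EtherType, plus an invented `was_defragmented` flag standing in for however the bridge knows that connection tracking reassembled the packet. Only IPv4 packets, tagged or untagged, that were reassembled and now exceed the MTU are refragmented.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_ETH_P_IP    0x0800
#define SKETCH_ETH_P_8021Q 0x8100

/* Hypothetical packet summary; not the kernel's sk_buff. */
struct sketch_pkt {
    unsigned int len;           /* IP packet length, excluding any VLAN tag */
    unsigned short proto;       /* outer EtherType */
    unsigned short inner_proto; /* EtherType inside an 802.1Q tag, if any */
    bool was_defragmented;      /* invented flag: conntrack reassembled it */
};

static bool sketch_is_ipv4(const struct sketch_pkt *p)
{
    return p->proto == SKETCH_ETH_P_IP ||
           (p->proto == SKETCH_ETH_P_8021Q && p->inner_proto == SKETCH_ETH_P_IP);
}

/* Refragment IPv4, tagged or not, only if it was reassembled and is too big. */
static bool sketch_should_refragment(const struct sketch_pkt *p, unsigned int mtu)
{
    return sketch_is_ipv4(p) && p->was_defragmented && p->len > mtu;
}
```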
     

01 Feb, 2009

1 commit


13 Jan, 2009

2 commits

  • The PPPOE/VLAN processing code in the bridge netfilter is broken
    by design. The VLAN tag and the PPPOE session ID are an integral
    part of the packet flow information, yet they're completely
    ignored by the bridge netfilter. This is potentially a security
    hole as it treats all VLANs and PPPOE sessions as the same.

    What's more, it's actually broken for PPPOE as the bridge netfilter
    tries to trim the packets to the IP length without adjusting the
    PPPOE header (and adjusting the PPPOE header isn't much better
    since the PPPOE peer may require the padding to be present).

    Therefore we should disable this by default.

    It does mean that people relying on this feature may lose networking
    depending on how their bridge netfilter rules are configured.
    However, IMHO the problems this code causes are serious enough to
    warrant this.

    Signed-off-by: Herbert Xu
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Currently the bridge FORWARD/POST_ROUTING chains treat all
    non-IPv4 packets as IPv6. This patch fixes that by returning
    NF_ACCEPT on non-IP packets instead, just as is done in PRE_ROUTING.

    Signed-off-by: Herbert Xu
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Herbert Xu
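A minimal sketch of the dispatch described in this last commit, assuming hypothetical `sketch_*` names: IPv4 and IPv6 frames are handed to the matching layer 3 hooks (modelled by returning a STOLEN verdict and recording the family), and anything else is accepted untouched instead of falling into the IPv6 path. NF_ACCEPT and NF_STOLEN are real netfilter verdict names; the enum values here belong to the sketch only.

```c
#include <assert.h>

#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_ARP  0x0806
#define SKETCH_ETH_P_IPV6 0x86DD

enum sketch_verdict { SKETCH_NF_ACCEPT, SKETCH_NF_STOLEN };
enum sketch_family { SKETCH_FAMILY_NONE, SKETCH_FAMILY_IPV4, SKETCH_FAMILY_IPV6 };

static enum sketch_family sketch_classify(unsigned short ethertype)
{
    switch (ethertype) {
    case SKETCH_ETH_P_IP:
        return SKETCH_FAMILY_IPV4;
    case SKETCH_ETH_P_IPV6:
        return SKETCH_FAMILY_IPV6;
    default:
        return SKETCH_FAMILY_NONE; /* no longer assumed to be IPv6 */
    }
}

/* Hypothetical FORWARD hook: IP goes to the matching L3 hooks, the rest passes. */
static enum sketch_verdict sketch_br_forward(unsigned short ethertype,
                                             enum sketch_family *handed_to)
{
    *handed_to = sketch_classify(ethertype);
    if (*handed_to == SKETCH_FAMILY_NONE)
        return SKETCH_NF_ACCEPT;
    /* The real hook would run the IPv4 or IPv6 chains here. */
    return SKETCH_NF_STOLEN;
}
```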