17 Apr, 2014

3 commits

  • Because the netdevice may be in another netns than the i/o netns, we should
    use the i/o netns instead of dev_net(dev).

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • Because the netdevice may be in another netns than the i/o netns, we should
    use the i/o netns instead of dev_net(dev).

    Note that netdev_priv(dev) cannot be NULL, hence we can remove these useless
    checks.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • As suggested by Julian:

    Simply, flowi4_iif must not contain 0, it does not
    look logical to ignore all ip rules with specified iif.

    because in fib_rule_match() we do:

    if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
    goto out;

    flowi4_iif should be LOOPBACK_IFINDEX by default.

    We need to move LOOPBACK_IFINDEX to include/net/flow.h:

    1) It is mostly used by flowi_iif

    2) It fixes the following compile error, which appears once the
    later patches use it in flow.h:

    In file included from include/linux/netfilter.h:277:0,
    from include/net/netns/netfilter.h:5,
    from include/net/net_namespace.h:21,
    from include/linux/netdevice.h:43,
    from include/linux/icmpv6.h:12,
    from include/linux/ipv6.h:61,
    from include/net/ipv6.h:16,
    from include/linux/sunrpc/clnt.h:27,
    from include/linux/nfs_fs.h:30,
    from init/do_mounts.c:32:
    include/net/flow.h: In function ‘flowi4_init_output’:
    include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

    Cc: Eric Biederman
    Cc: Julian Anastasov
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
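
The iif test quoted in the commit above can be tried out in userspace. The following is a minimal sketch, not kernel code: the two structs and the helper names are hypothetical stand-ins for struct fib_rule and struct flowi, and only the condition itself is taken from the commit message. LOOPBACK_IFINDEX is defined as 1, the value the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Interface index of the loopback device (1 in the kernel). */
#define LOOPBACK_IFINDEX 1

/* Hypothetical, stripped-down stand-ins for struct fib_rule / struct flowi. */
struct rule_sketch { int iifindex; };
struct flow_sketch { int flowi_iif; };

/*
 * Mirrors the check quoted from fib_rule_match(): a rule that names an
 * input interface only matches a flow carrying the same interface index.
 */
static bool iif_matches(const struct rule_sketch *rule,
                        const struct flow_sketch *fl)
{
    if (rule->iifindex && rule->iifindex != fl->flowi_iif)
        return false;
    return true;
}

/* Convenience wrapper (hypothetical) that takes plain integers. */
static bool iif_matches_idx(int rule_iif, int flow_iif)
{
    struct rule_sketch r = { rule_iif };
    struct flow_sketch f = { flow_iif };

    return iif_matches(&r, &f);
}
```

With a flow iif of 0, a rule bound to the loopback interface can never match, which is why the commit makes LOOPBACK_IFINDEX the default; a rule with no iif (0) matches either way.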
     

16 Apr, 2014

3 commits

  • It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but
    this is unsafe: the module always assumes that this device exists. For example,
    ip6gre_tunnel_lookup() may use it unconditionally.

    Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
    let ip6gre_destroy_tunnels() do the job).

    Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6").

    CC: Dmitry Kozlov
    Signed-off-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Nicolas Dichtel
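
The idea of a dellink handler that never removes the fallback device can be sketched as follows. This is an illustration under assumptions: the structs and the function name are hypothetical miniatures, and the real handler works on struct net_device and the per-netns ip6gre state rather than on these toy types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a tunnel device. */
struct dev_sketch {
    const char *name;
    bool registered;
};

/* Hypothetical miniature of the per-netns ip6gre state. */
struct ip6gre_net_sketch {
    struct dev_sketch *fb_tunnel_dev;   /* the fallback (FB) tunnel */
};

/*
 * dellink-style handler: unregister any tunnel except the fallback one.
 * Returns true if the device was removed, false if the request was ignored.
 */
static bool dellink_sketch(struct ip6gre_net_sketch *ign,
                           struct dev_sketch *dev)
{
    if (dev == ign->fb_tunnel_dev)
        return false;           /* leave the fallback device in place */

    dev->registered = false;
    return true;
}
```

In this sketch the fallback device survives a delete request, so code that dereferences it unconditionally keeps working; its removal is left to module teardown, as the commit describes.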
     
  • In the dst->output() path for ipv4, the code assumes the skb it has to
    transmit is attached to an inet socket, specifically via
    ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
    provider of the packet is an AF_PACKET socket.

    The dst->output() method gets an additional 'struct sock *sk'
    parameter. This needs a cascade of changes so that this parameter can
    be propagated from vxlan to final consumer.

    Fixes: 8f646c922d55 ("vxlan: keep original skb ownership")
    Reported-by: lucien xin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • ip_queue_xmit() assumes the skb it has to transmit is attached to an
    inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership")
    changed l2tp to not change skb ownership and thus broke this assumption.

    One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
    so that we do not assume skb->sk points to the socket used by l2tp
    tunnel.

    Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
    Reported-by: Zhan Jianyu
    Tested-by: Zhan Jianyu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

15 Apr, 2014

1 commit

  • Francois reported that setting big mtu on loopback device could prevent
    tcp sessions making progress.

    We do not (yet?) support IPv6 Jumbograms, and we cook corrupted packets.

    We must limit the IPv6 MTU to (65535 + 40) bytes in theory.

    Tested:

    ifconfig lo mtu 70000
    netperf -H ::1

    Before patch : Throughput : 0.05 Mbits

    After patch : Throughput : 35484 Mbits

    Reported-by: Francois WELLENREITER
    Signed-off-by: Eric Dumazet
    Acked-by: YOSHIFUJI Hideaki
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Eric Dumazet
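
The limit named in the commit above follows from the 16-bit IPv6 payload length field (at most 65535 bytes) plus the 40-byte fixed IPv6 header. A minimal sketch of such a clamp is shown below; the macro and function names are illustrative and are not claimed to be the ones used in the patch.

```c
#include <assert.h>

/* Size of the fixed IPv6 header in bytes. */
#define IPV6_HDR_LEN_SKETCH 40u

/* Largest non-jumbogram IPv6 packet: 16-bit payload length plus header. */
#define IP6_MAX_MTU_SKETCH (65535u + IPV6_HDR_LEN_SKETCH)

/* Clamp a device MTU to what IPv6 can use without Jumbograms. */
static unsigned int clamp_ipv6_mtu(unsigned int dev_mtu)
{
    return dev_mtu > IP6_MAX_MTU_SKETCH ? IP6_MAX_MTU_SKETCH : dev_mtu;
}
```

For the test case in the commit message, a loopback MTU of 70000 would be clamped to 65575 for IPv6 purposes, while ordinary MTUs such as 1500 pass through unchanged.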
     

12 Apr, 2014

1 commit

  • net-next commit 9c76a11, ipv6: tcp_ipv6 policy route issue, had
    a boolean logic error that caused incorrect behaviour for TCP
    SYN+ACK when oif-based rules are in use. Specifically:

    1. If a SYN comes in from a global address, and sk_bound_dev_if
    is not set, the routing lookup has oif set to the interface
    the SYN came in on. Instead, it should have oif unset,
    because for global addresses, the incoming interface doesn't
    necessarily have any bearing on the interface the SYN+ACK is
    sent out on.
    2. If a SYN comes in from a link-local address, and
    sk_bound_dev_if is set, the routing lookup has oif set to the
    interface the SYN came in on. Instead, it should have oif set
    to sk_bound_dev_if, because that's what the application
    requested.

    Signed-off-by: Lorenzo Colitti
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Lorenzo Colitti
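
The two cases listed above are consistent with an "or" that should have been an "and". The sketch below is reconstructed from the commit description only and is not copied from the patch; the function names, and the need_strict parameter (true when the peer address is link-local, so the interface matters), are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed shape of the faulty logic: the incoming interface wins whenever
 * the address is link-local OR the socket is not bound to a device.
 */
static int synack_oif_buggy(bool need_strict, int bound_dev_if, int incoming_if)
{
    if (need_strict || !bound_dev_if)
        return incoming_if;
    return bound_dev_if;
}

/*
 * Assumed shape of the corrected logic: the incoming interface is used only
 * for link-local peers on an unbound socket; otherwise sk_bound_dev_if
 * (possibly 0, meaning "unset") is used.
 */
static int synack_oif_fixed(bool need_strict, int bound_dev_if, int incoming_if)
{
    if (need_strict && !bound_dev_if)
        return incoming_if;
    return bound_dev_if;
}
```

Case 1 (global address, unbound socket) yields the incoming interface in the first variant and 0 in the second; case 2 (link-local address, bound socket) yields the incoming interface in the first variant and sk_bound_dev_if in the second.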
     

05 Apr, 2014

1 commit

  • All xtables variants suffer from the defect that the copy_to_user()
    to copy the counters to user memory may fail after the table has
    already been exchanged and thus exposed. Returning an error at this
    point will result in freeing the already exposed table. Any
    subsequent packet processing will result in a kernel panic.

    We can't copy the counters before exposing the new tables as we
    want to provide the counter state after the old table has been
    unhooked. Therefore convert this into a silent error.

    Cc: Florian Westphal
    Signed-off-by: Thomas Graf
    Signed-off-by: Pablo Neira Ayuso

    Thomas Graf
     

01 Apr, 2014

5 commits

  • After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify
    processing to workqueue") some counters are now updated in process context
    and thus need to disable bh before doing so, otherwise deadlocks can
    happen on 32-bit archs. Fabio Estevam noticed this while mounting
    an NFS volume on an ARM board.

    To make up for missing this, I looked through the other *_STATS_BH
    users and found three other calls which need updating:

    1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling)
    2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ...
    (only in case of icmp protocol with raw sockets in error handling)
    3) ping6_v6_sendmsg (error handling)

    Fixes: c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue")
    Reported-by: Fabio Estevam
    Tested-by: Fabio Estevam
    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • First off, we don't need to check for non-NULL rt any more, as we are
    guaranteed to always get a valid rt6_info. Drop the check.

    In case we couldn't allocate an inet_peer for fragmentation information
    we currently generate strictly incrementing fragmentation ids for all
    destinations. This is done to maximize the cycle and avoid collisions.

    Those fragmentation ids are very predictable. At least we should try to
    mix in the destination address.

    While it should make no difference to simply use a PRNG at this point,
    secure_ipv6_id ensures that we don't leak information from prandom,
    which could make its internal state recoverable.

    This fallback function should normally not be used, so this should
    not affect performance at all. It is just meant as a safety net.

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • The issue arises when a policy route that specifies a particular
    NIC as oif is added: the policy route does not take effect. The reason
    is that fl6.oif is not set, so the route lookup fails to match. In the
    tcp_v6_send_response function, fl6.oif is set if the bound address is
    link-local, but not if it is a global address.

    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Wang Yufen
    Signed-off-by: David S. Miller

    Wang Yufen
     
  • Move the whole rt6_need_strict as static inline into ip6_route.h,
    so that it can be reused

    Signed-off-by: Wang Yufen
    Signed-off-by: David S. Miller

    Wang Yufen
     
  • Signed-off-by: Wang Yufen
    Signed-off-by: David S. Miller

    Wang Yufen
     

30 Mar, 2014

4 commits


29 Mar, 2014

1 commit

  • addrconf_join_solict and addrconf_join_anycast may cause actions which
    need rtnl locked, especially on first address creation.

    A new DAD state is introduced which defers the initial DAD
    processing to a workqueue.

    To get rtnl lock we need to push the code paths which depend on those
    calls up to workqueues, specifically addrconf_verify and the DAD
    processing.

    (v2)
    addrconf_dad_failure needs to be queued up to the workqueue, too. This
    patch introduces a new DAD state and stops the DAD processing in the
    workqueue (this is because of the possible ipv6_del_addr processing
    which removes the solicited multicast address from the device).

    addrconf_verify_lock is removed, too. After the transition it is not
    needed any more.

    As we are not processing in bottom half anymore we need to be a bit more
    careful about disabling bottom halves when we take spinlocks which are also
    used in bh.

    Relevant backtrace:
    [ 541.030090] RTNL: assertion failed at net/core/dev.c (4496)
    [ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1
    [ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
    [ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
    [ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
    [ 541.031152] Call Trace:
    [ 541.031153] [] ? dump_stack+0xd/0x17
    [ 541.031180] [] ? __dev_set_promiscuity+0x101/0x180
    [ 541.031183] [] ? __hw_addr_create_ex+0x60/0xc0
    [ 541.031185] [] ? __dev_set_rx_mode+0xaa/0xc0
    [ 541.031189] [] ? __dev_mc_add+0x61/0x90
    [ 541.031198] [] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
    [ 541.031208] [] ? kmem_cache_alloc+0xcb/0xd0
    [ 541.031212] [] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
    [ 541.031216] [] ? addrconf_join_solict+0x2e/0x40 [ipv6]
    [ 541.031219] [] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
    [ 541.031223] [] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
    [ 541.031226] [] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
    [ 541.031229] [] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
    [ 541.031233] [] ? addrconf_dad_completed+0x28/0x100 [ipv6]
    [ 541.031241] [] ? task_cputime+0x2d/0x50
    [ 541.031244] [] ? addrconf_dad_timer+0x136/0x150 [ipv6]
    [ 541.031247] [] ? addrconf_dad_completed+0x100/0x100 [ipv6]
    [ 541.031255] [] ? call_timer_fn.isra.22+0x2a/0x90
    [ 541.031258] [] ? addrconf_dad_completed+0x100/0x100 [ipv6]

    Hunks and backtrace stolen from a patch by Stephen Hemminger.

    Reported-by: Stephen Hemminger
    Signed-off-by: Stephen Hemminger
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

28 Mar, 2014

1 commit

  • If an IPv6 host route with metrics exists, an attempt to add a
    new route for the same target with different metrics fails but
    rewrites the metrics anyway:

    12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
    12sp0:~ # ip -6 route show
    fe80::/64 dev eth0 proto kernel metric 256
    fec0::1 dev eth0 metric 1024 rto_min lock 1s
    12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
    RTNETLINK answers: File exists
    12sp0:~ # ip -6 route show
    fe80::/64 dev eth0 proto kernel metric 256
    fec0::1 dev eth0 metric 1024 rto_min lock 1.5s

    This is caused by all IPv6 host routes using the metrics in
    their inetpeer (or the shared default). This also holds for the
    new route created in ip6_route_add() which shares the metrics
    with the already existing route and thus ip6_route_add()
    rewrites the metrics even if the new route ends up not being
    used at all.

    Another problem is that old metrics in inetpeer can reappear
    unexpectedly for a new route, e.g.

    12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
    12sp0:~ # ip route del fec0::1
    12sp0:~ # ip route add fec0::1 dev eth0
    12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
    12sp0:~ # ip -6 route show
    fe80::/64 dev eth0 proto kernel metric 256
    fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s

    Resolve the first problem by moving the setting of metrics down
    into fib6_add_rt2node() to the point we are sure we are
    inserting the new route into the tree. Second problem is
    addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
    which is set for a new host route in ip6_route_add() and makes
    ipv6_cow_metrics() always overwrite the metrics in inetpeer
    (even if they are not "new"); it is reset after that.

    v5: use a flag in _metrics member rather than one in flags

    v4: fix a typo making a condition always true (thanks to Hannes
    Frederic Sowa)

    v3: rewritten based on David Miller's idea to move setting the
    metrics (and allocation in non-host case) down to the point we
    already know the route is to be inserted. Also rebased to
    net-next as it is quite late in the cycle.

    Signed-off-by: Michal Kubecek
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Michal Kubeček
     

26 Mar, 2014

1 commit


21 Mar, 2014

1 commit

  • Commit 812e44dd1829 ("ip6mr: advertise new mfc entries via rtnl") reuses the
    function ip6mr_fill_mroute() to notify mfc events.
    But this function was used only for dump and thus was always setting the
    flag NLM_F_MULTI, which is wrong in case of a single notification.

    Libraries like libnl will wait forever for NLMSG_DONE.

    CC: Thomas Graf
    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Nicolas Dichtel
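
The distinction between a dump and a single notification can be sketched as below. This is an assumption-laden illustration: the enum and function names are hypothetical, and SKETCH_NLM_F_MULTI is a local copy of the NLM_F_MULTI flag value from <linux/netlink.h>, defined here only so that the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the netlink "multipart message" flag value. */
#define SKETCH_NLM_F_MULTI 0x02u

/* Hypothetical: why a message is being filled in. */
enum fill_reason { FILL_DUMP, FILL_NOTIFY };

/*
 * Pick the message flags: only dump replies are multipart. A single
 * notification must not carry the multipart flag.
 */
static unsigned int mroute_msg_flags(enum fill_reason why)
{
    return why == FILL_DUMP ? SKETCH_NLM_F_MULTI : 0u;
}

/*
 * A receiver keeps reading until NLMSG_DONE only when the multipart
 * flag is set on the message it just got.
 */
static bool receiver_waits_for_done(unsigned int flags)
{
    return (flags & SKETCH_NLM_F_MULTI) != 0;
}
```

A notification built with the dump flags would leave such a receiver waiting for an NLMSG_DONE that never arrives, which matches the libnl behaviour described in the commit.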
     

19 Mar, 2014

2 commits

  • In ip6_append_data_mtu(), when the xfrm mode is not tunnel (such as
    transport), the IPsec header needs to be added in the first fragment, so the
    mtu is decreased to reserve space for it. When the second fragment comes, the
    mtu should be restored, as commit 0c1833797a5a6ec23ea9261d979aa18078720b74
    said. However, commit 75a493e60ac4bb uses
    *mtu = min(*mtu, ...) to change the mtu, so the new mtu is always
    equal to the first fragment's and cannot be restored.

    When I test with ping6 -c1 -s5000 $ip (mtu=1280), I get:
    ...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232
    ...frag (1232|1216)
    ...frag (2448|1216)
    ...frag (3664|1216)
    ...frag (4880|164)

    which should be:
    ...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232
    ...frag (1232|1232)
    ...frag (2464|1232)
    ...frag (3696|1232)
    ...frag (4928|116)

    So delete the min() when changing the mtu back.

    Signed-off-by: Xin Long
    Fixes: 75a493e60ac4bb ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size")
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    lucien
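
Why min() prevents the mtu from being restored can be shown with a small sketch. The function names are hypothetical, and the 16-byte header length used in the usage note is only an illustrative figure suggested by the 1232 versus 1216 fragment sizes in the trace; the real function takes more inputs than shown here.

```c
#include <assert.h>

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* First fragment: reserve room for the IPsec header. */
static unsigned int first_frag_mtu(unsigned int full_mtu,
                                   unsigned int ipsec_hdr_len)
{
    return full_mtu - ipsec_hdr_len;
}

/* Faulty variant: min() can only keep or shrink the current value. */
static unsigned int later_frag_mtu_buggy(unsigned int cur_mtu,
                                         unsigned int full_mtu)
{
    return min_uint(cur_mtu, full_mtu);
}

/* Corrected variant: later fragments get the full mtu back. */
static unsigned int later_frag_mtu_fixed(unsigned int cur_mtu,
                                         unsigned int full_mtu)
{
    (void)cur_mtu;      /* the reduced value is deliberately ignored */
    return full_mtu;
}
```

Starting from a full mtu of 1280 and an assumed 16-byte reservation, the first fragment works with 1264; the min() variant then stays at 1264 for every later fragment, whereas the corrected variant returns to 1280.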
     
  • Steffen Klassert says:

    ====================
    One patch to rename a newly introduced struct. The rest is
    the rework of the IPsec virtual tunnel interface for ipv6 to
    support inter address family tunneling and namespace crossing.

    1) Rename the newly introduced struct xfrm_filter to avoid a
    conflict with iproute2. From Nicolas Dichtel.

    2) Introduce xfrm_input_afinfo to access the address family
    dependent tunnel callback functions properly.

    3) Add and use an IPsec protocol multiplexer for ipv6.

    4) Remove dst_entry caching. vti can lookup multiple different
    dst entries, depending on the configured xfrm states. Therefore
    it does not make sense to cache a dst_entry.

    5) Remove caching of flow information. vti6 does not use the
    tunnel endpoint addresses to do route and xfrm lookups.

    6) Update the vti6 to use its own receive hook.

    7) Remove the now unused xfrm_tunnel_notifier. This was used from vti
    and is replaced by the IPsec protocol multiplexer hooks.

    8) Support inter address family tunneling for vti6.

    9) Check if the tunnel endpoints of the xfrm state and the vti interface
    are matching and return an error otherwise.

    10) Enable namespace crossing for vti devices.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Mar, 2014

2 commits


14 Mar, 2014

12 commits

  • vti6 is now fully namespace aware, so allow namespace changing
    for vti devices.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • The tunnel endpoints of the xfrm_state we got from the xfrm_lookup
    must match the tunnel endpoints of the vti interface. This patch
    ensures this matching.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • With this patch we can tunnel ipv4 traffic via a vti6
    interface. A vti6 interface can now have an ipv4 address
    and ipv4 traffic can be routed via a vti6 interface.
    The resulting traffic is xfrm transformed and tunneled
    through ipv6 if matching IPsec policies and states are
    present.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • This was used from vti and is replaced by the IPsec protocol
    multiplexer hooks. It is now unused, so remove it.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • With this patch, vti6 uses the IPsec protocol multiplexer to
    register its own receive side hooks for ESP, AH and IPCOMP.

    Vti6 now does the following on receive side:

    1. Do an input policy check for the IPsec packet we received.
    This is required because this packet could already have been
    processed by IPsec, so an inbound policy check is needed.

    2. Mark the packet with the i_key. The policy and the state
    must match this key now. Policy and state belong to the vti
    namespace and policy enforcement is done at the further layers.

    3. Call the generic xfrm layer to do decryption and decapsulation.

    4. Wait for a callback from the xfrm layer to properly clean the
    skb to not leak information on namespace transitions and
    update the device statistics.

    On transmit side:

    1. Mark the packet with the o_key. The policy and the state
    must match this key now.

    2. Do an xfrm_lookup on the original packet with the mark applied.

    3. Check if we got an IPsec route.

    4. Clean the skb to not leak information on namespace
    transitions.

    5. Attach the dst_entry we got from the xfrm_lookup to the skb.

    6. Call dst_output to do the IPsec processing.

    7. Do the device statistics.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • Unlike ip6_tunnel, vti6 does not use the tunnel
    endpoint addresses to do route and xfrm lookups.
    So there is no need to cache the flow information. It also
    does not make sense to calculate the mtu based on
    such flow information, so remove this too.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • Unlike ip6_tunnel, vti6 can lookup multiple different dst entries,
    depending on the configured xfrm states. Therefore it does not make
    sense to cache a dst_entry.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • Switch ipcomp6 to use the new IPsec protocol multiplexer.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • Switch ah6 to use the new IPsec protocol multiplexer.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • Switch esp6 to use the new IPsec protocol multiplexer.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • This patch adds an IPsec protocol multiplexer for ipv6. With
    this it is possible to add alternative protocol handlers, as
    needed for IPsec virtual tunnel interfaces.

    Signed-off-by: Steffen Klassert

    Steffen Klassert
     
  • tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore
    age needs to be added to the condition.

    Age calculation in ipv6_create_tempaddr is different from the one
    in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS.
    This can cause age in ipv6_create_tempaddr to be less than the one
    in addrconf_verify and therefore an unnecessary temporary address to
    be generated.
    Use the same age calculation as in addrconf_verify to avoid this.

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
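
The effect of the two age calculations can be illustrated with a small sketch. All names and constants below are assumptions for illustration: HZ_SKETCH and FUZZ_MINUS_SKETCH stand in for HZ and ADDRCONF_TIMER_FUZZ_MINUS with made-up values, and the lifetime check is one plausible reading of "age needs to be added to the condition", not a copy of the patched code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative tick rate and fuzz (not taken from a kernel config). */
#define HZ_SKETCH          1000UL
#define FUZZ_MINUS_SKETCH  (HZ_SKETCH / 50)

/* Age in seconds without the fuzz term. */
static unsigned long age_plain(unsigned long now, unsigned long tstamp)
{
    return (now - tstamp) / HZ_SKETCH;
}

/* Age in seconds with the fuzz term added before the division. */
static unsigned long age_with_fuzz(unsigned long now, unsigned long tstamp)
{
    return (now - tstamp + FUZZ_MINUS_SKETCH) / HZ_SKETCH;
}

/*
 * The preferred lifetime counts from tstamp, so the time already elapsed
 * (age) has to be accounted for when comparing against regen_advance.
 */
static bool preferred_lft_too_short(unsigned long tmp_prefered_lft,
                                    unsigned long regen_advance,
                                    unsigned long age)
{
    return tmp_prefered_lft <= regen_advance + age;
}
```

With 9990 ticks elapsed, the plain calculation gives an age of 9 seconds and the fuzzed one gives 10, so two code paths using different formulas can disagree by a second right at the boundary.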
     

11 Mar, 2014

1 commit


07 Mar, 2014

1 commit

  • DST_NOCOUNT should only be used if an authorized user adds routes
    locally. In case of routes which are added on behalf of router
    advertisements this flag must not get used as it allows an unlimited
    number of routes getting added remotely.

    Signed-off-by: Sabrina Dubroca
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Sabrina Dubroca