16 Jan, 2016

1 commit

  • skb_gso_segment() uses the skb control block during segmentation.
    This patch adds 32 bytes of room for the previous control block, which
    will be copied into all resulting segments.

    This patch fixes a kernel crash when fragmenting forwarded packets.
    Fragmentation requires a valid IP CB in the skb for clearing IP options.
    The patch also removes the custom save/restore in the OVS code, which is
    now redundant.

    Signed-off-by: Konstantin Khlebnikov
    Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
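A minimal userspace model of the control-block scheme in this commit, assuming a 48-byte control block (the size of `sk_buff::cb`) with a 32-byte prefix reserved for the caller. `struct pkt`, `GSO_CB_OFFSET` and `segment()` are hypothetical names, not kernel API. The segmentation code keeps its own scratch state after the reserved prefix, and every resulting segment inherits a copy of the prefix (for example, the IP CB that fragmentation later relies on).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the skb control block layout (not kernel code). */
#define CB_SIZE       48  /* sk_buff::cb is 48 bytes */
#define GSO_CB_OFFSET 32  /* prefix reserved for the caller's CB, e.g. the IP CB */

_Static_assert(GSO_CB_OFFSET + sizeof(size_t) <= CB_SIZE,
               "GSO scratch state must fit after the reserved prefix");

struct pkt {
    unsigned char cb[CB_SIZE];
    size_t len;
};

/*
 * Split p into segments of at most mss bytes. The scratch state is written
 * after the reserved prefix, so the caller's CB survives segmentation, and
 * the prefix is copied into every resulting segment.
 * Returns the number of segments written to out (0 if mss is 0).
 */
static size_t segment(struct pkt *p, size_t mss, struct pkt *out, size_t max_out)
{
    size_t n = 0, left = p->len;

    if (mss == 0)
        return 0;

    /* Segmentation-private state lives at cb[GSO_CB_OFFSET..]. */
    memcpy(p->cb + GSO_CB_OFFSET, &mss, sizeof(mss));

    while (left > 0 && n < max_out) {
        size_t seg_len = left < mss ? left : mss;

        memset(&out[n], 0, sizeof(out[n]));
        memcpy(out[n].cb, p->cb, GSO_CB_OFFSET); /* inherit the caller's CB */
        out[n].len = seg_len;
        left -= seg_len;
        n++;
    }
    return n;
}
```

Reserving a fixed prefix keeps the contract simple: whatever layer owns the first 32 bytes can rely on them being intact in each segment without saving and restoring them itself.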
     

12 Jan, 2016

2 commits

  • Conflicts:
    drivers/net/bonding/bond_main.c
    drivers/net/ethernet/mellanox/mlxsw/spectrum.h
    drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

    The bond_main.c and mellanox switch conflicts were cases of
    overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit acf8dd0a9d0b ("udp: only allow UFO for packets from SOCK_DGRAM
    sockets") disallows UFO for packets sent from raw sockets. We need to do
    the same for SOCK_DGRAM sockets with the SO_NO_CHECK option, albeit for
    a slightly different reason: while such a socket would override the
    CHECKSUM_PARTIAL set by ip_ufo_append_data(), gso_size is still set and
    a bad offloading flags warning is triggered in __skb_gso_segment().

    In the IPv6 case, the SO_NO_CHECK option is ignored, but we need to
    disallow UFO for packets sent by sockets with the UDP_NO_CHECK6_TX option.

    Signed-off-by: Michal Kubecek
    Tested-by: Shannon Nelson
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Michal Kubeček
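The UFO eligibility rule from the UDP commit above can be summarized as a single predicate. This is a simplified userspace sketch: `struct udp_tx_ctx` and its fields are hypothetical stand-ins for socket and device state, and the real append paths check further conditions that are not shown here.

```c
#include <stdbool.h>
#include <sys/socket.h> /* SOCK_DGRAM, SOCK_RAW */

/* Hypothetical, simplified view of the state consulted before using UFO. */
struct udp_tx_ctx {
    int  sock_type;   /* SOCK_DGRAM, SOCK_RAW, ... */
    bool no_check_tx; /* SO_NO_CHECK (IPv4) or UDP_NO_CHECK6_TX (IPv6) set */
    bool dev_has_ufo; /* egress device advertises UDP fragmentation offload */
};

static bool ufo_allowed(const struct udp_tx_ctx *c)
{
    return c->dev_has_ufo &&
           c->sock_type == SOCK_DGRAM && /* raw sockets excluded by acf8dd0a9d0b */
           !c->no_check_tx;              /* checksum-less sockets excluded as well */
}
```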
     

16 Dec, 2015

1 commit

  • These netif flags are unnecessary convolutions. It is more
    straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
    and NETIF_F_IPV6_CSUM directly.

    This patch also:
    - Cleans up can_checksum_protocol
    - Simplifies netdev_intersect_features

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
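A simplified sketch of a protocol check that uses the three checksum flags directly, in the spirit of the cleaned-up can_checksum_protocol. The flag bit values and the host-order ethertype constants are illustrative assumptions (the kernel's NETIF_F_* values differ and it compares network-order protocols), and protocol-specific special cases are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits; the kernel's NETIF_F_* values differ. */
#define F_IP_CSUM   (1u << 0) /* can checksum TCP/UDP over IPv4 only */
#define F_HW_CSUM   (1u << 1) /* generic checksum: any protocol */
#define F_IPV6_CSUM (1u << 2) /* can checksum TCP/UDP over IPv6 only */

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_IPV6 0x86DD

/* Test the three flags directly instead of going through derived masks. */
static bool can_csum(uint32_t features, uint16_t ethertype)
{
    if (features & F_HW_CSUM)
        return true; /* generic checksum covers every protocol */

    switch (ethertype) {
    case ETHERTYPE_IPV4:
        return features & F_IP_CSUM;
    case ETHERTYPE_IPV6:
        return features & F_IPV6_CSUM;
    default:
        return false;
    }
}
```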
     

01 Dec, 2015

1 commit


02 Nov, 2015

2 commits


19 Oct, 2015

1 commit

  • At the time of commit fff326990789 ("tcp: reflect SYN queue_mapping into
    SYNACK packets") we had few ways to cope with SYN floods.

    We no longer need to reflect incoming skb queue mappings, and can instead
    pick a TX queue based on the CPU building the SYNACK, with normal XPS
    affinities.

    Note that all SYNACK retransmits were picking TX queue 0; this is no
    longer a win given that SYNACK retransmits are now distributed across
    all CPUs.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
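A rough sketch of queue selection after this change, assuming a simple per-CPU XPS table: the SYNACK's queue follows the CPU that builds it rather than the incoming skb's queue_mapping. `struct xps_table`, `pick_tx_queue()` and the flow-hash fallback are illustrative simplifications, not the kernel's transmit-queue selection code.

```c
#include <stdint.h>

#define MAX_CPUS 8
#define NO_QUEUE (-1)

/* Hypothetical XPS configuration: CPU -> TX queue, or NO_QUEUE if unmapped. */
struct xps_table {
    int queue_for_cpu[MAX_CPUS];
    int num_tx_queues; /* assumed > 0 */
};

/* Choose a TX queue for a packet built on `cpu`. If XPS maps that CPU to a
 * queue, use it; otherwise spread by flow hash. */
static int pick_tx_queue(const struct xps_table *t, int cpu, uint32_t flow_hash)
{
    if (cpu >= 0 && cpu < MAX_CPUS && t->queue_for_cpu[cpu] != NO_QUEUE)
        return t->queue_for_cpu[cpu];
    return (int)(flow_hash % (uint32_t)t->num_tx_queues);
}
```

Because the queue now depends on the building CPU, SYNACK retransmits fired from timers on different CPUs naturally land on different queues instead of all hitting queue 0.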
     

08 Oct, 2015

10 commits


30 Sep, 2015

2 commits


26 Sep, 2015

1 commit

  • This function is used to build and send SYNACK packets,
    possibly on behalf of unlocked listener socket.

    Make sure we did not miss a write by making this socket const.

    We can no longer use ip_select_ident() and have to either
    set iph->id to 0 or call __ip_select_ident() directly.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
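A minimal sketch of the const-socket constraint in the SYNACK commit above. It assumes that ID 0 is used when DF is set and that a shared counter stands in for `__ip_select_ident()` otherwise (the real generator is more elaborate); `struct listener`, `struct iphdr_lite` and the helpers are hypothetical. Because the listener is `const`, no per-socket ID counter can be bumped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical listener state; only read while building a SYNACK. */
struct listener {
    bool pmtudisc; /* path MTU discovery enabled -> send with DF */
};

struct iphdr_lite {
    uint16_t id;
    bool     df;
};

/* Stand-in for a shared, socket-independent ID generator (assumption). */
static uint16_t ident_counter;

static uint16_t select_ident_shared(void)
{
    return ident_counter++;
}

/* `sk` is const: any attempt to update listener state fails to compile. */
static void build_synack_hdr(const struct listener *sk, struct iphdr_lite *iph)
{
    if (sk->pmtudisc) {
        iph->df = true;
        iph->id = 0;                     /* DF packets are never fragmented */
    } else {
        iph->df = false;
        iph->id = select_ident_shared(); /* no per-socket counter touched */
    }
}
```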
     

18 Sep, 2015

7 commits

  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation
    functions after netfilter has done its job, passing in the desired
    network namespace is in many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This is a preparatory patch for passing net into the netfilter hooks,
    where net will be used again.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a sock parameter to dst_output, making dst_output_sk superfluous.
    Pass skb->sk as that argument at all of the callers of dst_output.
    Have the callers of dst_output_sk call dst_output.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
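The two ideas in this 18 Sep series, a `net` field in the hook state and a `net` argument to okfns, can be sketched in userspace as follows. All types are minimal hypothetical stand-ins that only borrow kernel names; the sketch follows the description of `dst_output_okfn()` as an adapter that drops `net` before calling `dst_output()`.

```c
#include <stddef.h>

/* Minimal hypothetical stand-ins for kernel types. */
struct net        { int id; };
struct net_device { struct net *nd_net; };
struct sock       { int unused; };
struct sk_buff    { int len; };

/* Hook state now carries the namespace chosen at the call site. */
struct hook_state {
    struct net_device *in, *out;
    struct net *net;
};

/* Continuation ("okfn") signature: net is passed explicitly. */
typedef int (*okfn_t)(struct net *net, struct sock *sk, struct sk_buff *skb);

/* Old style: each hook re-derived the namespace from a device. */
static struct net *legacy_hook_net(const struct hook_state *st)
{
    return st->in ? st->in->nd_net : st->out->nd_net;
}

static int dst_output_calls;

static int dst_output(struct sock *sk, struct sk_buff *skb)
{
    (void)sk;
    dst_output_calls++;
    return skb->len;
}

/* Adapter so dst_output can serve as an okfn; net is silently dropped. */
static int dst_output_okfn(struct net *net, struct sock *sk, struct sk_buff *skb)
{
    (void)net;
    return dst_output(sk, skb);
}

/* A hook reads st->net and hands it on to the continuation. */
static int run_hook(const struct hook_state *st, struct sock *sk,
                    struct sk_buff *skb, okfn_t okfn)
{
    return okfn(st->net, sk, skb);
}
```

In the common case `st->net` equals what `legacy_hook_net()` would compute; the listed exceptions are exactly the call sites where the caller supplies a different, better-grounded namespace.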
     

14 Aug, 2015

1 commit


16 Jun, 2015

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    This is a somewhat large (and late) patchset that contains Netfilter
    updates for net-next. Most relevant are the br_netfilter fixes, ipset RCU
    support, removal of the x_tables percpu ruleset copy and the rework of
    the nf_tables netdev support. More specifically, they are:

    1) Warn the user when there is a better protocol conntracker available, from
    Marcelo Ricardo Leitner.

    2) Fix forwarding of IPv6 fragmented traffic in br_netfilter, from Bernhard
    Thaler. This comes with several patches to prepare the change in the first place.

    3) Get rid of special mtu handling of PPPoE/VLAN frames for br_netfilter. This
    is not needed anymore since now we use the largest fragment size to
    refragment, from Florian Westphal.

    4) Restore vlan tag when refragmenting in br_netfilter, also from Florian.

    5) Get rid of the percpu ruleset copy in x_tables, from Florian, plus a
    follow-up patch from Eric Dumazet to refine it.

    6) Several ipset cleanups, fixes and finally RCU support, from Jozsef Kadlecsik.

    7) Get rid of parens in Netfilter Kconfig files.

    8) Attach the net_device to the basechain as opposed to the initial
    per-table approach in the nf_tables netdev family.

    9) Subscribe to netdev events to detect the removal and registration of a
    device that is referenced by a basechain.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Jun, 2015

1 commit


12 Jun, 2015

1 commit

  • Since commit d6b915e29f4adea9
    ("ip_fragment: don't forward defragmented DF packet") the largest
    fragment size is available in the IPCB.

    Therefore we no longer need to care about the 'encapsulation'
    overhead of stripped PPPoE/VLAN headers, since ip_do_fragment
    doesn't use the device MTU in such cases.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
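A minimal sketch of why the stripped-header overhead stops mattering in this commit, assuming the refragmentation size is the smaller of the route MTU and the largest fragment size recorded in the IPCB at reassembly. `refrag_mtu()` is a hypothetical helper, not kernel code.

```c
#include <stdint.h>

/* frag_max_size: largest fragment seen at reassembly (0 if the packet was
 * not reassembled). When known, it bounds the refragment size directly,
 * so PPPoE/VLAN header overhead never enters the calculation. */
static uint32_t refrag_mtu(uint32_t route_mtu, uint32_t frag_max_size)
{
    if (frag_max_size != 0 && frag_max_size < route_mtu)
        return frag_max_size;
    return route_mtu;
}
```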
     

28 May, 2015

2 commits

  • We currently always send fragments without the DF bit set.

    Thus, given the following setup:

    mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
       A           R1             R2           B

    If R1 and R2 run Linux with netfilter defragmentation/conntrack
    enabled and host A sends a fragmented packet _with_ DF set to B, R1
    will respond with an ICMP "too big" error if one of these fragments
    exceeds 1400 bytes.

    However, if R1 receives fragment sizes 1200 and 100, it would
    forward the reassembled packet without refragmenting, i.e.
    R2 will send an ICMP error in response to a packet that was never sent,
    citing an MTU that the original sender never exceeded.

    The other minor issue is that a refragmentation on R1 will conceal the
    MTU of R2-B, since refragmentation does not set the DF bit on the
    fragments.

    This modifies ip_fragment so that we track the largest fragment size
    seen both for DF and non-DF packets, and set frag_max_size to the
    largest value.

    If the DF fragment size is larger than or equal to the non-DF one, we
    will consider the packet a path MTU probe:
    we set the DF bit on the reassembled skb and also tag it with a new IPCB
    flag to force refragmentation even if the skb fits the outdev MTU.

    We will also set the DF bit on each fragment in this case.

    Joint work with Hannes Frederic Sowa.

    Reported-by: Jesse Gross
    Signed-off-by: Florian Westphal
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • ip_skb_dst_mtu is a small inline helper, but it's called in several places.

              text  data  bss    dec   hex  filename
    before:  17061    44    0  17105  42d1  net/ipv4/ip_output.o
    after:   16805    44    0  16849  41d1  net/ipv4/ip_output.o

    Signed-off-by: Florian Westphal
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
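The DF bookkeeping from the first 28 May commit (path MTU probing through a defragmenting router) can be modelled as below. This is a userspace sketch with hypothetical names (`struct frag_queue`, `frag_seen()`, `frag_reasm()`), not the kernel's reassembly code. Since the overall maximum is the larger of the DF and non-DF maxima, "DF max is at least the non-DF max" is the same as "DF max equals the overall max".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reassembly-queue bookkeeping (names are illustrative). */
struct frag_queue {
    uint32_t max_size;    /* largest fragment seen, DF or not */
    uint32_t max_df_size; /* largest fragment seen with DF set */
};

static void frag_seen(struct frag_queue *q, uint32_t fragsize, bool df)
{
    if (fragsize > q->max_size)
        q->max_size = fragsize;
    if (df && fragsize > q->max_df_size)
        q->max_df_size = fragsize;
}

struct reasm_result {
    uint32_t frag_max_size; /* size to refragment to */
    bool     df;            /* set DF on the reassembled skb and its fragments */
    bool     pmtu_probe;    /* force refragmentation even if it fits the MTU */
};

/* PMTU probe: the largest DF fragment is as large as the largest fragment
 * overall (an empty queue is never treated as a probe). */
static struct reasm_result frag_reasm(const struct frag_queue *q)
{
    struct reasm_result r = { .frag_max_size = q->max_size };

    r.pmtu_probe = q->max_df_size != 0 && q->max_df_size == q->max_size;
    r.df = r.pmtu_probe;
    return r;
}
```

With the 1200/100 example from the commit message and DF set on both fragments, the reassembled packet keeps DF and is refragmented at 1200 bytes, so R2 reports an MTU the sender actually exceeded.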
     

25 May, 2015

1 commit


19 May, 2015

1 commit

  • When bridge netfilter re-fragments an IP packet for output, all
    packets that cannot be re-fragmented to their original input size
    should be silently discarded.

    However, the current bridge netfilter output path generates an ICMP
    packet with a 'size exceeded MTU' message for such packets; this is a
    bug.

    This patch refactors the ip_fragment() API to allow two separate
    use cases. The bridge netfilter use case will not
    send ICMP; the routing output will, as before.

    Signed-off-by: Andy Zhou
    Acked-by: Florian Westphal
    Signed-off-by: David S. Miller

    Andy Zhou
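A minimal sketch of the API split in this commit: one core routine that never sends ICMP, and two callers that differ only in how they report failure. All names are hypothetical, and the "cannot fragment" test is reduced to "DF set and packet larger than the MTU" for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum frag_status { FRAG_OK, FRAG_TOO_BIG };

static int icmp_frag_needed_sent; /* counts ICMP "fragmentation needed" replies */

/* Core routine: decides and (in a real stack) fragments, but never sends ICMP. */
static enum frag_status do_fragment(size_t len, size_t mtu, bool df)
{
    if (df && len > mtu)
        return FRAG_TOO_BIG;
    /* ...split into mtu-sized fragments and transmit... */
    return FRAG_OK;
}

/* Routing output path: reports the error to the sender, as before. */
static enum frag_status route_fragment(size_t len, size_t mtu, bool df)
{
    enum frag_status st = do_fragment(len, mtu, df);

    if (st == FRAG_TOO_BIG)
        icmp_frag_needed_sent++;
    return st;
}

/* Bridge netfilter path: unfragmentable packets are dropped silently. */
static enum frag_status bridge_fragment(size_t len, size_t mtu, bool df)
{
    return do_fragment(len, mtu, df);
}
```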
     

14 May, 2015

1 commit

  • __ip_local_out_sk() is only used from net/ipv4/ip_output.c

    net/ipv4/ip_output.c:94:5: warning: symbol '__ip_local_out_sk' was not
    declared. Should it be static?

    Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Apr, 2015

2 commits

  • Signed-off-by: Sheng Yong
    Signed-off-by: David S. Miller

    Sheng Yong
     
  • On the output paths in particular, we sometimes have to deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    Second is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
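A minimal sketch of the two-socket problem in the second 8 Apr commit: the tunnel transmit path receives the tunnel's socket explicitly, while skb->sk keeps pointing at the sending socket for memory accounting. The types, the `FAM_*` values and both functions are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the family values are illustrative. */
enum sock_family { FAM_INET, FAM_PACKET };

struct sock    { enum sock_family family; };
struct sk_buff { struct sock *sk; }; /* sk: the socket that generated the frame */

/* Socket-dependent checks must look at the socket passed in explicitly,
 * i.e. the tunnel's own (IPv4/UDP) socket, not at skb->sk. */
static bool tunnel_xmit_ok(const struct sock *sk, const struct sk_buff *skb)
{
    (void)skb;
    return sk->family == FAM_INET;
}

/* Encapsulate without reassigning skb->sk, so accounting stays with the
 * originating socket (e.g. an AF_PACKET sender on a vxlan device). */
static bool encap_and_send(struct sock *tunnel_sk, struct sk_buff *skb)
{
    return tunnel_xmit_ok(tunnel_sk, skb);
}
```

Passing `skb->sk` to `tunnel_xmit_ok()` instead would reproduce the bug described above: the IPv4-specific check would run against the AF_PACKET socket.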
     

04 Apr, 2015

1 commit

  • The ipv4 code uses a mixture of coding styles. In some instances a check
    for a non-NULL pointer is done as x != NULL and in others as x. x is
    preferred according to checkpatch, and this patch makes the code
    consistent by adopting the latter form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris