26 Mar, 2015

1 commit


03 Jun, 2014

1 commit

  • Ideally, we would need to generate IP ID using a per destination IP
    generator.

    linux kernels used inet_peer cache for this purpose, but this had a huge
    cost on servers disabling MTU discovery.

    1) each inet_peer struct consumes 192 bytes

    2) inetpeer cache uses a binary tree of inet_peer structs,
    with a nominal size of ~66000 elements under load.

    3) lookups in this tree are hitting a lot of cache lines, as tree depth
    is about 20.

    4) If server deals with many tcp flows, we have a high probability of
    not finding the inet_peer, allocating a fresh one, inserting it in
    the tree with same initial ip_id_count, (cf secure_ip_id())

    5) We garbage collect inet_peer aggressively.

    IP ID generation do not have to be 'perfect'

    Goal is trying to avoid duplicates in a short period of time,
    so that reassembly units have a chance to complete reassembly of
    fragments belonging to one message before receiving other fragments
    with a recycled ID.

    We simply use an array of generators, and a Jenkin hash using the dst IP
    as a key.

    ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
    belongs (it is only used from this file)

    secure_ip_id() and secure_ipv6_id() no longer are needed.

    Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
    unnecessary decrement/increment of the number of segments.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Feb, 2014

1 commit


01 Oct, 2013

1 commit

  • Conflicts:
    include/net/xfrm.h

    Simple conflict between Joe Perches "extern" removal for function
    declarations in header files and the changes in Steffen's tree.

    Steffen Klassert says:

    ====================
    Two patches that are left from the last development cycle.
    Manual merging of include/net/xfrm.h is needed. The conflict
    can be solved as it is currently done in linux-next.

    1) We announce the creation of temporary acquire state via an asyc event,
    so the deletion should be annunced too. From Nicolas Dichtel.

    2) The VTI tunnels do not real tunning, they just provide a routable
    IPsec tunnel interface. So introduce and use xfrm_tunnel_notifier
    instead of xfrm_tunnel for xfrm tunnel mode callback. From Fan Du.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Sep, 2013

1 commit

  • If local fragmentation is allowed, then ip_select_ident() and
    ip_select_ident_more() need to generate unique IDs to ensure
    correct defragmentation on the peer.

    For example, if IPsec (tunnel mode) has to encrypt large skbs
    that have local_df bit set, then all IP fragments that belonged
    to different ESP datagrams would have used the same identificator.
    If one of these IP fragments would get lost or reordered, then
    peer could possibly stitch together wrong IP fragments that did
    not belong to the same datagram. This would lead to a packet loss
    or data corruption.

    Signed-off-by: Ansis Atteka
    Signed-off-by: David S. Miller

    Ansis Atteka
     

28 Aug, 2013

1 commit

  • Some thoughts on IPv4 VTI implementation:

    The connection between VTI receiving part and xfrm tunnel mode input process
    is hardly a "xfrm_tunnel", xfrm_tunnel is used in places where, e.g ipip/sit
    and xfrm4_tunnel, acts like a true "tunnel" device.

    In addition, IMHO, VTI doesn't need vti_err to do something meaningful, as all
    VTI needs is just a notifier to be called whenever xfrm_input ingress a packet
    to update statistics.

    A IPsec protected packet is first handled by protocol handlers, e.g AH/ESP,
    to check packet authentication or encryption rightness. PMTU update is taken
    care of in this stage by protocol error handler.

    Then the packet is rearranged properly depending on whether it's transport
    mode or tunnel mode packed by mode "input" handler. The VTI handler code
    takes effects in this stage in tunnel mode only. So it neither need propagate
    PMTU, as it has already been done if necessary, nor the VTI handler is
    qualified as a xfrm_tunnel.

    So this patch introduces xfrm_tunnel_notifier and meanwhile wipe out vti_err
    code.

    Signed-off-by: Fan Du
    Cc: Steffen Klassert
    Cc: David S. Miller
    Reviewed-by: Saurabh Mohan
    Signed-off-by: Steffen Klassert

    Fan Du
     

06 Mar, 2013

1 commit

  • By default, DSCP is copying during encapsulation.
    Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
    different DSCP may get reordered relative to each other in the network and then
    dropped by the remote IPsec GW if the reordering becomes too big compared to the
    replay window.

    It is possible to avoid this copy with netfilter rules, but it's very convenient
    to be able to configure it for each SA directly.

    This patch adds a toogle for this purpose. By default, it's not set to maintain
    backward compatibility.

    Field flags in struct xfrm_usersa_info is full, hence I add a new attribute.

    Signed-off-by: Nicolas Dichtel
    Signed-off-by: Steffen Klassert

    Nicolas Dichtel
     

19 Feb, 2013

1 commit


16 Feb, 2013

1 commit


19 Jul, 2012

1 commit


24 Feb, 2012

1 commit

  • Niccolo Belli reported ipsec crashes in case we handle a frame without
    mac header (atm in his case)

    Before copying mac header, better make sure it is present.

    Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809

    Reported-by: Niccolò Belli
    Tested-by: Niccolò Belli
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

13 Dec, 2010

2 commits

  • Always go through a new ip4_dst_hoplimit() helper, just like ipv6.

    This allowed several simplifications:

    1) The interim dst_metric_hoplimit() can go as it's no longer
    userd.

    2) The sysctl_ip_default_ttl entry no longer needs to use
    ipv4_doint_and_flush, since the sysctl is not cached in
    routing cache metrics any longer.

    3) ipv4_doint_and_flush no longer needs to be exported and
    therefore can be marked static.

    When ipv4_doint_and_flush_strategy was removed some time ago,
    the external declaration in ip.h was mistakenly left around
    so kill that off too.

    We have to move the sysctl_ip_default_ttl declaration into
    ipv4's route cache definition header net/route.h, because
    currently net/ip.h (where the declaration lives now) has
    a back dependency on net/route.h

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: David S. Miller

    David S. Miller
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

03 Jun, 2009

1 commit

  • Define three accessors to get/set dst attached to a skb

    struct dst_entry *skb_dst(const struct sk_buff *skb)

    void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

    void skb_dst_drop(struct sk_buff *skb)
    This one should replace occurrences of :
    dst_release(skb->dst)
    skb->dst = NULL;

    Delete skb->dst field

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Jun, 2008

1 commit

  • When generating the ip header for the transformed packet we just copy
    the frag_off field of the ip header from the original packet to the ip
    header of the new generated packet. If we receive a packet as a chain
    of fragments, all but the last of the new generated packets have the
    IP_MF flag set. We have to mask the frag_off field to only keep the
    IP_DF flag from the original packet. This got lost with git commit
    36cf9acf93e8561d9faec24849e57688a81eb9c5 ("[IPSEC]: Separate
    inner/outer mode processing on output")

    Signed-off-by: Steffen Klassert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Steffen Klassert
     

25 Mar, 2008

1 commit


29 Jan, 2008

4 commits

  • It appears that I've managed to create two different functions both
    called xfrm6_tunnel_output. This is because we have the plain tunnel
    encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation
    which lives in the files xfrmX_mode_tunnel.c.

    This patch renames functions from the latter to use the xfrmX_mode_tunnel
    prefix to avoid name-space conflicts.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • With inter-family transforms the inner mode differs from the outer
    mode. Attempting to handle both sides from the same function means
    that it needs to handle both IPv4 and IPv6 which creates duplication
    and confusion.

    This patch separates the two parts on the input path so that each
    function deals with one family only.

    In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
    moves the pertinent fields from the IPv4/IPv6 IP headers into a
    neutral format stored in skb->cb. This is then used by the inner mode
    input functions to modify the inner IP header. In this way the input
    function no longer has to know about the outer address family.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • With inter-family transforms the inner mode differs from the outer
    mode. Attempting to handle both sides from the same function means
    that it needs to handle both IPv4 and IPv6 which creates duplication
    and confusion.

    This patch separates the two parts on the output path so that each
    function deals with one family only.

    In particular, the functions xfrm4_extract_output/xfrm6_extract_output
    moves the pertinent fields from the IPv4/IPv6 IP headers into a
    neutral format stored in skb->cb. This is then used by the outer mode
    output functions to write the outer IP header. In this way the output
    function no longer has to know about the inner address family.

    Since the extract functions are only called by tunnel modes (the only
    modes that can support inter-family transforms), I've also moved the
    xfrm*_tunnel_check_size calls into them. This allows the correct ICMP
    message to be sent as opposed to now where you might call icmp_send
    with an IPv6 packet and vice versa.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
    that they directly take the outer DSCP rather than the outer IP header.
    This will help us to unify the code for inter-family tunnels.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

18 Oct, 2007

1 commit

  • Currently BEET mode does not reinject the packet back into the stack
    like tunnel mode does. Since BEET should behave just like tunnel mode
    this is incorrect.

    This patch fixes this by introducing a flags field to xfrm_mode that
    tells the IPsec code whether it should terminate and reinject the packet
    back into the stack.

    It then sets the flag for BEET and tunnel mode.

    I've also added a number of missing BEET checks elsewhere where we check
    whether a given mode is a tunnel or not.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

11 Oct, 2007

3 commits

  • This patch moves the setting of the IP length and checksum fields out of
    the transforms and into the xfrmX_output functions. This would help future
    efforts in merging the transforms themselves.

    It also adds an optimisation to ipcomp due to the fact that the transport
    offset is guaranteed to be zero.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The IPv6 calling convention for x->mode->output is more general and could
    help an eventual protocol-generic x->type->output implementation. This
    patch adopts it for IPv4 as well and modifies the IPv4 type output functions
    accordingly.

    It also rewrites the IPv6 mac/transport header calculation to be based off
    the network header where practical.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch changes the calling convention so that on entry from
    x->mode->output and before entry into x->type->output skb->data
    will point to the payload instead of the IP header.

    This is essentially a redistribution of skb_push/skb_pull calls
    with the aim of minimising them on the common path of tunnel +
    ESP.

    It'll also let us use the same calling convention between IPv4
    and IPv6 with the next patch.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

31 May, 2007

1 commit


26 Apr, 2007

9 commits


27 Feb, 2007

1 commit


13 Feb, 2007

1 commit


09 Feb, 2007

1 commit


23 Sep, 2006

1 commit


22 Jul, 2006

1 commit