14 Sep, 2013

1 commit

  • [ Upstream commit f46078cfcd77fa5165bf849f5e568a7ac5fa569c ]

    It is not allowed for an ipv6 packet to contain multiple fragmentation
    headers. So discard packets which were already reassembled by
    fragmentation logic and send back a parameter problem icmp.

    The updates for RFC 6980 will come in later, I have to do a bit more
    research here.

    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hannes Frederic Sowa
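
A minimal sketch of the rule described in the commit above, assuming a per-packet flag that records "this packet came out of reassembly". The flag, the struct, and the function names are hypothetical illustrations and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag and packet type; not the kernel's definitions. */
#define PKT_REASSEMBLED 0x1u

struct pkt {
    unsigned int flags;
};

enum frag_verdict {
    FRAG_QUEUE,         /* hand the fragment to the reassembly logic */
    FRAG_PARAM_PROBLEM  /* discard and report an ICMPv6 parameter problem */
};

/* Called once reassembly has produced a complete packet. */
void pkt_mark_reassembled(struct pkt *p)
{
    assert(p != NULL);
    p->flags |= PKT_REASSEMBLED;
}

/* Called whenever a fragment header is met while parsing the packet. */
enum frag_verdict frag_header_seen(const struct pkt *p)
{
    assert(p != NULL);
    /* A second fragment header on an already reassembled packet is invalid. */
    return (p->flags & PKT_REASSEMBLED) ? FRAG_PARAM_PROBLEM : FRAG_QUEUE;
}
```

A packet that has never been reassembled yields FRAG_QUEUE; after pkt_mark_reassembled() the same check yields FRAG_PARAM_PROBLEM.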
     

23 Apr, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    drivers/net/ethernet/intel/igb/igb_main.c
    drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
    include/net/scm.h
    net/batman-adv/routing.c
    net/ipv4/tcp_input.c

    The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
    cleanup in net-next to now pass cred structs around.

    The be2net driver had a bug fix in 'net' that overlapped with the VLAN
    interface changes by Patrick McHardy in net-next.

    An IGB conflict existed because in 'net' the build_skb() support was
    reverted, and in 'net-next' there was a comment style fix within that
    code.

    Several batman-adv conflicts were resolved by making sure that all
    calls to batadv_is_my_mac() are changed to have a new bat_priv first
    argument.

    Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
    rewrite in 'net-next', mostly overlapping changes.

    Thanks to Stephen Rothwell and Antonio Quartulli for help with several
    of these merge resolutions.

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Apr, 2013

1 commit

  • Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
    added a bug in IP defragmentation handling, as non refcounted
    dst could escape an RCU protected section.

    Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed
    the case of timeouts, but not the general problem.

    Tom Parkin noticed crashes in UDP stack and provided a patch,
    but further analysis permitted us to pinpoint the root cause.

    Before queueing a packet into a frag list, we must drop its dst,
    as this dst has limited lifetime (RCU protected).

    When/if a packet is finally reassembled, we use the dst of the very
    last skb, still protected by RCU and valid, as the dst of the
    reassembled packet.

    Use same logic in IPv6, as there is no need to hold dst references.

    Reported-by: Tom Parkin
    Tested-by: Tom Parkin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Mar, 2013

1 commit

  • Hello!

    After patch 1 got accepted to net-next I will also send a patch to
    netfilter-devel to make the corresponding changes to the netfilter
    reassembly logic.

    Thanks,

    Hannes

    -- >8 --
    [PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

    This patch also ensures that INET_ECN_CE is propagated if one fragment
    had the codepoint set.

    Cc: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
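
A rough sketch of the RFC 3168 section 5.3 rule the commit above refers to: if any fragment carries CE, the reassembled packet carries CE, unless another fragment is Not-ECT, in which case the packet is dropped. The function name, the ECN_DROP sentinel, and the choice to fall back to the first fragment's codepoint when no fragment carries CE are assumptions made for illustration; the kernel's own implementation may differ.

```c
#include <assert.h>
#include <stddef.h>

/* The four ECN codepoints (two low bits of the traffic class). */
enum ecn {
    ECN_NOT_ECT = 0,
    ECN_ECT_1   = 1,
    ECN_ECT_0   = 2,
    ECN_CE      = 3
};

#define ECN_DROP (-1)  /* hypothetical sentinel: do not reassemble */

/* Returns the codepoint for the reassembled packet, or ECN_DROP. */
int reassembled_ecn(const enum ecn *frags, size_t n)
{
    int saw_ce = 0, saw_not_ect = 0;

    assert(frags != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (frags[i] == ECN_CE)
            saw_ce = 1;
        else if (frags[i] == ECN_NOT_ECT)
            saw_not_ect = 1;
    }

    /* CE must not be lost, and must not be set next to Not-ECT. */
    if (saw_ce)
        return saw_not_ect ? ECN_DROP : ECN_CE;

    /* Assumption: without CE, keep the first fragment's codepoint. */
    return n ? (int)frags[0] : (int)ECN_NOT_ECT;
}
```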
     

19 Mar, 2013

1 commit

  • This patch introduces a constant limit of the fragment queue hash
    table bucket list lengths. Currently the limit 128 is chosen somewhat
    arbitrarily and just ensures that we can fill up the fragment cache with
    empty packets up to the default ip_frag_high_thresh limits. It should
    just protect from list iteration eating considerable amounts of cpu.

    If we reach the maximum length in one hash bucket a warning is printed.
    This is implemented on the caller side of inet_frag_find to distinguish
    between the different users of inet_fragment.c.

    I dropped the out of memory warning in the ipv4 fragment lookup path,
    because we already get a warning by the slab allocator.

    Cc: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Signed-off-by: Hannes Frederic Sowa
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
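
A userspace sketch of the idea in the commit above: count the entries walked during a bucket lookup and report when the chain exceeds a fixed limit, leaving the warning to the caller. Only the limit of 128 comes from the commit message; the types, names, and return codes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_MAXDEPTH 128  /* the limit named in the commit message */

struct fq {
    unsigned int key;
    struct fq *next;
};

enum find_status {
    FIND_HIT,      /* matching queue found */
    FIND_MISS,     /* not found; chain is short enough to add a new entry */
    FIND_TOO_DEEP  /* not found; chain too long, caller should warn */
};

enum find_status bucket_find(struct fq *head, unsigned int key, struct fq **out)
{
    int depth = 0;

    assert(out != NULL);

    for (struct fq *q = head; q != NULL; q = q->next) {
        if (q->key == key) {
            *out = q;
            return FIND_HIT;
        }
        depth++;
    }

    *out = NULL;
    return depth > BUCKET_MAXDEPTH ? FIND_TOO_DEEP : FIND_MISS;
}
```

Returning a distinct status instead of printing inside the lookup mirrors the commit's point that the warning belongs on the caller side, so each user can identify itself.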
     

19 Feb, 2013

2 commits

  • net/ipv6/reassembly.c:82:72: warning: incorrect type in argument 3 (different base types)
    net/ipv6/reassembly.c:82:72: expected unsigned int [unsigned] [usertype] c
    net/ipv6/reassembly.c:82:72: got restricted __be32 [usertype] id

    Signed-off-by: Eric Dumazet
    Reported-by: Fengguang Wu
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Use ipv6_addr_hash() and a single jhash invocation.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

16 Feb, 2013

1 commit


30 Jan, 2013

2 commits

  • Updating the fragmentation queue's LRU (Least-Recently-Used) list
    required taking the hash writer lock. However, the LRU list isn't
    tied to the hash at all, so we can use a separate lock for it.

    Original-idea-by: Florian Westphal
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     
  • This change is primarily a preparation to ease the extension of memory
    limit tracking.

    The change reduces the number of atomic operations during freeing of
    a frag queue. This introduces some performance improvement, as
    these atomic operations are at the core of the performance problems
    seen on NUMA systems.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

19 Nov, 2012

1 commit

  • In preparation for supporting the creation of network namespaces
    by unprivileged users, modify all of the per net sysctl exports
    and refuse to allow them to unprivileged users.

    This makes it safe for unprivileged users in general to access
    per net sysctls, and allows sysctls to be exported to unprivileged
    users on an individual basis as they are deemed safe.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

20 Sep, 2012

3 commits

  • Cc: Herbert Xu
    Cc: Michal Kubeček
    Cc: David Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
     
  • Cc: Herbert Xu
    Cc: Michal Kubeček
    Cc: David Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
     
  • Two years ago, Shan Wei tried to fix this:
    http://patchwork.ozlabs.org/patch/43905/

    The problem is that RFC2460 requires an ICMP Time
    Exceeded -- Fragment Reassembly Time Exceeded message should be
    sent to the source of that fragment, if the defragmentation
    times out.

    "
    If insufficient fragments are received to complete reassembly of a
    packet within 60 seconds of the reception of the first-arriving
    fragment of that packet, reassembly of that packet must be
    abandoned and all the fragments that have been received for that
    packet must be discarded. If the first fragment (i.e., the one
    with a Fragment Offset of zero) has been received, an ICMP Time
    Exceeded -- Fragment Reassembly Time Exceeded message should be
    sent to the source of that fragment.
    "

    As Herbert suggested, we could actually use the standard IPv6
    reassembly code which follows RFC2460.

    With this patch applied, I can see ICMP Time Exceeded sent
    from the receiver when the sender sends out only 3 of the 4
    fragments of an IPv6 UDP packet.

    Cc: Herbert Xu
    Cc: Michal Kubeček
    Cc: David Miller
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Cc: Pablo Neira Ayuso
    Cc: netfilter-devel@vger.kernel.org
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
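
The RFC 2460 passage quoted in the commit above can be sketched as two small checks: whether the 60-second reassembly window has elapsed, and whether the offset-zero fragment was among those received. The struct and function names are hypothetical; only the 60-second value and the offset-zero condition come from the quoted text.

```c
#include <assert.h>
#include <stddef.h>

#define REASM_TIMEOUT_SECS 60u  /* from the RFC 2460 text quoted above */

struct frag {
    unsigned int offset;        /* Fragment Offset of this fragment */
    const struct frag *next;
};

/* True once 60 seconds have passed since the first-arriving fragment. */
int reasm_expired(unsigned long first_arrival, unsigned long now)
{
    assert(now >= first_arrival);
    return now - first_arrival >= REASM_TIMEOUT_SECS;
}

/* On expiry, ICMP Time Exceeded is only sent if the first fragment
 * (Fragment Offset of zero) was received. */
int should_send_time_exceeded(const struct frag *frags)
{
    for (const struct frag *f = frags; f != NULL; f = f->next) {
        if (f->offset == 0)
            return 1;
    }
    return 0;
}
```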
     

20 May, 2012

1 commit

  • ip6_frag_reasm() can use skb_try_coalesce() to build optimized skb,
    reducing memory used by them (truesize), and reducing number of cache
    line misses and overhead for the consumer.

    Signed-off-by: Eric Dumazet
    Cc: Alexander Duyck
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 May, 2012

1 commit


16 May, 2012

1 commit


26 Apr, 2012

1 commit


21 Apr, 2012

2 commits

  • This results in code with less boiler plate that is a bit easier
    to read.

    Additionally stops us from using compatibility code in the sysctl
    core, hastening the day when the compatibility code can be removed.

    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • register_sysctl_rotable never caught on as an interesting way to
    register sysctls. My take on the situation is that what we want are
    sysctls that we can only see in the initial network namespace. What we
    have implemented with register_sysctl_rotable are sysctls that we can
    see in all of the network namespaces and can only change in the initial
    network namespace.

    That is a very silly way to go. Just register the network sysctls
    in the initial network namespace and we don't have any weird special
    cases to deal with.

    The sysctls affected are:
    /proc/sys/net/ipv4/ipfrag_secret_interval
    /proc/sys/net/ipv4/ipfrag_max_dist
    /proc/sys/net/ipv6/ip6frag_secret_interval
    /proc/sys/net/ipv6/mld_max_msf

    I really don't expect anyone will miss them if they can't read them in a
    child user namespace.

    CC: Pavel Emelyanov
    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

31 Jan, 2012

1 commit

  • RFC5722 Section 4 was amended by Errata 3089

    Our implementation did the right thing anyway...

    Signed-off-by: Eric Dumazet
    Cc: Nicolas Dichtel
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Nov, 2011

1 commit


01 Nov, 2011

1 commit


19 Oct, 2011

1 commit

  • To ease skb->truesize sanitization, it's better to be able to localize
    all references to skb frags size.

    Define accessors: skb_frag_size() to fetch frag size, and
    skb_frag_size_{set|add|sub}() to manipulate it.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
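
A small sketch of the accessor pattern the commit above describes, using a hypothetical stand-in struct rather than the kernel's skb fragment type. Routing every read and write of the size field through these helpers means a later change to how the size is stored or audited touches one place only.

```c
#include <assert.h>

/* Hypothetical stand-in for a page fragment descriptor. */
struct page_frag_desc {
    unsigned int size;
};

static inline unsigned int frag_size(const struct page_frag_desc *f)
{
    return f->size;
}

static inline void frag_size_set(struct page_frag_desc *f, unsigned int size)
{
    f->size = size;
}

static inline void frag_size_add(struct page_frag_desc *f, unsigned int delta)
{
    f->size += delta;
}

static inline void frag_size_sub(struct page_frag_desc *f, unsigned int delta)
{
    assert(delta <= f->size);  /* guard against unsigned underflow */
    f->size -= delta;
}
```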
     

23 Apr, 2011

1 commit


29 Nov, 2010

1 commit

  • jhash is widely used in the kernel and because the functions
    are inlined, the cost in size is significant. Also, the new jhash
    functions are slightly larger than the previous ones so better un-inline.
    As a preparation step, the calls to the internal macros are replaced
    with the plain jhash function calls.

    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: David S. Miller

    Jozsef Kadlecsik
     

09 Nov, 2010

1 commit

  • The type of FRAG6_CB(prev)->offset is int, skb->len is *unsigned* int,
    and offset is int.

    Without this patch, the expression is converted to unsigned int, so the
    comparison gives the wrong result when
    (FRAG6_CB(prev)->offset + prev->len) is less than offset.

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
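
A standalone illustration of the conversion problem described in the commit above. Adding an int to an unsigned int yields an unsigned int, so the subtraction wraps around instead of going negative. The function names are hypothetical, and the fixed variant is one possible way to keep the arithmetic signed; it is not a copy of the kernel patch.

```c
#include <assert.h>
#include <limits.h>

/* Buggy form: (int + unsigned int) is unsigned, so the result of the
 * subtraction is never negative and "> 0" is true unless it is exactly 0. */
int overlaps_buggy(int prev_offset, unsigned int prev_len, int offset)
{
    return (prev_offset + prev_len) - offset > 0;
}

/* Fixed form: do the arithmetic in signed int before comparing. */
int overlaps_fixed(int prev_offset, unsigned int prev_len, int offset)
{
    assert(prev_len <= (unsigned int)INT_MAX);
    int i = prev_offset + (int)prev_len - offset;
    return i > 0;
}
```

With prev_offset 0, prev_len 100 and offset 200 there is a gap rather than an overlap: the buggy form still reports an overlap, the fixed form does not.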
     

10 Sep, 2010

1 commit


08 Sep, 2010

1 commit


23 Aug, 2010

1 commit

  • SKBs can be "fragmented" in two ways, via a page array (called
    skb_shinfo(skb)->frags[]) and via a list of SKBs (called
    skb_shinfo(skb)->frag_list).

    Since skb_has_frags() tests the latter, its name is confusing
    since it sounds more like it's testing the former.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Jul, 2010

1 commit

  • add fast path for in-order fragments

    As fragments are sent in order by most OSes, such as Windows, Darwin and
    FreeBSD, it is likely that new fragments belong at the end of the inet_frag_queue.
    In the fast path, we check if the skb at the end of the inet_frag_queue is the
    prev we expect.

    Signed-off-by: Changli Gao
    ----
    include/net/inet_frag.h | 1 +
    net/ipv4/ip_fragment.c | 12 ++++++++++++
    net/ipv6/reassembly.c | 11 +++++++++++
    3 files changed, 24 insertions(+)
    Signed-off-by: David S. Miller

    Changli Gao
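
A simplified sketch of the fast path described in the commit above, assuming an offset-sorted singly linked list with a tail pointer. The tail is checked first, and the list is only walked when the new fragment does not belong at the end. The types and names are hypothetical, and overlap handling is left out.

```c
#include <assert.h>
#include <stddef.h>

struct frag {
    int offset;
    struct frag *next;
};

struct fragq {
    struct frag *head;
    struct frag *tail;
};

/* Inserts f in offset order. Returns 1 if the fast path was taken. */
int fragq_insert(struct fragq *q, struct frag *f)
{
    struct frag *prev, *next;
    int fast = 0;

    assert(q != NULL && f != NULL);

    /* Fast path: an empty queue, or f sorts after the current tail. */
    prev = q->tail;
    if (prev == NULL || prev->offset < f->offset) {
        next = NULL;
        fast = 1;
    } else {
        /* Slow path: walk from the head to find the insertion point. */
        prev = NULL;
        for (next = q->head; next != NULL; next = next->next) {
            if (next->offset >= f->offset)
                break;
            prev = next;
        }
    }

    f->next = next;
    if (next == NULL)
        q->tail = f;
    if (prev != NULL)
        prev->next = f;
    else
        q->head = f;

    return fast;
}
```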
     

16 Jun, 2010

2 commits


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

19 Feb, 2010

1 commit


17 Feb, 2010

1 commit


16 Feb, 2010

1 commit

  • When no more memory can be allocated, fq_find() will return NULL and
    increase the value of IPSTATS_MIB_REASMFAILS. In this case,
    ipv6_frag_rcv() also increases the value of IPSTATS_MIB_REASMFAILS.

    So, the patch deletes the redundant counter of IPSTATS_MIB_REASMFAILS in fq_find()
    and deletes the unused parameter of idev.

    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     

10 Feb, 2010

1 commit


20 Jan, 2010

1 commit


18 Jan, 2010

1 commit