02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.
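
    For illustration, the tag is a single comment on the first line of a
    file. The two comment styles below are the usual forms for C headers
    and C sources; the snippet is only a sketch of where the tag goes.

```c
/* First line of a C header (.h): */
/* SPDX-License-Identifier: GPL-2.0 */

// First line of a C source file (.c):
// SPDX-License-Identifier: GPL-2.0
```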

    This patch is based on work done by Thomas Gleixner, Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

17 Jul, 2017

1 commit

  • As discussed in Faro during Netfilter Workshop 2017, RB trees can be
    used with RCU, using a seqlock.

    Note that net/rxrpc/conn_service.c is already using this.

    This patch converts inetpeer from an AVL tree to an RB tree, since this
    allows removing the private AVL implementation in favor of shared RB code.
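
    The read side of such a scheme can be sketched in userspace C11 as
    below. This only illustrates the retry pattern: all names are
    hypothetical, a single int stands in for the tree, writers are assumed
    to be serialized elsewhere, and none of this is the kernel's seqlock or
    rbtree API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a sequence counter guarding one value. */
struct seq_box {
	atomic_uint seq;	/* odd while a writer is active */
	atomic_int value;	/* stands in for the tree being searched */
};

/* Snapshot the sequence before starting a lockless read. */
unsigned int seq_read_begin(struct seq_box *b)
{
	return atomic_load(&b->seq);
}

/* True if a writer was active at, or ran since, seq_read_begin(). */
bool seq_read_retry(struct seq_box *b, unsigned int start)
{
	return (start & 1u) || atomic_load(&b->seq) != start;
}

/* Writer side (assumed to be called by one writer at a time). */
void seq_write(struct seq_box *b, int v)
{
	atomic_fetch_add(&b->seq, 1);	/* sequence becomes odd */
	atomic_store(&b->value, v);
	atomic_fetch_add(&b->seq, 1);	/* sequence is even again */
}

/* Reader: repeat the lookup until it ran without a concurrent write. */
int seq_read(struct seq_box *b)
{
	unsigned int start;
	int v;

	do {
		start = seq_read_begin(b);
		v = atomic_load(&b->value);
	} while (seq_read_retry(b, start));
	return v;
}
```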

    $ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
    text data bss dec hex filename
    3195 40 128 3363 d23 net/ipv4/inetpeer.before
    1562 24 0 1586 632 net/ipv4/inetpeer.after

    The same technique can be used to speed up
    net/netfilter/nft_set_rbtree.c (removing rwlock contention in fast path)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Jul, 2017

1 commit

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This avoids accidental
    refcounter overflows that might lead to use-after-free
    situations.
    This conversion requires overall +1 on the whole
    refcounting scheme.
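
    As a rough userspace imitation of the idea (not the kernel's
    refcount_t implementation; every name here is hypothetical), a
    saturating counter refuses to go from 0 back to 1 and never wraps.
    Because 0 then means "already released", a scheme that previously
    used 0 for a live but unused object has to shift its counts up by
    one, which is presumably the +1 mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct ref {
	atomic_uint count;
};

/* Take a reference unless the object is already released (count 0). */
bool ref_inc_not_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;	/* never resurrect a released object */
		if (old == UINT_MAX)
			return true;	/* saturated: stay pinned, never wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference; true when the caller released the last one. */
bool ref_dec_and_test(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;	/* underflow or saturated: leave as is */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
	return old == 1;
}
```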

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

16 Dec, 2015

1 commit

  • David Ahern added a vif field in the a4 part of inetpeer_addr struct.

    This broke IPv4 TCP fast open client side and more generally tcp metrics
    cache, because inetpeer_addr_cmp() is now comparing two u32 instead of
    one.

    inetpeer_set_addr_v4() needs to properly init vif field, otherwise
    the comparison result depends on uninitialized data.
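
    A minimal sketch of the failure mode, using a simplified stand-in for
    the a4 key (the struct and function names below are hypothetical, not
    the kernel definitions): once the comparison covers two words, the
    setter has to initialize both.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the a4 part of the address key. */
struct a4_key {
	uint32_t addr;
	uint32_t vif;
};

/* Compare every field of the key, so addr and vif both take part. */
int a4_key_cmp(const struct a4_key *a, const struct a4_key *b)
{
	if (a->addr != b->addr)
		return a->addr < b->addr ? -1 : 1;
	if (a->vif != b->vif)
		return a->vif < b->vif ? -1 : 1;
	return 0;
}

/* The setter initializes every compared field, including vif. */
void a4_key_set(struct a4_key *k, uint32_t addr)
{
	k->addr = addr;
	k->vif = 0;
}
```

    Without the vif assignment, two keys built for the same address could
    compare as different depending on what the memory held before.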

    Fixes: 192132b9a034 ("net: Add support for VRFs to inetpeer cache")
    Reported-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Cc: Neal Cardwell
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Aug, 2015

4 commits


26 Aug, 2015

1 commit


01 Apr, 2015

1 commit

  • In many places, the a6 field is typecast to struct in6_addr. As the
    fields are in a union anyway, just add an in6_addr member to the union
    and get rid of the typecasting.
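
    A userspace sketch of the before and after, assuming a simplified
    union (the type and helper names are hypothetical, and struct in6_addr
    comes from <netinet/in.h> rather than the kernel headers):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>	/* struct in6_addr (userspace stand-in) */

/* Hypothetical, simplified address union. */
union peer_addr {
	uint32_t a4;
	uint32_t a6[4];
	struct in6_addr in6;	/* same storage as a6, no cast needed */
};

/* Before: callers cast the a6 words to struct in6_addr. */
const struct in6_addr *peer_in6_cast(const union peer_addr *p)
{
	return (const struct in6_addr *)p->a6;
}

/* After: the union member gives the same view directly. */
const struct in6_addr *peer_in6_member(const union peer_addr *p)
{
	return &p->in6;
}
```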

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

09 Sep, 2014

1 commit

  • inetpeer sequence numbers are no longer incremented, so no need to
    check and flush the tree. The function that increments the sequence
    number was already dead code and removed in "ipv4: remove unused
    function" (068a6e18). Remove the code that checks for a change, too.

    Verifying that v4_seq and v6_seq are never incremented and thus that
    flush_check compares bp->flush_seq to 0 is trivial.

    The second part of the change removes flush_check completely even
    though bp->flush_seq is exactly !0 once, at initialization. This
    change is correct because the only time this branch is true is when
    bp->root == peer_avl_empty_rcu, in which case the branch and
    inetpeer_invalidate_tree are a NOOP.

    Signed-off-by: Willem de Bruijn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

04 Jun, 2014

1 commit


03 Jun, 2014

2 commits

  • I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
    is disabled.
    Note how GSO/TSO packets do not have monotonically incrementing ID.

    06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
    06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
    06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
    06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
    06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

    It appears I introduced this bug in linux-3.1.

    inet_getid() must return the old value of peer->ip_id_count,
    not the new one.
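
    The difference can be sketched with C11 atomics (a userspace
    illustration with hypothetical names, not the kernel function): a
    caller reserving several IDs needs the first value of its range, which
    is the counter value before the addition.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter standing in for peer->ip_id_count. */
struct id_gen {
	atomic_int ip_id_count;
};

/* Reserve more + 1 IDs and return the first one of the reserved range,
 * i.e. the counter value before the addition. */
int id_get_old(struct id_gen *g, int more)
{
	return atomic_fetch_add(&g->ip_id_count, more + 1);
}

/* Buggy variant: returns the value after the addition, which is the
 * first ID of the next caller's range. */
int id_get_new(struct id_gen *g, int more)
{
	return atomic_fetch_add(&g->ip_id_count, more + 1) + more + 1;
}
```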

    Let's revert this part, and remove the prevention of
    a null identification field in IPv6 Fragment Extension Header,
    which is dubious and not even done properly.

    Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Ideally, we would need to generate IP ID using a per destination IP
    generator.

    Linux kernels used the inet_peer cache for this purpose, but this had a huge
    cost on servers disabling MTU discovery.

    1) each inet_peer struct consumes 192 bytes

    2) inetpeer cache uses a binary tree of inet_peer structs,
    with a nominal size of ~66000 elements under load.

    3) lookups in this tree are hitting a lot of cache lines, as tree depth
    is about 20.

    4) If server deals with many tcp flows, we have a high probability of
    not finding the inet_peer, allocating a fresh one, inserting it in
    the tree with same initial ip_id_count, (cf secure_ip_id())

    5) We garbage collect inet_peer aggressively.

    IP ID generation does not have to be 'perfect'

    Goal is trying to avoid duplicates in a short period of time,
    so that reassembly units have a chance to complete reassembly of
    fragments belonging to one message before receiving other fragments
    with a recycled ID.

    We simply use an array of generators, and a Jenkins hash using the dst IP
    as a key.
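
    A minimal userspace sketch of the approach, under stated assumptions:
    the table size is made up, the names are hypothetical, and a generic
    integer mixer stands in for the Jenkins hash used by the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ID_BUCKETS 2048u	/* assumed size; must be a power of two */

/* One ID generator per bucket, zero-initialized. */
static atomic_uint id_buckets[ID_BUCKETS];

/* Stand-in integer mixer; the patch uses a Jenkins hash instead. */
uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/* Reserve segs consecutive IDs from the generator selected by the
 * destination address and return the first of them. */
uint16_t select_ident(uint32_t daddr, unsigned int segs)
{
	atomic_uint *gen = &id_buckets[mix32(daddr) & (ID_BUCKETS - 1)];

	return (uint16_t)atomic_fetch_add(gen, segs);
}
```

    Destinations that hash to the same bucket share a generator, which is
    acceptable because the goal is only to avoid reusing an ID within a
    short period.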

    ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
    belongs (it is only used from this file)

    secure_ip_id() and secure_ipv6_id() no longer are needed.

    Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
    unnecessary decrement/increment of the number of segments.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

29 Dec, 2013

1 commit


22 Sep, 2013

1 commit

  • There is a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.
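
    As a small illustration (peer_count is a hypothetical function name),
    the two declarations below are compatible and declare the same
    function with external linkage:

```c
#include <assert.h>

/* Same prototype written with and without extern. */
extern int peer_count(int base);
int peer_count(int base);

/* Single definition matching both declarations. */
int peer_count(int base)
{
	return base + 1;
}
```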

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

11 Jul, 2012

2 commits


11 Jun, 2012

3 commits

  • We handle NULL in rt{,6}_set_peer but then our caller will try to pass
    that NULL pointer into inet_putpeer() which isn't ready for it.

    Fix this by moving the NULL check one level up, and then remove the
    now unnecessary NULL check from inetpeer_ptr_set_peer().

    Reported-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David S. Miller
     
  • This implementation can deal with having many inetpeer roots, which is
    a necessary prerequisite for per-FIB table rooted peer tables.

    Each family (AF_INET, AF_INET6) has a sequence number which we bump
    when we get a family invalidation request.

    Each peer lookup cheaply checks whether the flush sequence of the
    root we are using is out of date, and if so flushes it and updates
    the sequence number.
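
    The lazy flush check can be sketched as follows. This is a simplified
    model with hypothetical names: one family counter is shown, and an
    entry count stands in for the tree contents.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-family invalidation counter (one family shown). */
static unsigned int family_seq;

/* Hypothetical tree root with the sequence it was last flushed at. */
struct peer_base {
	unsigned int flush_seq;
	int nr_entries;		/* stands in for the tree contents */
};

/* A family invalidation request only bumps the counter. */
void family_invalidate(void)
{
	family_seq++;
}

/* Run at the start of a lookup: flush lazily if the root is stale. */
bool base_flush_check(struct peer_base *b)
{
	if (b->flush_seq == family_seq)
		return false;		/* cheap common case */
	b->nr_entries = 0;		/* "flush" the tree */
	b->flush_seq = family_seq;
	return true;
}
```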

    Signed-off-by: David S. Miller

    David S. Miller
     
  • We encode the pointer(s) into an unsigned long with one state bit.

    The state bit is used so we can store the inetpeer tree root to use
    when resolving the peer later.
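
    A userspace sketch of such an encoding, with hypothetical type and
    helper names, and uintptr_t used in place of unsigned long. It relies
    on the pointed-to objects being at least 2-byte aligned so that the
    low bit is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASE_BIT ((uintptr_t)1)	/* assumed: low bit marks a tree root */

struct peer { int dummy; };
struct peer_base { int dummy; };

/* One word holding either a peer pointer or a tagged root pointer. */
struct peer_ptr {
	uintptr_t val;
};

/* Store the tree root, tagged so it can be told apart from a peer. */
void ptr_set_base(struct peer_ptr *p, struct peer_base *b)
{
	p->val = (uintptr_t)b | BASE_BIT;
}

/* Store a resolved peer (state bit clear). */
void ptr_set_peer(struct peer_ptr *p, struct peer *peer)
{
	p->val = (uintptr_t)peer;
}

bool ptr_is_peer(const struct peer_ptr *p)
{
	return !(p->val & BASE_BIT);
}

/* Strip the tag to recover the root pointer. */
struct peer_base *ptr_base(const struct peer_ptr *p)
{
	return (struct peer_base *)(p->val & ~BASE_BIT);
}

struct peer *ptr_peer(const struct peer_ptr *p)
{
	return (struct peer *)p->val;
}
```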

    Later the peer roots will be per-FIB table, and this change works to
    facilitate that.

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Jun, 2012

3 commits


09 Jun, 2012

2 commits

  • Add struct net as a parameter of inet_getpeer_v[4,6] and
    use net to replace &init_net.

    Also modify the callers to provide net for inet_getpeer_v[4,6].

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Currently inetpeer doesn't support namespaces, so information
    leaks across namespaces.

    This patch moves the global vars v4_peers and v6_peers to
    netns_ipv4 and netns_ipv6 as a field named peers.

    Add struct pernet_operations inetpeer_ops to initialize the per-net
    inetpeer data.

    Also change family_to_base and inet_getpeer to support namespaces.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     

07 Jun, 2012

1 commit

  • commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with
    the routing cache) added a race :

    Before freeing an inetpeer, we must respect a RCU grace period, and make
    sure no user will attempt to increase refcnt.

    inetpeer_invalidate_tree() waits for a RCU grace period before inserting
    inetpeer tree into gc_list and waking the worker. At that time, no
    concurrent lookup can find an inetpeer in this tree.

    Signed-off-by: Eric Dumazet
    Cc: Steffen Klassert
    Acked-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Mar, 2012

2 commits

  • As we invalidate the inetpeer tree along with the routing cache now,
    we don't need a genid to reset the redirect handling when the routing
    cache is flushed.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     
  • We initialize the routing metrics with the values cached on the
    inetpeer in rt_init_metrics(). So if we have the metrics cached on the
    inetpeer, we ignore the user configured fib_metrics.

    To fix this issue, we replace the old tree with a fresh initialized
    inet_peer_base. The old tree is removed later with a delayed work queue.

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller

    Steffen Klassert
     

03 Dec, 2011

1 commit


27 Nov, 2011

1 commit

  • Now that inetpeer is the place where we cache redirect information for
    ipv4 destinations, we must be able to invalidate this information when a
    route is added/removed on the host.

    As inetpeer is not yet namespace aware, this patch adds a shared
    redirect_genid, and a per inetpeer redirect_genid. This might be changed
    later if inetpeer becomes ns aware.

    Cache information for one inetpeer is valid as long as its
    redirect_genid has the same value as the global redirect_genid.
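
    The validity rule can be sketched like this (a simplified model; the
    struct and helper names are hypothetical and only the two genid fields
    follow the text above):

```c
#include <assert.h>
#include <stdint.h>

/* Shared generation counter, bumped when routes are added or removed. */
static unsigned int global_redirect_genid;

/* Hypothetical per-peer cache entry. */
struct peer_cache {
	unsigned int redirect_genid;	/* generation of the cached data */
	uint32_t redirect_learned;	/* cached gateway, 0 if none */
};

void routes_changed(void)
{
	global_redirect_genid++;
}

/* Cache a redirect and stamp it with the current generation. */
void peer_learn_redirect(struct peer_cache *p, uint32_t gw)
{
	p->redirect_learned = gw;
	p->redirect_genid = global_redirect_genid;
}

/* Return the cached gateway, or 0 if it predates a route change. */
uint32_t peer_redirect(struct peer_cache *p)
{
	if (p->redirect_genid != global_redirect_genid)
		p->redirect_learned = 0;
	return p->redirect_learned;
}
```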

    Reported-by: Arkadiusz Miśkiewicz
    Tested-by: Arkadiusz Miśkiewicz
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Nov, 2011

1 commit


27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

22 Jul, 2011

1 commit

  • IPv6 fragment identification generation is way beyond what we use for
    IPv4 : It uses a single generator. It's not scalable and allows DoS
    attacks.

    Now inetpeer is IPv6 aware, we can use it to provide a more secure and
    scalable frag ident generator (per destination, instead of system wide)

    This patch :
    1) defines a new secure_ipv6_id() helper
    2) extends inet_getid() to provide 32bit results
    3) extends ipv6_select_ident() with a new dest parameter

    Reported-by: Fernando Gont
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

09 Jun, 2011

2 commits

  • Profiles show false sharing in addr_compare() because refcnt/dtime
    changes dirty the first inet_peer cache line, which holds the keys
    used at lookup time. If many cpus are calling inet_getpeer() and
    inet_putpeer(), or need frag ids, addr_compare() is in 2nd position in
    "perf top".

    Before patch, my udpflood bench (16 threads) on my 2x4x2 machine :

    5784.00 9.7% csum_partial_copy_generic [kernel]
    3356.00 5.6% addr_compare [kernel]
    2638.00 4.4% fib_table_lookup [kernel]
    2625.00 4.4% ip_fragment [kernel]
    1934.00 3.2% neigh_lookup [kernel]
    1617.00 2.7% udp_sendmsg [kernel]
    1608.00 2.7% __ip_route_output_key [kernel]
    1480.00 2.5% __ip_append_data [kernel]
    1396.00 2.3% kfree [kernel]
    1195.00 2.0% kmem_cache_free [kernel]
    1157.00 1.9% inet_getpeer [kernel]
    1121.00 1.9% neigh_resolve_output [kernel]
    1012.00 1.7% dev_queue_xmit [kernel]
    # time ./udpflood.sh

    real 0m44.511s
    user 0m20.020s
    sys 11m22.780s

    # time ./udpflood.sh

    real 0m44.099s
    user 0m20.140s
    sys 11m15.870s

    After patch, no more addr_compare() in profiles :

    4171.00 10.7% csum_partial_copy_generic [kernel]
    1787.00 4.6% fib_table_lookup [kernel]
    1756.00 4.5% ip_fragment [kernel]
    1234.00 3.2% udp_sendmsg [kernel]
    1191.00 3.0% neigh_lookup [kernel]
    1118.00 2.9% __ip_append_data [kernel]
    1022.00 2.6% kfree [kernel]
    993.00 2.5% __ip_route_output_key [kernel]
    841.00 2.2% neigh_resolve_output [kernel]
    816.00 2.1% kmem_cache_free [kernel]
    658.00 1.7% ia32_sysenter_target [kernel]
    632.00 1.6% kmem_cache_alloc_node [kernel]

    # time ./udpflood.sh

    real 0m41.587s
    user 0m19.190s
    sys 10m36.370s

    # time ./udpflood.sh

    real 0m41.486s
    user 0m19.290s
    sys 10m33.650s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Andi Kleen and Tim Chen reported huge contention on inetpeer
    unused_peers.lock, on memcached workload on a 40 core machine, with
    disabled route cache.

    It appears we constantly flip peers refcnt between 0 and 1 values, and
    we must insert/remove peers from unused_peers.list, holding a contended
    spinlock.

    Remove this list completely and perform a garbage collection on-the-fly,
    at lookup time, using the expired nodes we met during the tree
    traversal.

    This removes a lot of code, makes locking more standard, and obsoletes
    two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
    removes two pointers in inet_peer structure.

    There is still a false sharing effect because refcnt is in the first
    cache line of the object [where the links and keys used by lookups are
    located]; we might move it to the end of the inet_peer structure so that
    this first cache line is mostly read by cpus.
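
    The layout idea can be sketched as below. This is a hypothetical
    struct, not the kernel's inet_peer, and a 64-byte cache line is an
    assumption: the fields walked by lookups stay in the first line and
    the frequently written fields start on a line of their own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* Hypothetical node layout. */
struct peer_node {
	/* Read-mostly: touched by every lookup that walks the tree. */
	struct peer_node *left, *right;
	uint32_t key[4];

	/* Write-hot: starts on its own cache line, so updates do not
	 * invalidate the lookup fields above on other CPUs. */
	alignas(CACHE_LINE) atomic_int refcnt;
	uint32_t dtime;
};
```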

    Signed-off-by: Eric Dumazet
    CC: Andi Kleen
    CC: Tim Chen
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Apr, 2011

1 commit


11 Feb, 2011

2 commits

  • Validity of the cached PMTU information is indicated by its
    expiration value being non-zero, just as per dst->expires.

    The scheme we will use is that we will remember the pre-ICMP value
    held in the metrics or route entry, and then at expiration time
    we will restore that value.

    In this way PMTU expiration does not kill off the cached route as is
    done currently.
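
    The save-and-restore scheme can be sketched as follows. This is a
    simplified model with hypothetical field and function names; time is a
    plain counter, and a non-zero expires value marks a cached PMTU as
    described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache entry. */
struct pmtu_cache {
	uint32_t mtu;		/* value currently in effect */
	uint32_t mtu_orig;	/* pre-ICMP value to restore */
	unsigned long expires;	/* 0 means no learned PMTU is cached */
};

/* Record a learned PMTU, remembering the value it replaces. */
void pmtu_learn(struct pmtu_cache *c, uint32_t new_mtu,
		unsigned long now, unsigned long ttl)
{
	if (!c->expires)
		c->mtu_orig = c->mtu;	/* keep only the first original */
	c->mtu = new_mtu;
	c->expires = now + ttl;
	if (!c->expires)
		c->expires = 1;		/* 0 is reserved for "not cached" */
}

/* At expiration time, restore the remembered value. */
void pmtu_check_expire(struct pmtu_cache *c, unsigned long now)
{
	if (c->expires && (long)(now - c->expires) >= 0) {
		c->mtu = c->mtu_orig;
		c->expires = 0;
	}
}
```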

    Redirect information is permanent, or at least until another redirect
    is received.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Future changes will add caching information, and some of
    these new elements will be addresses.

    Since the family is implicit via the ->daddr.family member,
    replicating the family in every address we store is entirely
    redundant.

    Signed-off-by: David S. Miller

    David S. Miller
     

05 Feb, 2011

1 commit