22 Mar, 2014

12 commits

  • With this patch a node may additionally apply the dropping or
    unicasting behaviour to link-local IPv4 and link-local-all-nodes
    IPv6 multicast packets.

    The extra counter and BATADV_MCAST_WANT_ALL_UNSNOOPABLES flag are needed
    because with a future bridge snooping support integration a node with a
    bridge on top of its soft interface is not able to reliably detect its
    multicast listeners for IPv4 link-local and the IPv6
    link-local-all-nodes addresses anymore (see RFC4541, section 2.1.2.2
    and section 3).

    Even though this new flag makes "no difference" now, it'll ensure
    a seamless integration of multicast bridge support without needing to
    break compatibility later.

    Also note that even with multicast bridge support it won't be possible
    to optimize 224.0.0.x and ff02::1 towards nodes with bridges; they will
    always receive these ranges.
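
    For illustration, the two unsnoopable ranges can be recognized with a
    simple address check. This is a userspace sketch; the helper names are
    hypothetical and not part of batman-adv:

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helpers: return true for destinations that multicast
     * snooping switches cannot track reliably (see RFC 4541):
     * IPv4 224.0.0.0/24 and the IPv6 all-nodes address ff02::1. */
    static bool mcast_is_unsnoopable4(const uint8_t ip[4])
    {
        return ip[0] == 224 && ip[1] == 0 && ip[2] == 0; /* 224.0.0.x */
    }

    static bool mcast_is_unsnoopable6(const uint8_t ip[16])
    {
        static const uint8_t all_nodes[16] = { 0xff, 0x02, [15] = 0x01 };

        return memcmp(ip, all_nodes, 16) == 0; /* ff02::1 */
    }
    ```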

    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Linus Lüssing
     
  • With this patch a multicast packet is no longer always simply flooded;
    the behaviour for the following cases is changed to reduce
    unnecessary overhead:

    If all nodes within the horizon of a certain node have signalled
    multicast listener announcement capability, then an IPv6 multicast packet
    with a destination of IPv6 link-local scope (excluding ff02::1) coming
    from the upstream of this node...

    * ...is dropped if there is no matching multicast listener in the
    translation table,
    * ...is forwarded via unicast if there is a single node with interested
    multicast listeners,
    * ...and otherwise still gets flooded.
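
    The three cases can be summarized as a small decision function. This is
    a userspace sketch of the logic described above, not the actual
    batman-adv code; names and signature are assumptions:

    ```c
    /* Hypothetical sketch of the forwarding decision described above.
     * listener_count is the number of nodes with a matching multicast
     * listener in the translation table; all_nodes_capable says whether
     * every node announced listener announcement capability. */
    enum fwd_mode { FWD_FLOOD, FWD_DROP, FWD_UNICAST };

    static enum fwd_mode mcast_fwd_mode(int all_nodes_capable,
                                        int listener_count)
    {
        if (!all_nodes_capable)
            return FWD_FLOOD;   /* can't optimize: someone may listen */
        if (listener_count == 0)
            return FWD_DROP;    /* nobody is interested */
        if (listener_count == 1)
            return FWD_UNICAST; /* exactly one interested node */
        return FWD_FLOOD;       /* many listeners: flooding is cheaper */
    }
    ```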

    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Linus Lüssing
     
  • If the soft interface of a node is not part of a bridge then the node
    announces a new multicast TVLV: the existence of this TVLV
    signals that this node is announcing all of its multicast listeners
    via the translation table infrastructure.

    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Linus Lüssing
     
  • The new bitfield allows us to keep track of whether capability subsets
    of an originator have gone through their initialization phase yet.

    The translation table is the only user right now, but a new one will be
    added soon.

    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Linus Lüssing
     
  • With this patch a node which has no bridge interface on top of its soft
    interface announces its local multicast listeners via the translation
    table.

    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Linus Lüssing
     
  • Some helper functions used along the TX path now have a new
    "dst_hint" argument, but the kerneldoc was missing.

    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner

    Antonio Quartulli
     
  • Reported-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Marek Lindner
     
  • Reported-by: Antonio Quartulli
    Signed-off-by: Simon Wunderlich
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Simon Wunderlich
     
  • On some architectures ether_addr_copy() is slightly faster
    than memcpy(); therefore, use the former when possible.
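
    The idea is that the 6-byte Ethernet address length is known at compile
    time, so the copy can be inlined as fixed-size stores instead of a
    generic memcpy() call. A userspace sketch; mac_copy is a hypothetical
    name, not the kernel helper:

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Sketch of the idea behind ether_addr_copy(): a fixed 6-byte copy
     * that the compiler can turn into a 4-byte plus a 2-byte move
     * (where alignment allows), instead of a variable-length memcpy()
     * call resolved at run time. */
    static void mac_copy(uint8_t dst[6], const uint8_t src[6])
    {
        memcpy(dst, src, 6); /* constant size: fully inlinable */
    }
    ```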

    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner

    Antonio Quartulli
     
  • Our .ndo_start_xmit handler (batadv_interface_tx()) can rely on having
    the skb mac header pointer set correctly since the following commit
    present in kernels >= 3.9:

    "net: reset mac header in dev_start_xmit()" (6d1ccff627)

    Therefore this commit removes the now redundant
    skb_reset_mac_header() call in batadv_bla_tx().

    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Linus Lüssing
     
  • Our .ndo_start_xmit handler (batadv_interface_tx()) can rely on having
    the skb mac header pointer set correctly since the following commit
    present in kernels >= 3.9:

    "net: reset mac header in dev_start_xmit()" (6d1ccff627)

    Therefore we can safely use eth_hdr() and vlan_eth_hdr() instead of
    skb->data now, which spares us some ugly type casts.

    At the same time set the mac_header in batadv_dat_snoop_incoming_arp_request()
    before sending the skb along the TX path.

    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Linus Lüssing
     
  • net/batman-adv/network-coding.c:1535:1-7: Replace memcpy with struct assignment
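
    For context, the transformation this coccinelle script performs looks
    roughly like the following sketch (the struct and function are
    illustrative, not the actual network-coding.c code):

    ```c
    /* Sketch of the coccinelle transformation: replacing
     *   memcpy(dst, src, sizeof(*dst));
     * with a struct assignment, which is type-checked by the compiler
     * and often compiles to the same code. */
    struct coding_packet {
        unsigned char addr[6];
        int id;
    };

    static void copy_packet(struct coding_packet *dst,
                            const struct coding_packet *src)
    {
        *dst = *src; /* struct assignment instead of memcpy() */
    }
    ```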

    Generated by: coccinelle/misc/memcpy-assign.cocci
    Signed-off-by: Fengguang Wu
    Signed-off-by: Martin Hundebøll
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Fengguang Wu
     

21 Mar, 2014

3 commits


20 Mar, 2014

1 commit

  • Commit f9c41a62bba3f3f7ef3541b2a025e3371bcbba97 introduced
    a problem for SOCK_STREAM sockets, when only part of the
    incoming iucv message is received by user space. In this
    case the remaining data of the iucv message is lost.
    This patch makes sure an incompletely received iucv message
    is queued back to the receive queue.

    Signed-off-by: Ursula Braun
    Signed-off-by: Frank Blaschka
    Reported-by: Hendrik Brueckner
    Signed-off-by: David S. Miller

    Ursula Braun
     

19 Mar, 2014

4 commits

  • Reported-by: Stephen Rothwell
    Signed-off-by: David S. Miller

    David S. Miller
     
  • ieee802154 sockets do not properly unshare received skbs, which leads to
    panics (at least) when they are used in conjunction with 6lowpan, so
    run skb_share_check on received skbs.
    6lowpan also contains a use-after-free, which is trivially fixed by
    replacing the inlined skb_share_check with the explicit call.
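
    Conceptually, skb_share_check() hands back a private clone when the
    buffer has more than one user. A userspace mock of that idea; the
    struct and names here are a sketch, not the kernel API:

    ```c
    #include <stdlib.h>
    #include <string.h>

    /* Mock of the skb_share_check() idea: if the buffer is shared
     * (more than one user), return a private clone and drop our
     * reference to the shared original, so the caller may modify
     * its copy safely. */
    struct mock_skb {
        int users;
        char data[64];
    };

    static struct mock_skb *mock_share_check(struct mock_skb *skb)
    {
        if (skb->users > 1) {
            struct mock_skb *clone = malloc(sizeof(*clone));

            if (!clone)
                return NULL;   /* kernel code would drop the packet */
            memcpy(clone, skb, sizeof(*clone));
            clone->users = 1;
            skb->users--;      /* release our reference */
            return clone;
        }
        return skb;            /* sole owner: safe to modify in place */
    }
    ```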

    Signed-off-by: Phoebe Buckheister
    Tested-by: Alexander Aring
    Signed-off-by: David S. Miller

    Phoebe Buckheister
     
  • In commit b4e9b520ca5d ("[NET_SCHED]: Add mask support to fwmark
    classifier") Patrick added a u32 field in fw_head, making it slightly
    bigger than one page.

    Let's use 256 slots to make fw_hash() more straightforward, and move
    @mask to the beginning of the structure, as we often use small
    skb->mark values. @mask and the first hash buckets share the same
    cache line.

    This brings back the memory usage to less than 4000 bytes, and permits
    John to add a rcu_head at the end of the structure later without any
    worry.
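
    With 256 slots the hash can simply fold the (masked) mark down to one
    byte. A userspace sketch of such a straightforward fw_hash(); the
    exact fold is an assumption, not necessarily the committed code:

    ```c
    #include <stdint.h>

    #define HTSIZE 256 /* 256 hash buckets, as described above */

    /* Sketch of a straightforward fw hash: xor-fold all bytes of the
     * masked skb->mark into one byte, so small mark values land in
     * distinct buckets while large values still spread out. */
    static uint32_t fw_hash(uint32_t handle)
    {
        handle ^= handle >> 16;
        handle ^= handle >> 8;
        return handle % HTSIZE;
    }
    ```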

    Signed-off-by: Eric Dumazet
    Cc: Thomas Graf
    Cc: John Fastabend
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Steffen Klassert says:

    ====================
    One patch to rename a newly introduced struct. The rest is
    the rework of the IPsec virtual tunnel interface for ipv6 to
    support inter address family tunneling and namespace crossing.

    1) Rename the newly introduced struct xfrm_filter to avoid a
    conflict with iproute2. From Nicolas Dichtel.

    2) Introduce xfrm_input_afinfo to access the address family
    dependent tunnel callback functions properly.

    3) Add and use a IPsec protocol multiplexer for ipv6.

    4) Remove dst_entry caching. vti can look up multiple different
    dst entries, depending on the configured xfrm states. Therefore
    it does not make sense to cache a dst_entry.

    5) Remove caching of flow information. vti6 does not use the
    tunnel endpoint addresses to do route and xfrm lookups.

    6) Update the vti6 to use its own receive hook.

    7) Remove the now unused xfrm_tunnel_notifier. This was used from vti
    and is replaced by the IPsec protocol multiplexer hooks.

    8) Support inter address family tunneling for vti6.

    9) Check if the tunnel endpoints of the xfrm state and the vti interface
    are matching and return an error otherwise.

    10) Enable namespace crossing for vti devices.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Mar, 2014

12 commits

  • ARRAY_SIZE(nf_conntrack_locks) is undefined if spinlock_t is an
    empty structure. Replace it with CONNTRACK_LOCKS.

    Fixes: 93bb0ceb75be ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
    Reported-by: kbuild test robot
    Signed-off-by: Eric Dumazet
    Cc: Jesper Dangaard Brouer
    Cc: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • The netpoll packet receive code only becomes active if the netpoll
    rx_skb_hook is implemented, and there is not a single implementation
    of the netpoll rx_skb_hook in the kernel.

    All of the out-of-tree implementations I have found call
    netpoll_poll, which was removed from the kernel in 2011, so this
    change should not add any additional breakage.

    There are problems with the netpoll packet receive code. __netpoll_rx
    does not call dev_kfree_skb_irq or dev_kfree_skb_any in hard irq
    context. netpoll_neigh_reply leaks every skb it receives. Reception
    of packets does not work successfully on stacked devices (aka bonding,
    team, bridge, and vlans).

    Given that the netpoll packet receive code is buggy, there are no
    out-of-tree users that will be merged soon, and the code has not
    been used in tree for a decade, let's just remove it.

    Reverting this commit can serve as a starting point for anyone
    who wants to resurrect netpoll packet reception support.

    Acked-by: Eric Dumazet
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Make rx_skb_hook, and rx in struct netpoll, depend on
    CONFIG_NETPOLL_TRAP. Make rx_lock, rx_np, and neigh_tx in struct
    netpoll_info depend on CONFIG_NETPOLL_TRAP.

    Make the functions netpoll_rx_on, netpoll_rx, and netpoll_receive_skb
    no-ops when CONFIG_NETPOLL_TRAP is not set.

    Only build netpoll_neigh_reply, checksum_udp, service_neigh_queue,
    pkt_is_ns, and __netpoll_rx when CONFIG_NETPOLL_TRAP is defined.

    Add helper functions netpoll_trap_setup, netpoll_trap_setup_info,
    netpoll_trap_cleanup, and netpoll_trap_cleanup_info that initialize
    and cleanup the struct netpoll and struct netpoll_info receive
    specific fields when CONFIG_NETPOLL_TRAP is enabled and do nothing
    otherwise.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Move the bond slave device neigh_tx handling into service_neigh_queue.

    In connection with neigh_tx processing, remove unnecessary tests
    for a NULL netpoll_info, as netpoll_poll_dev has already used, and
    thus verified the existence of, the netpoll_info.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Now that we no longer need to receive packets to safely drain the
    network driver's receive queue, move netpoll_trap and netpoll_set_trap
    under CONFIG_NETPOLL_TRAP.

    This makes netpoll_trap and netpoll_set_trap no-op inline functions
    when CONFIG_NETPOLL_TRAP is not set.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Change the strategy of netpoll from dropping all packets received
    during netpoll_poll_dev to calling napi poll with a budget of 0
    (to avoid processing the driver's rx queue), and ignoring packets
    received with netif_rx (those will safely be placed on the backlog
    queue).

    All of the netpoll-supporting drivers have been reviewed to ensure
    that they either use netif_rx, or that a budget of 0 is supported
    by their napi poll routine and a budget of 0 will not process the
    driver's rx queues.

    Not dropping packets makes NETPOLL_RX_DROP unnecessary, so it is removed.

    npinfo->rx_flags is removed as rx_flags with just the NETPOLL_RX_ENABLED
    flag becomes just a redundant mirror of list_empty(&npinfo->rx_np).
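
    The contract a napi poll routine must honor under this scheme can be
    mocked in userspace: with a budget of 0 it may clean tx completions
    but must not touch the rx queue. A sketch with hypothetical names,
    not a real driver:

    ```c
    /* Mock of a napi poll routine that honors a budget of 0: tx
     * completions are always cleaned, but rx packets are processed
     * only up to the budget, which is exactly what netpoll relies on
     * after this change. */
    static int rx_pending = 4;
    static int tx_cleaned;
    static int rx_processed;

    static int mock_napi_poll(int budget)
    {
        int done = 0;

        tx_cleaned++;                     /* tx work is always allowed */
        while (done < budget && rx_pending > 0) {
            rx_pending--;                 /* rx work only up to budget */
            rx_processed++;
            done++;
        }
        return done;                      /* 0 when called with budget 0 */
    }
    ```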

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Add a helper netpoll_rx_processing that reports when netpoll has
    receive side processing to perform.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • There is already a warning for this case in the normal netpoll path,
    but put a copy here in case the way netpoll calls the poll functions
    causes a different result.

    netpoll will shortly call the napi poll routine with a budget of 0 to
    avoid any rx packets being processed. As nothing does that today,
    we may encounter drivers that have problems, so a netpoll-specific
    warning seems desirable.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • In poll_napi loop through all of the napi handlers even when the
    budget falls to 0 to ensure that we process all of the tx_queues, and
    so that we continue to call into drivers when our initial budget is 0.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • This moves the control logic to the top level in netpoll_poll_dev
    instead of having it dispersed throughout netpoll_poll_dev,
    poll_napi and poll_one_napi.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Today netpoll depends on setting NETPOLL_RX_DROP before networking
    drivers receive packets in interrupt context so that the packets can
    be dropped. Move this setting into netpoll_poll_dev from
    poll_one_napi so that if ndo_poll_controller happens to receive
    packets we will drop the packets on the floor instead of letting the
    packets bounce through the networking stack and potentially cause problems.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next,
    most relevantly they are:

    * cleanup to remove a double semicolon, from Stephen Hemminger.

    * calm down sparse warning in xt_ipcomp, from Fan Du.

    * nf_ct_labels support for nf_tables, from Florian Westphal.

    * new macros to simplify rcu dereferences in the scope of nfnetlink
    and nf_tables, from Patrick McHardy.

    * Accept queue and drop (including reason for drop) to verdict
    parsing in nf_tables, also from Patrick.

    * Remove unused random seed initialization in nfnetlink_log, from
    Florian Westphal.

    * Allow attaching user-specific information to nf_tables rules, useful
    for attaching user comments to a rule, from me.

    * Return errors in ipset according to the manpage documentation, from
    Jozsef Kadlecsik.

    * Fix coccinelle warnings related to incorrect bool type usage for ipset,
    from Fengguang Wu.

    * Add hash:ip,mark set type to ipset, from Vytas Dauksa.

    * Fix a message spotted by ipset for each netns that is created,
    from Ilia Mirkin.

    * Add forceadd option to ipset, which evicts a random entry from the set
    if it becomes full, from Josh Hunt.

    * Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu.

    * Improve conntrack scalability by removing a central spinlock;
    original work from Eric Dumazet. Jesper Dangaard Brouer took it over
    to address the remaining issues. Several preparatory patches for this
    change come first.

    * Rework nft_hash to resolve bugs (leaking chain, missing rcu
    synchronization on element removal, etc.), from Patrick McHardy.

    * Restore context in the rule deletion path, as we now release rule
    objects synchronously, from Patrick McHardy. This restores event
    notifications for anonymous sets.

    * Fix NAT family validation in nft_nat, also from Patrick.

    * Improve scalability of xt_connlimit by using an array of spinlocks
    and by introducing an rb-tree of hashtables for faster lookup of
    accounted objects per network. This patch was preceded by several
    patches and refactorings to accommodate this change, including the
    use of kmem_cache, from Florian Westphal.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

17 Mar, 2014

3 commits

  • With the current match design every invocation of the connlimit_match
    function means we have to perform (number_of_conntracks % 256) lookups
    in the conntrack table [ to perform GC/delete stale entries ].
    This is also the reason why ____nf_conntrack_find() shows more than
    20% cpu time per core in perf top.

    This patch changes the storage to rbtree which cuts down the number of
    ct objects that need testing.

    When looking up a new tuple, we only test the connections of the host
    objects we visit while searching for the wanted host/network (or
    the leaf we need to insert at).

    The slot count is reduced to 32. Increasing the slot count doesn't
    speed things up much because of the rbtree's nature.

    before patch (50kpps rx, 10kpps tx):
    + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
    + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
    + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw

    after (90kpps, 51kpps tx):
    + 17.24% swapper [nf_conntrack] [k] ____nf_conntrack_find
    + 6.60% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 2.73% swapper [nf_conntrack] [k] hash_conntrack_raw
    + 2.36% swapper [xt_connlimit] [k] count_tree

    Obvious disadvantages compared to the previous version are the
    increase in code complexity and the increased memory cost.

    Partially based on Eric Dumazet's fq scheduler.

    Reviewed-by: Jesper Dangaard Brouer
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The helper currently returns 1 if the connections are the same. Make
    it work like memcmp/strcmp so it can be used as an rbtree search
    function.

    Reviewed-by: Jesper Dangaard Brouer
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • connlimit currently suffers from spinlock contention, example for
    4-core system with rps enabled:

    + 20.84% ksoftirqd/2 [kernel.kallsyms] [k] _raw_spin_lock_bh
    + 20.76% ksoftirqd/1 [kernel.kallsyms] [k] _raw_spin_lock_bh
    + 20.42% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh
    + 6.07% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 6.07% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
    + 5.97% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
    + 2.47% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
    + 2.45% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
    + 2.44% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw

    Using an array of spinlocks may allow parallel lookup/insert/delete
    if the entries are hashed to different slots. With the patch:

    + 20.95% ksoftirqd/0 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.50% ksoftirqd/1 [nf_conntrack] [k] ____nf_conntrack_find
    + 20.27% ksoftirqd/2 [nf_conntrack] [k] ____nf_conntrack_find
    + 5.76% ksoftirqd/1 [nf_conntrack] [k] hash_conntrack_raw
    + 5.39% ksoftirqd/2 [nf_conntrack] [k] hash_conntrack_raw
    + 5.35% ksoftirqd/0 [nf_conntrack] [k] hash_conntrack_raw
    + 2.00% ksoftirqd/1 [kernel.kallsyms] [k] __rcu_read_unlock

    Improved rx processing rate from ~35kpps to ~50 kpps.
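
    The lock-striping idea can be sketched in userspace with a mutex
    array; the slot count and names are assumptions, not the xt_connlimit
    code:

    ```c
    #include <pthread.h>
    #include <stdint.h>

    #define CONNLIMIT_SLOTS 256 /* slot count is an illustrative assumption */

    /* Sketch of lock striping as described above: hash each connection
     * to one of several locks, so operations on entries that land in
     * different slots no longer contend on a single global spinlock. */
    static pthread_mutex_t locks[CONNLIMIT_SLOTS] = {
        [0 ... CONNLIMIT_SLOTS - 1] = PTHREAD_MUTEX_INITIALIZER
    };

    static pthread_mutex_t *slot_lock(uint32_t hash)
    {
        return &locks[hash % CONNLIMIT_SLOTS];
    }
    ```

    A caller takes slot_lock(hash) around lookup/insert/delete; two
    connections hashing to different slots proceed in parallel.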

    Reviewed-by: Jesper Dangaard Brouer
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

15 Mar, 2014

5 commits