12 Aug, 2015

1 commit

  • The gw_factor is divided by BATADV_TQ_LOCAL_WINDOW_SIZE ** 2 * 64. But the
    rest of the calculation has nothing to do with the tq window size and
    therefore the calculation is just (tmp_gw_factor / (64 ** 3)).

    Replace it with a simple shift to avoid a costly 64-bit divide when the
    max_gw_factor is changed from u32 to u64. This type change is necessary
    to avoid an overflow bug.
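    A quick check of the arithmetic, as a minimal C sketch. It assumes
    BATADV_TQ_LOCAL_WINDOW_SIZE is 64, as in batman-adv. Then
    64 ** 2 * 64 = 64 ** 3 = 2 ** 18 = 262144, so for an unsigned 64-bit
    value the division and a right shift by 18 give identical results.
    The function names are hypothetical, not the batman-adv code.

```c
#include <stdint.h>

/* Assumed value of BATADV_TQ_LOCAL_WINDOW_SIZE in batman-adv. */
#define TQ_LOCAL_WINDOW_SIZE 64ULL

/* Original form: a 64-bit division by window^2 * 64 = 64^3. */
static uint64_t gw_factor_div(uint64_t tmp_gw_factor)
{
	return tmp_gw_factor /
	       (TQ_LOCAL_WINDOW_SIZE * TQ_LOCAL_WINDOW_SIZE * 64);
}

/* Replacement: 64^3 == 2^18, so an unsigned right shift by 18 is equivalent. */
static uint64_t gw_factor_shift(uint64_t tmp_gw_factor)
{
	return tmp_gw_factor >> 18;
}
```

    Both operands are unsigned, so the shift truncates exactly like integer
    division. It also avoids a 64-bit divide, which is expensive on 32-bit
    architectures.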

    Signed-off-by: Sven Eckelmann
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli

    Sven Eckelmann
     

11 Aug, 2015

7 commits

  • When using a cluster of switches, some topologies will have an MDIO
    bus per switch, not one for the whole cluster. Allow this to be
    represented in the device tree, by adding an optional mii-bus property
    at the switch level. The old platform_device method of instantiation
    supports this already, so only the device tree binding needs extending
    with an additional optional phandle.

    Signed-off-by: Andrew Lunn
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • Support for sharing the GREPROTO_CISCO port was added so that the
    OVS GRE port and kernel GRE devices can co-exist. After the
    flow-based tunneling patches, OVS GRE protocol processing has
    moved completely into the ip_gre module, so there is no longer any
    need for the GRE protocol hook. The following patch consolidates
    GRE protocol related functions into the ip_gre module.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Using the GRE tunnel metadata collection feature, we can implement
    the OVS GRE vport. This patch removes all of the OVS-specific GRE
    code and makes OVS use an ip_gre net_device. A minimal GRE vport is
    kept to maintain compatibility with current userspace applications.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • The following patch creates a new tunnel flag that enables
    tunnel metadata collection on a given device.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • This function will be used in gre and geneve vport implementations.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Add an explicit neighbour table overflow message (ratelimited) and
    statistic to make diagnosing neighbour table overflows tractable in
    the wild.

    Diagnosing a neighbour table overflow can be quite difficult in the wild
    because no explicit message is logged to dmesg. Callers into the neighbour
    code seem to use net_dbg_ratelimit when the neighbour call fails, which
    means the "base message" is not emitted, and the "callbacks suppressed"
    messages from the ratelimiting can end up juxtaposed with unrelated
    messages. Further, a forced garbage collection increments a stat on every
    call whether or not it succeeds in freeing up a table entry, so that
    statistic is only a hint. So, add a net_info_ratelimited message and an
    explicit statistic to the neighbour code.
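    The following minimal userspace sketch shows why an explicit counter
    helps. It models an interval/burst rate limiter in the spirit of kernel
    ratelimiting, simplified. All names and the logged text are
    hypothetical. The overflow statistic is bumped on every failure, while
    only the log line is ratelimited. The counter therefore stays accurate
    even when messages are suppressed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified interval/burst limiter (hypothetical, not the kernel's struct). */
struct ratelimit_model {
	unsigned long interval;	/* window length, same unit as 'now' */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* start of the current window */
	unsigned int printed;	/* messages emitted in this window */
	unsigned int missed;	/* messages suppressed in this window */
	bool started;
};

/* Returns true if a message may be emitted at time 'now'. */
static bool ratelimit_ok(struct ratelimit_model *rs, unsigned long now)
{
	if (!rs->started) {
		rs->started = true;
		rs->begin = now;
	}
	if (now - rs->begin >= rs->interval) {
		/* New window: report how many messages were dropped. */
		if (rs->missed)
			printf("%u callbacks suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Hypothetical neighbour-table model with an explicit overflow counter. */
struct neigh_table_model {
	unsigned int entries, max_entries;
	unsigned long table_fulls;	/* counts every overflow */
	struct ratelimit_model rs;	/* limits only the log line */
};

static bool neigh_alloc_model(struct neigh_table_model *tbl, unsigned long now)
{
	if (tbl->entries >= tbl->max_entries) {
		tbl->table_fulls++;
		if (ratelimit_ok(&tbl->rs, now))
			printf("neighbour: table overflow!\n");
		return false;
	}
	tbl->entries++;
	return true;
}
```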

    Signed-off-by: Rick Jones
    Signed-off-by: David S. Miller

    Rick Jones
     
  • This patch adds the ability to toggle the vlan filtering support via
    netlink. Since we're already running with rtnl in .changelink() we don't
    need to take any additional locks.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

10 Aug, 2015

6 commits

  • This patch fixes the doubled word "the the" in
    Documentation/DocBook/networking/API-eth-get-headlen.html
    Documentation/DocBook/networking/netdev.html
    Documentation/DocBook/networking.xml

    These files are generated from comments in the source,
    so the fix has to be made in the comment in net/ethernet/eth.c.

    Signed-off-by: Masanari Iida
    Signed-off-by: David S. Miller

    Masanari Iida
     
  • RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
    label on the stack, then after popping the resulting packet must be
    treated as an IPv4 packet and forwarded based on the IPv4 header. The
    same is true for IPv6 Explicit NULL with an IPv6 packet following.

    Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
    add an attribute that specifies the expected payload type for use at
    forwarding time for determining the type of the encapsulated packet
    instead of inspecting the first nibble of the packet.
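    Here is a minimal sketch of the idea. It uses the RFC 3032 reserved
    values for IPv4 Explicit NULL (0) and IPv6 Explicit NULL (2). The
    payload_type enum and the helper names are hypothetical stand-ins for
    the route attribute, not the kernel's MPLS code. The nibble fallback is
    included only for contrast with the old behaviour.

```c
#include <stdint.h>

/* Reserved label values from RFC 3032. */
#define LABEL_IPV4_EXPLICIT_NULL 0
#define LABEL_IPV6_EXPLICIT_NULL 2

/* Hypothetical payload-type attribute stored on a route. */
enum payload_type { PT_UNSPEC, PT_IPV4, PT_IPV6 };

/* Old approach: guess from the first nibble (IP version field) of the payload. */
static enum payload_type guess_from_nibble(const uint8_t *payload)
{
	switch (payload[0] >> 4) {
	case 4:
		return PT_IPV4;
	case 6:
		return PT_IPV6;
	default:
		return PT_UNSPEC;
	}
}

/* Payload type to install with the route for a given incoming label. */
static enum payload_type route_payload_type(uint32_t label)
{
	if (label == LABEL_IPV4_EXPLICIT_NULL)
		return PT_IPV4;
	if (label == LABEL_IPV6_EXPLICIT_NULL)
		return PT_IPV6;
	return PT_UNSPEC;
}

/* Forwarding time: trust the route attribute, only guess when it is unset. */
static enum payload_type payload_after_pop(enum payload_type rt_payload,
					   const uint8_t *payload)
{
	if (rt_payload != PT_UNSPEC)
		return rt_payload;
	return guess_from_nibble(payload);
}
```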

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • Remove the fdb_{add,del,getnext} function pointers in favor of the new
    port_fdb_{add,del,getnext}.

    Implement the switchdev_port_obj_{add,del,dump} functions in DSA to
    support the SWITCHDEV_OBJ_PORT_FDB objects.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • This patch adds an is_static boolean to the switchdev_obj_fdb structure,
    in order to set ndm_state to either NUD_NOARP or NUD_REACHABLE.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • The address in the switchdev_obj_fdb structure is currently represented
    as a pointer. Replacing it with a 6-byte array allows switchdev to carry
    addresses read directly from hardware registers, rather than addresses
    stored by the switch chip driver (as in Rocker).
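    The two switchdev changes above can be sketched together. The address
    is held by value in a 6-byte array, so it can be filled directly from a
    48-bit value read out of a hardware register. The is_static flag then
    selects the reported ndm_state. The struct and helpers are
    hypothetical. Only the NUD_* values are taken from <linux/neighbour.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Neighbour state values as defined in <linux/neighbour.h>. */
#define NUD_REACHABLE 0x02
#define NUD_NOARP     0x40

#define MAC_LEN 6

/* Hypothetical FDB object modelled loosely on switchdev_obj_fdb. */
struct fdb_obj {
	unsigned char addr[MAC_LEN];	/* stored by value, not a pointer */
	uint16_t vid;
	bool is_static;
};

/* Fill the object from a 48-bit MAC read out of a (hypothetical) register. */
static void fdb_from_register(struct fdb_obj *obj, uint64_t reg, uint16_t vid,
			      bool is_static)
{
	/* Most significant byte of the 48-bit value becomes addr[0]. */
	for (int i = 0; i < MAC_LEN; i++)
		obj->addr[i] = (unsigned char)(reg >> (8 * (MAC_LEN - 1 - i)));
	obj->vid = vid;
	obj->is_static = is_static;
}

/* ndm_state to report in a dump, chosen from is_static. */
static uint16_t fdb_ndm_state(const struct fdb_obj *obj)
{
	return obj->is_static ? NUD_NOARP : NUD_REACHABLE;
}
```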

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • This patch fixes the doubled word "the the"
    in Documentation/DocBook/networking.xml and
    Documentation/DocBook/networking/API-Wimax-report-rfkill-sw.html.

    These files are generated from comments in the source, so I had to
    fix the typo in net/wimax/io-rfkill.c.

    Signed-off-by: Masanari Iida
    Signed-off-by: David S. Miller

    Masanari Iida
     

08 Aug, 2015

5 commits

  • There is a race condition in store_rps_map that allows the jump label
    count in rps_needed to go below zero. This can happen when one thread
    attempts to set a map while another concurrently clears it.

    Scenario:

    1. rps_needed count is zero
    2. New map is assigned by setting thread, but rps_needed count _not_ yet
       incremented (rps_needed count still zero)
    3. Map is cleared by second thread, old_map set to that just assigned
    4. Second thread performs static_key_slow_dec, rps_needed count now goes
       negative
    Fix is to increment or decrement rps_needed under the spinlock.
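    Here is a minimal userspace model of the fix. It assumes a C11
    atomic_flag spinlock in place of the kernel spinlock and a plain int in
    place of the rps_needed static key. The map swap and the count
    adjustment happen under the same lock, so a concurrent set and clear
    can no longer drive the count below zero. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of the state touched by store_rps_map. */
struct rps_model {
	atomic_flag lock;	/* stands in for the kernel spinlock */
	void *map;		/* currently installed map, or NULL */
	int needed;		/* stands in for the rps_needed count */
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Install new_map (may be NULL); return the old map for the caller to free. */
static void *store_map_model(struct rps_model *s, void *new_map)
{
	void *old_map;

	model_lock(&s->lock);
	old_map = s->map;
	s->map = new_map;
	/* Adjusting the count under the same lock keeps it consistent with
	 * the map pointer, so it can never go negative. */
	if (new_map && !old_map)
		s->needed++;
	if (old_map && !new_map)
		s->needed--;
	model_unlock(&s->lock);

	return old_map;
}
```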

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • When the sampling rate is 1, the sampling probability is UINT32_MAX, and
    the packet should be sampled even if prandom32() generates UINT32_MAX.
    Conversely, no packet needs to be sampled when the probability is 0.
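    The intended decision can be sketched as follows. The probability is
    treated as a 32-bit fraction of UINT32_MAX and r as the random value.
    This is a hypothetical helper that illustrates the two edge cases, not
    the actual openvswitch code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 'prob' is the sampling probability scaled to
 * [0, UINT32_MAX], 'r' is a uniformly random 32-bit value. */
static bool should_sample(uint32_t prob, uint32_t r)
{
	if (prob == 0)
		return false;	/* probability 0: never sample */
	if (prob == UINT32_MAX)
		return true;	/* rate 1: always sample, even if r == UINT32_MAX */
	return r < prob;	/* otherwise sample with probability prob / 2^32 */
}
```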

    Signed-off-by: Wenyu Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Wenyu Zhang
     
  • IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA,
    so combine them into a single IFLA_VXLAN_COLLECT_METADATA flag.
    'flowbased' doesn't convey the real meaning of the vxlan tunnel mode.
    This mode can be used by routing, tc+bpf and ovs.
    Only ovs is strictly flow based, so 'collect metadata' is a better
    name for this tunnel mode.

    Signed-off-by: Alexei Starovoitov
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Register pernet subsys init/stop functions that will set up
    and tear down per-net RDS-TCP listen endpoints. Unregister the
    pernet subsys functions on 'modprobe -r' to clean up these
    endpoints.

    Enable keepalive on both accept and connect socket endpoints.
    The keepalive timer expiration will ensure that client socket
    endpoints will be removed as appropriate from the netns when
    an interface is removed from a namespace.

    Register a device notifier callback that will clean up all
    sockets (and thus avoid the need to wait for keepalive timeout)
    when the loopback device is unregistered from the netns indicating
    that the netns is getting deleted.

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     
  • Open the sockets calling sock_create_kern() with the correct struct net
    pointer, and use that struct net pointer when verifying the
    address passed to rds_bind().

    Signed-off-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Sowmini Varadhan
     

07 Aug, 2015

2 commits

  • This patch adds a NULL dev check for the case where 'cfg->rc_via_table ==
    NEIGH_LINK_TABLE' or dev_get_by_index() failed.

    Reported-by: Dan Carpenter
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     
  • We recently changed this code from returning NULL to returning ERR_PTR.
    There are some left over NULL assignments which we can remove. We can
    preserve the error code from ip_route_output() instead of always
    returning -ENODEV. Also these functions use a mix of gotos and direct
    returns. There is no cleanup necessary so I changed the gotos to
    direct returns.
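    For readers unfamiliar with the convention, here is a minimal userspace
    re-creation of the ERR_PTR() idiom. It assumes the kernel's MAX_ERRNO
    of 4095. A single pointer return carries either a valid object or a
    negative errno. An error such as one from ip_route_output() can
    therefore be propagated instead of being flattened to -ENODEV. The
    lookup function and struct are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* The top MAX_ERRNO addresses are never valid pointers, so they can
 * encode -errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct device { int ifindex; };

/* Hypothetical lookup that propagates the routing error instead of
 * collapsing every failure into NULL / -ENODEV. No cleanup is needed,
 * so every path returns directly. */
static struct device *find_dev(struct device *devs, int n, int ifindex,
			       int route_err)
{
	if (route_err)
		return ERR_PTR(route_err);	/* preserve e.g. -ENETUNREACH */
	for (int i = 0; i < n; i++)
		if (devs[i].ifindex == ifindex)
			return &devs[i];
	return ERR_PTR(-ENODEV);
}
```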

    Signed-off-by: Dan Carpenter
    Acked-by: Roopa Prabhu
    Acked-by: Robert Shearman
    Signed-off-by: David S. Miller

    Dan Carpenter
     

05 Aug, 2015

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next, they are:

    1) A couple of cleanups for the netfilter core hook from Eric Biederman.

    2) Net namespace hook registration, also from Eric. This adds a dependency on
    the rtnl_lock. This should be fine for now, but we have to keep an eye on it,
    because if we ever take the per-subsys nfnl_lock before rtnl we may have
    problems. There is room to remove this dependency in the future by
    propagating the complexity to the clients, i.e. by registering hooks for the
    init netns functions.

    3) Update nf_tables to use the new net namespace hook infrastructure, also from
    Eric.

    4) Three patches to refine and to address problems from the new net namespace
    hook infrastructure.

    5) Switch to alternate jumpstack in xtables iff the packet is reentering. This
    only applies to a very special case, the TEE target, but Eric Dumazet
    reports that this is slowing down things for everyone else. So let's only
    switch to the alternate jumpstack if the tee target is in use through a
    static key. This batch also comes with offline precalculation of the
    jumpstack based on the callchain depth. From Florian Westphal.

    6) Minimal SCTP multihoming support for our conntrack helper, from Michal
    Kubecek.

    7) Reduce nf_bridge_info per skbuff scratchpad area to 32 bytes, from Florian
    Westphal.

    8) Fix several checkpatch errors in bridge netfilter, from Bernhard Thaler.

    9) Get rid of useless debug message in ip6t_REJECT, from Subash Abhinov.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Aug, 2015

8 commits

  • Make it similar to reject_tg() in ipt_REJECT.

    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Signed-off-by: Pablo Neira Ayuso

    Subash Abhinov Kasiviswanathan
     
  • In multiple locations there are checks for whether the label in hand
    is a reserved label or not using the arbitrary value of 16. Factor
    this out into a #define for better maintainability and for
    documentation.

    Signed-off-by: Robert Shearman
    Acked-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • lwtunnel encap is applied for forwarded packets, but not for
    locally-generated packets. This is because the output function is not
    overridden in __mkroute_output, as it is in __mkroute_input.

    The lwtunnel state is correctly set on the rth through the call to
    rt_set_nexthop, so all that needs to be done is to override the dst
    output function to be lwtunnel_output if there is lwtunnel state
    present and it requires output redirection.
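    The shape of the change can be sketched with stripped-down stand-ins
    for the dst entry and lwtunnel state. The structs, the redirect_output
    flag and the stub output functions are hypothetical. Only the control
    flow mirrors the description above.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_buff;	/* opaque in this sketch */

/* Hypothetical stand-in for lwtunnel state. */
struct lwt_state {
	bool redirect_output;	/* encap wants locally generated packets */
};

/* Hypothetical stand-in for the route's dst entry. */
struct dst {
	int (*output)(struct sk_buff *skb);
	struct lwt_state *lwtstate;	/* already set by the nexthop setup */
};

static int ip_output_stub(struct sk_buff *skb) { (void)skb; return 1; }
static int lwtunnel_output_stub(struct sk_buff *skb) { (void)skb; return 2; }

/* Output-route construction: install the encap output only when needed. */
static void mkroute_output_sketch(struct dst *rth)
{
	rth->output = ip_output_stub;
	if (rth->lwtstate && rth->lwtstate->redirect_output)
		rth->output = lwtunnel_output_stub;
}
```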

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • In the locally-generated packet path skb->protocol may not be set and
    this is required for the lwtunnel encap in order to get the lwtstate.

    This would otherwise have been set by ip_output or ip6_output, so set
    skb->protocol prior to calling the lwtunnel encap
    function. Additionally, set skb->dev in case it is needed further down
    the transmit path.

    Signed-off-by: Robert Shearman
    Signed-off-by: David S. Miller

    Robert Shearman
     
  • Instead of trying to access br->vlan_enabled directly use the provided
    helper br_vlan_enabled().

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Since the introduction of the BPF action in d23b8ad8ab23 ("tc: add BPF
    based action"), late binding was not working as expected, i.e. setting
    the action part for a classifier only via 'bpf index <num>', where <num>
    is the index of an existing action, is rejected by the kernel due
    to other missing parameters.

    It doesn't make sense to require these parameters such as BPF opcodes
    etc, as they are not going to be used anyway: in this case, they're just
    allocated/parsed and then freed again w/o doing anything meaningful.

    Instead, parse and verify the remaining parameters *after* the test on
    tcf_hash_check(), when we really know that we're dealing with creation
    of a new action or replacement of an existing one and where late binding
    is thus irrelevant.

    After this patch, the test case works:

    FOO="1,6 0 0 4294967295,"
    tc actions add action bpf bytecode "$FOO"
    tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action bpf index 1
    tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1
    tc filter show dev foo
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 flowid 1:1 bytecode '1,6 0 0 4294967295'
    action order 1: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1

    Late binding of a BPF action can be useful for preloading maps (e.g. before
    they hit traffic) in case of eBPF programs, or to share a single eBPF action
    with multiple classifiers.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Before this patch, when a vid was not specified, the entry was added with
    vid 0, which is useless when vlan_filtering is enabled. This patch makes
    the entry get added on all configured vlans when vlan filtering is
    enabled and, correspondingly, deleted from all of them if the entry vid
    is 0. This is also closer to the way fdb works with regard to vid 0 and
    vlan filtering.

    Example:
    Setup:
    $ bridge vlan add vid 256 dev eth4
    $ bridge vlan add vid 1024 dev eth4
    $ bridge vlan add vid 64 dev eth3
    $ bridge vlan add vid 128 dev eth3
    $ bridge vlan
    port    vlan ids
    eth3     1 PVID Egress Untagged
             64
             128

    eth4     1 PVID Egress Untagged
             256
             1024
    $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering

    Before:
    $ bridge mdb add dev br0 port eth3 grp 239.0.0.1
    $ bridge mdb
    dev br0 port eth3 grp 239.0.0.1 temp

    After:
    $ bridge mdb add dev br0 port eth3 grp 239.0.0.1
    $ bridge mdb
    dev br0 port eth3 grp 239.0.0.1 temp vid 1
    dev br0 port eth3 grp 239.0.0.1 temp vid 128
    dev br0 port eth3 grp 239.0.0.1 temp vid 64

    Signed-off-by: Satish Ashok
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Satish Ashok
     
  • Bridge devices don't need to segment multiply tagged packets since their
    ports can segment them.

    Signed-off-by: Toshiaki Makita
    Signed-off-by: David S. Miller

    Toshiaki Makita
     

03 Aug, 2015

1 commit

  • Add skb->hash to the __sk_buff offset map, so it can be accessed from
    an eBPF program. We already do this for classic BPF filters, but not
    yet for eBPF. It might be useful as a demuxer in combination with
    helpers like bpf_clone_redirect(); toy example:

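    #include <linux/bpf.h>
    #define __section(NAME) __attribute__((section(NAME), used))
    /* Helper stub in the style of samples/bpf/bpf_helpers.h of the time. */
    static int (*bpf_clone_redirect)(void *ctx, int ifindex, int flags) =
        (void *) BPF_FUNC_clone_redirect;

    __section("cls-lb") int ingress_main(struct __sk_buff *skb)
    {
        unsigned int which = 3 + (skb->hash & 7); /* ifindex 3..10 */
        /* bpf_skb_store_bytes(skb, ...); */
        /* bpf_l{3,4}_csum_replace(skb, ...); */
        bpf_clone_redirect(skb, which, 0);
        return -1;
    }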

    I considered adding skb_get_hash(), but then concluded that the
    raw skb->hash is fine in this case: we can access the hash directly
    w/o an extra eBPF helper function call, it's filled out by many NICs on
    ingress, and in case the entropy level is not sufficient, people
    can still implement their own specific sw fallback hash mix anyway.

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

01 Aug, 2015

9 commits

  • Conflicts:
    arch/s390/net/bpf_jit_comp.c
    drivers/net/ethernet/ti/netcp_ethss.c
    net/bridge/br_multicast.c
    net/ipv4/ip_fragment.c

    All four conflicts were cases of simple overlapping
    changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:

    1) Must tear down SR-IOV before unregistering netdev in igb driver, from
    Alex Williamson.

    2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.

    3) Default route selection in ipv4 should take the prefix length, table
    ID, and TOS into account, from Julian Anastasov.

    4) sch_plug must have a reset method in order to purge all buffered
    packets when the qdisc is reset, likewise for sch_choke, from WANG
    Cong.

    5) Fix deadlock and races in slave_changelink/br_setport in bridging,
    from Nikolay Aleksandrov.

    6) mlx4 bug fixes (wrong index in port event propagation to VFs,
    overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
    Morgenstein, and Or Gerlitz.

    7) Turn off klog message about SCTP userspace interface compat that
    makes no sense at all, from Daniel Borkmann.

    8) Fix unbounded restarts of inet frag eviction process, causing NMI
    watchdog soft lockup messages, from Florian Westphal.

    9) Suspend/resume fixes for r8152 from Hayes Wang.

    10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
    Sabrina Dubroca.

    11) Fix performance regression when removing a lot of routes from the
    ipv4 routing tables, from Alexander Duyck.

    12) Fix device leak in AF_PACKET, from Lars Westerhoff.

    13) AF_PACKET also has a header length comparison bug due to signedness,
    from Alexander Drozdov.

    14) Fix bug in eBPF tail call generation on x86, from Daniel Borkmann.

    15) Memory leaks, TSO stats, watchdog timeout and other fixes to
    thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.

    16) act_bpf can leak memory when replacing programs, from Daniel
    Borkmann.

    17) WOL packet fixes in gianfar driver, from Claudiu Manoil.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
    stmmac: fix missing MODULE_LICENSE in stmmac_platform
    gianfar: Enable device wakeup when appropriate
    gianfar: Fix suspend/resume for wol magic packet
    gianfar: Fix warning when CONFIG_PM off
    act_pedit: check binding before calling tcf_hash_release()
    net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
    net: sched: fix refcount imbalance in actions
    r8152: reset device when tx timeout
    r8152: add pre_reset and post_reset
    qlcnic: Fix corruption while copying
    act_bpf: fix memory leaks when replacing bpf programs
    net: thunderx: Fix for crash while BGX teardown
    net: thunderx: Add PCI driver shutdown routine
    net: thunderx: Fix crash when changing rss with mutliple traffic flows
    net: thunderx: Set watchdog timeout value
    net: thunderx: Wakeup TXQ only if CQE_TX are processed
    net: thunderx: Suppress alloc_pages() failure warnings
    net: thunderx: Fix TSO packet statistic
    net: thunderx: Fix memory leak when changing queue count
    net: thunderx: Fix RQ_DROP miscalculation
    ...

    Linus Torvalds
     
  • Per RFC 6437, stateful flow labels (e.g. labels set by the flow label
    manager) cannot "disturb" nodes taking part in stateless flow labeling.
    While splitting the label space into stateful and stateless ranges only
    reduces the flow label entropy by one bit, it is conceivable that this
    might bias the load-balancing hash on some routers, causing a load
    imbalance. For best results on the Internet we really need the full
    20 bits.
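    Here is a small sketch of the entropy argument. It uses a hypothetical
    fold of a 32-bit flow hash into the 20-bit label field. Reserving one
    bit to mark the stateless range leaves 19 bits. Hashes that differ only
    in that top label bit then collide, while the full mask keeps them
    apart.

```c
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu	/* IPv6 flow label is 20 bits wide */
#define STATELESS_BIT  0x00080000u	/* top label bit, if a range is reserved */

/* Hypothetical fold of a 32-bit flow hash into the full 20-bit label. */
static uint32_t flowlabel_full(uint32_t hash)
{
	hash ^= hash >> 20;	/* mix the upper 12 bits into the label bits */
	return hash & FLOWLABEL_MASK;
}

/* Same fold with one bit forced to mark the stateless range: 19 bits left. */
static uint32_t flowlabel_reserved(uint32_t hash)
{
	return (flowlabel_full(hash) & ~STATELESS_BIT & FLOWLABEL_MASK) |
	       STATELESS_BIT;
}
```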

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
    automatic flow label generation. There are four modes:

    0: flow labels are disabled
    1: flow labels are enabled, sockets can opt-out
    2: flow labels are allowed, sockets can opt-in
    3: flow labels are enabled and enforced, no opt-out for sockets

    np->autoflowlabel is initialized according to the sysctl value.
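    The mode semantics listed above can be captured in a small decision
    function. The enum names are hypothetical labels for the sysctl values
    0 to 3, and the logic follows the descriptions:

    - Mode 0 never labels.
    - Mode 3 always labels.
    - Modes 1 and 2 defer to the per-socket setting. That setting defaults
      to on for mode 1 and off for mode 2.

```c
#include <stdbool.h>

/* Hypothetical names for the four sysctl modes. */
enum auto_flowlabels_mode {
	AUTO_FL_OFF = 0,	/* disabled */
	AUTO_FL_OPTOUT = 1,	/* enabled, sockets can opt out */
	AUTO_FL_OPTIN = 2,	/* allowed, sockets can opt in */
	AUTO_FL_FORCED = 3,	/* enabled and enforced */
};

/* Initial value of a new socket's autoflowlabel, derived from the sysctl. */
static bool default_autoflowlabel(enum auto_flowlabels_mode mode)
{
	return mode == AUTO_FL_OPTOUT || mode == AUTO_FL_FORCED;
}

/* Whether an automatic label is generated for this socket's packets. */
static bool use_auto_flowlabel(enum auto_flowlabels_mode mode,
			       bool sk_autoflowlabel)
{
	switch (mode) {
	case AUTO_FL_OFF:
		return false;
	case AUTO_FL_FORCED:
		return true;	/* socket setting is ignored */
	default:
		return sk_autoflowlabel;	/* per-socket opt-out or opt-in */
	}
}
```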

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • We can't call skb_get_hash here since the packet is not yet complete
    enough for the flow dissector to parse. Create the hash based on flowi6
    instead.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Add skb_get_hash_flowi6 and skb_get_hash_flowi4 which derive an sk_buff
    hash from flowi6 and flowi4 structures respectively. These functions
    can be called when creating a packet in the output path where the new
    sk_buff does not yet contain a fully formed packet that is parsable by
    flow dissector.
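    The point is that the hash is computed from the flow description
    (addresses, ports, protocol) rather than by parsing packet bytes. Here
    is a minimal sketch with a hypothetical flow4 struct. It uses 32-bit
    FNV-1a purely as a stand-in mixer, which is not the hash the kernel
    uses.

```c
#include <stdint.h>

/* Hypothetical stand-in for the flowi4 fields that identify a flow. */
struct flow4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* One 32-bit FNV-1a step over a single byte. */
static uint32_t fnv1a_byte(uint32_t h, uint8_t b)
{
	return (h ^ b) * 16777619u;
}

/* Feed a 32-bit value to FNV-1a, least significant byte first. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		h = fnv1a_byte(h, (uint8_t)(v >> (8 * i)));
	return h;
}

/* Hash the flow description itself, so no packet parsing is needed. */
static uint32_t hash_from_flow4(const struct flow4 *fl)
{
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */

	h = fnv1a_u32(h, fl->saddr);
	h = fnv1a_u32(h, fl->daddr);
	h = fnv1a_u32(h, ((uint32_t)fl->sport << 16) | fl->dport);
	h = fnv1a_byte(h, fl->proto);
	return h;
}
```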

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     
  • Add support for using DSA slave network devices with netconsole, which
    requires us to allocate and free custom netpoll instances and invoke the
    parent network device poll controller callback.

    In order for netconsole to work, we need to construct the DSA tag, but
    not queue the skb for transmission on the master network device xmit
    function.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • All tagging protocols do the same thing: increment device statistics,
    make room for the tag to be inserted, create the tag, invoke the parent
    network device transmit function.

    In order to prepare for adding netpoll support, which requires the tag
    creation but not the parent network device transmit function, do
    a little refactoring that eliminates duplication among the four
    supported tagging protocols.

    We need to return an sk_buff pointer back to the caller because the tag
    specific transmit function may have to reallocate the original skb (e.g.
    tag_trailer.c), and this is the one we should be transmitting, not the
    original sk_buff we were passed.
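    Here is a minimal userspace sketch of that calling convention, with a
    hypothetical pkt struct standing in for sk_buff. The shared path
    updates statistics and calls the tag-specific function. That function
    may allocate a replacement buffer, as a trailer tag might. The caller
    must therefore continue with the returned pointer rather than the one
    it passed in. The tag bytes are placeholders, not any real DSA tag
    format.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	size_t len;
	unsigned char *data;
};

struct tx_stats {
	unsigned long tx_packets, tx_bytes;
};

/* Tag-specific hook: returns the packet to send, possibly a new one. */
typedef struct pkt *(*tag_xmit_fn)(struct pkt *p, unsigned char port);

static void pkt_free(struct pkt *p)
{
	if (p) {
		free(p->data);
		free(p);
	}
}

/* Trailer-style tagger: builds a larger copy and consumes the original. */
static struct pkt *trailer_tag_xmit(struct pkt *p, unsigned char port)
{
	struct pkt *n = malloc(sizeof(*n));

	if (!n)
		goto drop;
	n->data = malloc(p->len + 4);
	if (!n->data) {
		free(n);
		goto drop;
	}
	memcpy(n->data, p->data, p->len);
	memset(n->data + p->len, 0, 4);	/* placeholder trailer bytes */
	n->data[p->len] = port;
	n->len = p->len + 4;
	pkt_free(p);
	return n;
drop:
	pkt_free(p);
	return NULL;
}

/* Shared slave transmit path: stats, then tag; caller sends the result. */
static struct pkt *slave_xmit(struct pkt *p, unsigned char port,
			      tag_xmit_fn tag, struct tx_stats *st)
{
	st->tx_packets++;
	st->tx_bytes += p->len;
	/* Continue with the returned pointer; 'p' may have been freed. */
	return tag(p, port);
}
```

    Returning the packet, instead of transmitting it inside the tagger, is
    what lets a netpoll path reuse tag construction without going through
    the master device's xmit function.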

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Use vsprintf extension %pI4 instead.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches