12 Apr, 2018

1 commit

  • [ Upstream commit 58b35f27689b5eb514fc293c332966c226b1b6e4 ]

    arp_filter performs an ip_route_output search for arp source address and
    checks if output device is the same where the arp request was received,
    if it is not, the arp request is not answered.

    This route lookup is always done on main route table so l3slave devices
    never find the proper route and arp is not answered.

    Passing l3mdev_master_ifindex_rcu(dev) return value as oif fixes the
    lookup for l3slave devices while maintaining same behavior for non
    l3slave devices as this function returns 0 in that case.

    Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
    Signed-off-by: Miguel Fadon Perlines
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Miguel Fadon Perlines
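
The effect of passing the master ifindex as oif can be sketched in userspace. This is a toy model, not kernel code: the `toy_*` names, the flat route array and the "table selected by oif" rule are all assumptions made for illustration.

```c
#include <stddef.h>

/* Toy device: master_ifindex is 0 when the device is not an l3slave. */
struct toy_dev {
    int ifindex;
    int master_ifindex;
};

/* Toy route: table_oif 0 stands for the main table, any other value
 * for the table of the L3 master device with that ifindex. */
struct toy_route {
    int table_oif;
    unsigned int dst;
    int out_ifindex;
};

/* Stand-in for l3mdev_master_ifindex_rcu(): 0 for non-l3slave devices. */
int toy_master_ifindex(const struct toy_dev *dev)
{
    return dev->master_ifindex;
}

/* Returns the output ifindex for dst in the table selected by oif, or -1. */
int toy_route_output(const struct toy_route *rt, size_t n,
                     unsigned int dst, int oif)
{
    for (size_t i = 0; i < n; i++)
        if (rt[i].table_oif == oif && rt[i].dst == dst)
            return rt[i].out_ifindex;
    return -1;
}

/* Returns 1 when the ARP request must be ignored, 0 when it may be answered. */
int toy_arp_filter(const struct toy_route *rt, size_t n,
                   unsigned int sip, const struct toy_dev *dev)
{
    int out = toy_route_output(rt, n, sip, toy_master_ifindex(dev));

    return out != dev->ifindex;
}
```

With oif fixed at 0 the enslaved device would never match, which is the failure described above; non-enslaved devices see no change because their master ifindex is 0.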
     

31 Jan, 2018

1 commit

  • [ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]

    Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
    to avoid making an entry for every remote ip the device needs to talk to.

    This used to be the old behavior, but it became broken in a263b3093641f
    (ipv4: Make neigh lookups directly in output packet path) and was later
    removed in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over
    loopback/point-to-point devices) because it was broken.

    Signed-off-by: Jim Westfall
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jim Westfall
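
A minimal sketch of the key mapping, assuming a helper that sees the next hop and the device flags. The `toy_*` names and flag constants are illustrative only; the real code works on kernel device structures.

```c
#include <stdint.h>

/* Illustrative flag values for this sketch. */
#define TOY_IFF_LOOPBACK     0x8u
#define TOY_IFF_POINTOPOINT  0x10u
#define TOY_INADDR_ANY       0u

/* Every lookup on a loopback or point-to-point device shares one key,
 * so only a single neighbour entry is created per device. */
uint32_t toy_neigh_key(uint32_t nexthop, unsigned int dev_flags)
{
    if (dev_flags & (TOY_IFF_LOOPBACK | TOY_IFF_POINTOPOINT))
        return TOY_INADDR_ANY;
    return nexthop;
}
```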
     

30 Aug, 2017

1 commit

  • Florian reported UDP xmit drops that could be root caused to the
    too small neigh limit.

    Current limit is 64 KB, meaning that even a single UDP socket would hit
    it, since its default sk_sndbuf comes from net.core.wmem_default
    (~212992 bytes on 64bit arches).

    Once ARP/ND resolution is in progress, we should allow a few more
    packets to be queued, at least for one producer.

    Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
    limit and either block in sendmsg() or return -EAGAIN.

    Signed-off-by: Eric Dumazet
    Reported-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Eric Dumazet
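
The size mismatch can be made concrete with a little arithmetic. This is a simplified sketch: it treats both limits as plain byte counts and ignores per-packet overhead, and the `toy_overflow_bytes` helper is hypothetical.

```c
/* Bytes of one socket's pending data that cannot fit in the neighbour
 * queue while resolution is in progress (0 when everything fits). */
long toy_overflow_bytes(long sk_sndbuf, long queue_limit_bytes)
{
    if (sk_sndbuf > queue_limit_bytes)
        return sk_sndbuf - queue_limit_bytes;
    return 0;
}
```

With the numbers quoted above, `toy_overflow_bytes(212992, 64 * 1024)` is 147456, so a single socket with the default send buffer can exceed the old limit by a wide margin.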
     

16 Jun, 2017

1 commit

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions (skb_put, __skb_put and pskb_put) return void *
    and remove all the casts across the tree, adding a (u8 *) cast only
    where the unsigned char pointer was used directly, all done with the
    following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    which actually doesn't cover pskb_put since there are only three
    users overall.

    A handful of stragglers were converted manually, notably a macro in
    drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
    instances in net/bluetooth/hci_sock.c. In the former file, I also
    had to fix one whitespace problem spatch introduced.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
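
Why the return type matters can be shown with a small userspace analogue. The `toy_buf`, `toy_hdr` and `toy_put*` names are made up for this sketch and only mimic the shape of skb_put; there is no bounds checking, so the caller must leave enough room.

```c
#include <stddef.h>

struct toy_buf { unsigned char data[64]; size_t len; };
struct toy_hdr { unsigned char type; unsigned char code; };

/* Old style: returns unsigned char *, so typed users must cast. */
unsigned char *toy_put_old(struct toy_buf *b, size_t len)
{
    unsigned char *tail = b->data + b->len;

    b->len += len;
    return tail;
}

/* New style: returns void *, which converts implicitly in C. */
void *toy_put(struct toy_buf *b, size_t len)
{
    return toy_put_old(b, len);
}

void toy_fill(struct toy_buf *b)
{
    /* Before: the cast is required. */
    struct toy_hdr *old_way = (struct toy_hdr *)toy_put_old(b, sizeof(*old_way));
    /* After: no cast. */
    struct toy_hdr *new_way = toy_put(b, sizeof(*new_way));

    old_way->type = 1;
    new_way->type = 2;
    /* Direct dereference is the one case that now needs a cast. */
    *(unsigned char *)toy_put(b, 1) = 0xff;
}
```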
     

05 Jun, 2017

1 commit

  • The command
    # arp -s 62.2.0.1 a:b:c:d:e:f dev eth2
    adds an entry like the following (listed by "arp -an")
    ? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2
    but the symmetric deletion command
    # arp -i eth2 -d 62.2.0.1
    does not remove the PERM entry from the table, and instead leaves behind
    ? (62.2.0.1) at on eth2

    The reason is that there is a refcnt of 1 for the arp_tbl itself
    (neigh_alloc starts off the entry with a refcnt of 1), thus
    the neigh_release() call from arp_invalidate() will (at best) just
    decrement the ref to 1, but will never actually free it from the
    table.

    To fix this, we need to do something like neigh_forced_gc: if
    the refcnt is 1 (i.e., on the table's ref), remove the entry from
    the table and free it. This patch refactors and shares common code
    between neigh_forced_gc and the newly added neigh_remove_one.

    A similar issue exists for IPv6 Neighbor Cache entries, and is fixed
    in a similar manner by this patch.

    Signed-off-by: Sowmini Varadhan
    Reviewed-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Sowmini Varadhan
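
The refcount reasoning can be sketched with a toy entry. This is only a model of the idea (remove when the table holds the last reference); `toy_neigh` and `toy_remove_one` are hypothetical names and there is no locking here.

```c
/* Toy neighbour entry: refcnt includes the table's own reference. */
struct toy_neigh {
    int refcnt;
    int in_table;
};

/* Unlink and free the entry only if the table's reference is the last
 * one left. Returns 1 when the entry was removed, 0 otherwise. */
int toy_remove_one(struct toy_neigh *n)
{
    if (n->in_table && n->refcnt == 1) {
        n->in_table = 0;  /* unlink from the table */
        n->refcnt = 0;    /* drop the table's reference */
        return 1;
    }
    return 0;
}
```

While another user still holds a reference the entry stays; once only the table's reference remains, the call removes it.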
     

26 May, 2017

1 commit

  • Commit 7d472a59c0e5ec117220a05de6b370447fb6cb66 ("arp: always override
    existing neigh entries with gratuitous ARP") introduced a compiler
    warning:

    net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
    this function [-Wmaybe-uninitialized]

    While the code logic seems to be correct and doesn't allow the variable
    to be used uninitialized, and the warning is not consistently
    reproducible, it's still worth fixing it for other people not to waste
    time looking at the warning in case it pops up in the build environment.
    Yes, compiler is probably at fault, but we will need to accommodate.

    Fixes: 7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP")
    Signed-off-by: Ihar Hrachyshka
    Signed-off-by: David S. Miller

    Ihar Hrachyshka
     

22 May, 2017

4 commits

  • Currently, when arp_accept is 1, we always override existing neigh
    entries with incoming gratuitous ARP replies. Otherwise, we override
    them only if new replies satisfy _locktime_ conditional (packets arrive
    not earlier than _locktime_ seconds since the last update to the neigh
    entry).

    The idea behind locktime is to pick the very first (=> close) reply
    received in a unicast burst when ARP proxies are used. This helps to
    avoid ARP thrashing where Linux would switch back and forth from one
    proxy to another.

    This logic has nothing to do with gratuitous ARP replies that are
    generally not aligned in time when multiple IP address carriers send
    them into network.

    This patch enforces overriding of existing neigh entries by all incoming
    gratuitous ARP packets, irrespective of their time of arrival. This will
    make the kernel honour all incoming gratuitous ARP packets.

    Signed-off-by: Ihar Hrachyshka
    Signed-off-by: David S. Miller

    Ihar Hrachyshka
     
  • The addr_type retrieval can be costly, so it's worth trying to avoid its
    calculation as much as possible. This patch makes it calculated only
    for gratuitous ARP packets. This is especially important since later we
    may want to move is_garp calculation outside of arp_accept block, at
    which point the costly operation will be executed for all setups.

    The patch is the result of a discussion in net-dev:
    http://marc.info/?l=linux-netdev&m=149506354216994

    Suggested-by: Julian Anastasov
    Signed-off-by: Ihar Hrachyshka
    Signed-off-by: David S. Miller

    Ihar Hrachyshka
     
  • The code is already involved enough to earn a separate function for
    itself. If anything, it helps arp_process readability.

    Signed-off-by: Ihar Hrachyshka
    Signed-off-by: David S. Miller

    Ihar Hrachyshka
     
  • The is_garp code deals just with gratuitous ARP packets, not every
    unsolicited packet.

    This patch is a result of a discussion in netdev:
    http://marc.info/?l=linux-netdev&m=149506354216994

    Suggested-by: Julian Anastasov
    Signed-off-by: Ihar Hrachyshka
    Signed-off-by: David S. Miller

    Ihar Hrachyshka
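
The override rule described in this series can be sketched as a single predicate. This is a simplified model with plain integer timestamps in the same unit; `toy_should_override` is a hypothetical name, and the boundary comparison follows the "not earlier than locktime" wording rather than any particular kernel macro.

```c
#include <stdbool.h>

/* Decide whether an incoming ARP packet may replace an existing entry. */
bool toy_should_override(bool is_garp, long now, long last_update,
                         long locktime)
{
    /* Gratuitous ARP always wins, regardless of arrival time. */
    if (is_garp)
        return true;
    /* Everything else must arrive after the locktime window. */
    return now - last_update >= locktime;
}
```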
     

17 May, 2017

1 commit

  • When arp_accept is 1, gratuitous ARPs are supposed to override matching
    entries irrespective of whether they arrive during locktime. This was
    implemented in commit 56022a8fdd87 ("ipv4: arp: update neighbour address
    when a gratuitous arp is received and arp_accept is set")

    There is a glitch in the patch though. RFC 2002, section 4.6, "ARP,
    Proxy ARP, and Gratuitous ARP", defines gratuitous ARPs so that they can
    be either of Request or Reply type. Those Reply gratuitous ARPs can be
    triggered with standard tooling, for example, arping -A option does just
    that.

    This patch fixes the glitch, making both Request and Reply flavours of
    gratuitous ARPs behave identically.

    As per RFC, if gratuitous ARPs are of Reply type, their Target Hardware
    Address field should also be set to the link-layer address to which this
    cache entry should be updated. The field is present in ARP over Ethernet
    but not in IEEE 1394. In this patch, I don't consider any broadcasted
    ARP replies as gratuitous if the field is not present, to conform to the
    standard. It's not clear whether there is such a thing for IEEE 1394 as
    a gratuitous ARP reply; until it's cleared up, we will ignore such
    broadcasts. Note that they will still update existing ARP cache entries,
    assuming they arrive outside the locktime interval.

    Signed-off-by: Ihar Hrachyshka
    Signed-off-by: David S. Miller

    Ihar Hrachyshka
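
A sketch of the classification described above. It assumes that "gratuitous" means sender IP equals target IP, and that for replies the target hardware address must be present and equal to the sender's; the `toy_*` names and opcode constants are illustrative, and a NULL `tha` models a link type without that field.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ARPOP_REQUEST 1
#define TOY_ARPOP_REPLY   2

bool toy_is_garp(int op, uint32_t sip, uint32_t tip,
                 const unsigned char *sha, const unsigned char *tha,
                 size_t addr_len)
{
    bool is_garp = (sip == tip);

    /* Replies additionally need a matching target hardware address. */
    if (is_garp && op == TOY_ARPOP_REPLY)
        is_garp = tha != NULL && memcmp(tha, sha, addr_len) == 0;

    return is_garp;
}
```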
     

23 Mar, 2017

1 commit

  • neigh notifications today carry pid 0 for nlmsg_pid
    in all cases. This patch fixes it to carry calling process
    pid when available. Applications (eg. quagga) rely on
    nlmsg_pid to ignore notifications generated by their own
    netlink operations. This patch follows the routing subsystem
    which already sets this correctly.

    Reported-by: Vivek Venkatraman
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
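
How an application might use nlmsg_pid to skip its own notifications, as a rough sketch. The struct below only mirrors the layout of a netlink message header for illustration, and `toy_is_own_notification` is a hypothetical helper, not part of any library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copy of the netlink header fields. */
struct toy_nlmsghdr {
    uint32_t nlmsg_len;
    uint16_t nlmsg_type;
    uint16_t nlmsg_flags;
    uint32_t nlmsg_seq;
    uint32_t nlmsg_pid;
};

/* True when the notification was triggered by this socket's own request.
 * A pid of 0 means the kernel did not attribute it to any sender. */
bool toy_is_own_notification(const struct toy_nlmsghdr *nlh,
                             uint32_t my_portid)
{
    return my_portid != 0 && nlh->nlmsg_pid == my_portid;
}
```

With pid always 0, as before this patch, the check can never match and the application has to process its own changes again.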
     

14 Feb, 2017

1 commit

  • When sending ARP requests over AX.25 links, the hwaddress in the neighbour
    cache is not getting initialized. For such an incomplete arp entry,
    ax2asc2 will generate an empty string, resulting in /proc/net/arp output
    like the following:

    $ cat /proc/net/arp
    IP address       HW type     Flags       HW address            Mask     Device
    192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
    172.20.1.99      0x3         0x0                               *        bpq0

    The missing field will confuse the procfs parsing of arp(8) resulting in
    incorrect output for the device such as the following:

    $ arp
    Address          HWtype  HWaddress           Flags Mask  Iface
    gateway          ether   52:54:00:00:5d:5f   C           ens3
    172.20.1.99              (incomplete)                    ens3

    This changes the content of /proc/net/arp to:

    $ cat /proc/net/arp
    IP address       HW type     Flags       HW address            Mask     Device
    172.20.1.99      0x3         0x0         *                     *        bpq0
    192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3

    To do so, it changes ax2asc to put the string "*" in buf for a NULL address
    argument. Finally, the HW address field is left-aligned in a 17-character
    field (the length of an Ethernet HW address in the usual hex notation) for
    readability.

    Signed-off-by: Ralf Baechle
    Signed-off-by: David S. Miller

    Ralf Baechle
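
The formatting rule (a "*" placeholder, left-aligned in 17 characters) can be sketched with standard printf padding. `toy_format_hwaddr` is a hypothetical helper for illustration, not the kernel's seq_file code.

```c
#include <stdio.h>

/* Writes hwaddr left-aligned in a 17-character field, using "*" when no
 * address is known. Returns the number of characters snprintf produced. */
int toy_format_hwaddr(char *out, size_t outlen, const char *hwaddr)
{
    return snprintf(out, outlen, "%-17s", hwaddr ? hwaddr : "*");
}
```

Because the field is never empty, a whitespace-splitting parser always sees the same number of columns.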
     

28 Apr, 2016

1 commit


08 Mar, 2016

1 commit

  • Currently, arp_rcv() always returns zero on a packet delivery upcall.

    To make its behavior more compliant with the way this API should be
    used, this patch changes it to return NET_RX_SUCCESS when the
    packet is properly handled, and NET_RX_DROP otherwise.

    v1->v2:
    If the sanity check fails, call kfree_skb() instead of consume_skb(), then
    return the correct return value.

    Signed-off-by: Zhang Shengju
    Signed-off-by: David S. Miller

    Zhang Shengju
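
The return-value convention can be sketched as follows. The constants and counters are stand-ins chosen for this example; the counters only mark where kfree_skb() and consume_skb() would be called.

```c
#define TOY_NET_RX_SUCCESS 0
#define TOY_NET_RX_DROP    1

struct toy_rx_stats {
    int consumed;  /* packets handled normally (consume_skb() path) */
    int dropped;   /* packets rejected by a sanity check (kfree_skb() path) */
};

int toy_arp_rcv(int header_ok, struct toy_rx_stats *st)
{
    if (!header_ok) {
        st->dropped++;
        return TOY_NET_RX_DROP;
    }
    st->consumed++;
    return TOY_NET_RX_SUCCESS;
}
```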
     

11 Feb, 2016

1 commit

  • In certain 802.11 wireless deployments, there will be ARP proxies
    that use knowledge of the network to correctly answer requests.
    To prevent gratuitous ARP frames on the shared medium from being
    a problem, on such deployments wireless needs to drop them.

    Enable this by providing an option called "drop_gratuitous_arp".

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

20 Oct, 2015

1 commit


05 Oct, 2015

1 commit

  • There are cases when the created metadata reply is not used. Ensure the
    allocated memory is freed also in such cases.

    Fixes: 63d008a4e9ee ("ipv4: send arp replies to the correct tunnel")
    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

27 Sep, 2015

1 commit


25 Sep, 2015

1 commit

  • When using ip lwtunnels, the additional data for xmit (basically, the actual
    tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
    metadata dst. When replying to ARP requests, we need to send the reply to
    the same tunnel the request came from. This means we need to construct
    proper metadata dst for ARP replies.

    We could perform another route lookup to get a dst entry with the correct
    lwtstate. However, this won't always ensure that the outgoing tunnel is the
    same as the incoming one, and it won't work anyway for IPv4 duplicate
    address detection.

    The only thing to do is to "reverse" the ip_tunnel_info.

    Signed-off-by: Jiri Benc
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jiri Benc
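
What "reversing" might look like, as a rough sketch. The field set here is an assumption made for illustration (addresses, a tunnel id and a direction flag); the real ip_tunnel_info carries more state, and `toy_reverse` is a hypothetical name.

```c
#include <stdint.h>

struct toy_tunnel_info {
    uint32_t src;
    uint32_t dst;
    uint64_t tun_id;
    int tx;          /* 0: describes a received packet, 1: for transmit */
};

/* Build transmit metadata that sends the reply back into the tunnel the
 * request arrived on: same tunnel id, endpoints swapped. */
struct toy_tunnel_info toy_reverse(const struct toy_tunnel_info *rx)
{
    struct toy_tunnel_info tx = *rx;

    tx.src = rx->dst;
    tx.dst = rx->src;
    tx.tx = 1;
    return tx;
}
```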
     

18 Sep, 2015

3 commits

  • This is immediately motivated by the bridge code that chains functions that
    call into netfilter. Without passing net into the okfns the bridge code would
    need to guess about the best expression for the network namespace to process
    packets in.

    As net is frequently one of the first things computed in continuation functions
    after netfilter has done its job, passing in the desired network namespace is in
    many cases a code simplification.

    To support this change the function dst_output_okfn is introduced to
    simplify passing dst_output as an okfn. For the moment dst_output_okfn
    just silently drops the struct net.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • Pass a network namespace parameter into the netfilter hooks. At the
    call site of the netfilter hooks the path a packet is taking through
    the network stack is well known, which allows the network namespace to
    be determined easily and reliably.

    This allows the replacement of magic code like
    "dev_net(state->in?:state->out)" that appears at the start of most
    netfilter hooks with "state->net".

    In almost all cases the network namespace passed in is derived
    from the first network device passed in, guaranteeing those
    paths will not see any changes in practice.

    The exceptions are:
    xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
    ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
    ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
    ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
    ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
    ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
    ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
    br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

    In all cases these exceptions seem to be a better expression for the
    network namespace the packet is being processed in than the historic
    "dev_net(in?in:out)". I am documenting them in case something odd
    pops up and someone starts trying to track down what happened.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • The function dev_queue_xmit_skb_sk is unnecessary and very confusing.
    Introduce arp_xmit_finish to remove the need for dev_queue_xmit_skb_sk,
    and have arp_xmit_finish call dev_queue_xmit.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
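
The difference between deriving the namespace in each hook and receiving it from the caller can be sketched with toy types. All `toy_*` names are hypothetical; only the shape of the two expressions comes from the commit messages above.

```c
struct toy_net { int id; };
struct toy_netdev { struct toy_net *net; };

struct toy_hook_state {
    struct toy_netdev *in;
    struct toy_netdev *out;
    struct toy_net *net;   /* filled in by the caller of the hook */
};

/* Old pattern: guess the namespace from whichever device is set. */
struct toy_net *toy_net_old(const struct toy_hook_state *s)
{
    const struct toy_netdev *dev = s->in ? s->in : s->out;

    return dev->net;
}

/* New pattern: the call site already knows the namespace. */
struct toy_net *toy_net_new(const struct toy_hook_state *s)
{
    return s->net;
}
```

In the common case both return the same namespace; the listed exceptions are the call sites where the caller passes something other than the first device's namespace.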
     

14 Aug, 2015

1 commit

  • Currently inet_addr_type and inet_dev_addr_type expect local addresses
    to be in the local table. With the VRF device local routes for devices
    associated with a VRF will be in the table associated with the VRF.
    Provide an alternate inet_addr lookup to use a specific table rather
    than defaulting to the local table.

    inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
    if the passed in device is enslaved to a VRF then the table for that VRF
    is used for the lookup.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
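
The table choice can be sketched in a few lines. This is a toy model under stated assumptions: `toy_vrf_dev` and `toy_addr_lookup_table` are hypothetical, a vrf_table of 0 means "not enslaved", and 255 is used here as the id of the local table.

```c
#define TOY_RT_TABLE_LOCAL 255

struct toy_vrf_dev {
    int vrf_table;   /* 0 when the device is not enslaved to a VRF */
};

/* Pick the table to search for the address: the VRF's table for
 * enslaved devices, the local table otherwise (or with no device). */
int toy_addr_lookup_table(const struct toy_vrf_dev *dev)
{
    if (dev && dev->vrf_table)
        return dev->vrf_table;
    return TOY_RT_TABLE_LOCAL;
}
```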
     

01 Aug, 2015

1 commit


29 Jul, 2015

1 commit

  • When arp is off on a device, and ioctl(SIOCGARP) is queried,
    a buggy answer is given with MAC address of the device, instead
    of the mac address of the destination/gateway.

    We filter out NUD_NOARP neighbours for /proc/net/arp,
    we must do the same for SIOCGARP ioctl.

    Tested:

    lpaa23:~# ./arp 10.246.7.190
    MAC=00:01:e8:22:cb:1d // correct answer

    lpaa23:~# ip link set dev eth0 arp off
    lpaa23:~# cat /proc/net/arp # check arp table is now 'empty'
    IP address HW type Flags HW address Mask Device
    lpaa23:~# ./arp 10.246.7.190
    MAC=00:1a:11:c3:0d:7f // buggy answer before patch (this is eth0 mac)

    After patch :

    lpaa23:~# ip link set dev eth0 arp off
    lpaa23:~# ./arp 10.246.7.190
    ioctl(SIOCGARP) failed: No such device or address

    Signed-off-by: Eric Dumazet
    Reported-by: Vytautas Valancius
    Cc: Willem de Bruijn
    Signed-off-by: David S. Miller

    Eric Dumazet
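
The filtering rule can be sketched as a lookup that refuses NOARP entries. The state constants, `toy_entry` and `toy_arp_req_get` are illustrative stand-ins; the error code matches the "No such device or address" message shown above.

```c
#include <errno.h>
#include <string.h>

#define TOY_NUD_REACHABLE 0x02u
#define TOY_NUD_NOARP     0x40u

struct toy_entry {
    unsigned int nud_state;
    unsigned char ha[6];
};

/* Copies the hardware address to out, or returns -ENXIO when there is
 * no entry or the entry belongs to a device with ARP turned off. */
int toy_arp_req_get(const struct toy_entry *n, unsigned char out[6])
{
    if (!n || (n->nud_state & TOY_NUD_NOARP))
        return -ENXIO;
    memcpy(out, n->ha, 6);
    return 0;
}
```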
     

22 Jul, 2015

1 commit


08 Apr, 2015

1 commit

  • On the output paths in particular, we have to sometimes deal with two
    socket contexts. First, and usually skb->sk, is the local socket that
    generated the frame.

    And second, is potentially the socket used to control a tunneling
    socket, such as one that encapsulates using UDP.

    We do not want to disassociate skb->sk when encapsulating in order
    to fix this, because that would break socket memory accounting.

    The most extreme case where this can cause huge problems is an
    AF_PACKET socket transmitting over a vxlan device. We hit code
    paths doing checks that assume they are dealing with an ipv4
    socket, but are actually operating upon the AF_PACKET one.

    Signed-off-by: David S. Miller

    David Miller
     

04 Apr, 2015

2 commits

  • The ipv4 code uses a mixture of coding styles. In some instances check
    for non-NULL pointer is done as x != NULL and sometimes as x. x is
    preferred according to checkpatch and this patch makes the code
    consistent by adopting the latter form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     
  • The ipv4 code uses a mixture of coding styles. In some instances check
    for NULL pointer is done as x == NULL and sometimes as !x. !x is
    preferred according to checkpatch and this patch makes the code
    consistent by adopting the latter form.

    No changes detected by objdiff.

    Signed-off-by: Ian Morris
    Signed-off-by: David S. Miller

    Ian Morris
     

04 Mar, 2015

1 commit

  • While looking at the mpls code I found myself writing yet another
    version of neigh_lookup_noref. We currently have __ipv4_lookup_noref
    and __ipv6_lookup_noref.

    So to make my work a little easier and to make it a smidge easier to
    verify/maintain the mpls code in the future I stopped and wrote
    ___neigh_lookup_noref. Then I rewrote __ipv4_lookup_noref and
    __ipv6_lookup_noref in terms of this new function. I tested my new
    version by verifying that the same code is generated in
    ip_finish_output2 and ip6_finish_output2 where these functions are
    inlined.

    To get to ___neigh_lookup_noref I added a new neighbour cache table
    function key_eq, so that the static size of the key would be
    available.

    I also added __neigh_lookup_noref for people who want to look up
    a neighbour table entry quickly but don't know which neighbour table
    they are going to look up.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
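
The shape of a lookup that is generic over the key comparison can be sketched like this. It is a simplified userspace model: a single bucket, no hashing, no RCU, and every `toy_*` name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_neighbour {
    struct toy_neighbour *next;
    const void *key;
};

typedef bool (*toy_key_eq_fn)(const struct toy_neighbour *n, const void *pkey);

/* Generic walk: the caller supplies the comparison, so each family can
 * compare keys of its own fixed size. */
struct toy_neighbour *toy_lookup(struct toy_neighbour *bucket,
                                 toy_key_eq_fn key_eq, const void *pkey)
{
    for (struct toy_neighbour *n = bucket; n; n = n->next)
        if (key_eq(n, pkey))
            return n;
    return NULL;
}

/* IPv4 flavour: the key size is a compile-time constant. */
bool toy_ipv4_key_eq(const struct toy_neighbour *n, const void *pkey)
{
    return memcmp(n->key, pkey, sizeof(uint32_t)) == 0;
}
```

An IPv6 or MPLS variant would only need its own key_eq with a different key size.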
     

03 Mar, 2015

3 commits


12 Nov, 2014

1 commit

  • Currently there are only three neigh tables in the whole kernel:
    arp table, ndisc table and decnet neigh table. What's more,
    we don't support registering multiple tables per family.
    Therefore we can just make these tables statically built-in.

    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

29 Sep, 2014

1 commit


02 Jan, 2014

1 commit

  • Gratuitous arp packets are useful in switchover scenarios to update
    client arp tables as quickly as possible. Currently, the mac address
    of a neighbour is only updated after a locktime period has elapsed
    since the last update. In most use cases such delays are unacceptable
    for network admins. Moreover, the "updated" field of the neighbour
    structure doesn't record the last time the address of a neighbour
    changed but records any change that happens to the neighbour. This is
    clearly a bug since locktime uses that field as meaning "addr_updated".
    With this observation, I was able to perpetuate a stale address by
    sending a stream of gratuitous arp packets spaced less than locktime
    apart. With this change the address is updated when a gratuitous arp
    is received and the arp_accept sysctl is set.

    Signed-off-by: Salam Noureddine
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Salam Noureddine
     

29 Dec, 2013

1 commit


12 Dec, 2013

1 commit

  • Help of this function says: "in_dev: only on this interface, 0=any interface",
    but since commit 39a6d0630012 ("[NETNS]: Process inet_confirm_addr in the
    correct namespace."), the code supposes that it will never be NULL. This
    function is never called with in_dev == NULL, but it's exported and may be used
    by an external module.

    Because this patch restores the ability to call inet_confirm_addr() with in_dev
    == NULL, I partially revert the above commit, as suggested by Julian.

    CC: Julian Anastasov
    Signed-off-by: Nicolas Dichtel
    Reviewed-by: Julian Anastasov
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

10 Dec, 2013

1 commit