25 Dec, 2012
1 commit
-
Sedat reported that the following commit caused a regression:

commit 9650388b5c56578fdccc79c57a8c82fb92b8e7f1
Author: Eric Dumazet
Date: Fri Dec 21 07:32:10 2012 +0000

ipv4: arp: fix a lockdep splat in arp_solicit

The 6th parameter of arp_send() needs to be NULL for the broadcast
case; the above commit changed it to an all-zero array by mistake.

Reported-by: Sedat Dilek
Tested-by: Sedat Dilek
Cc: Sedat Dilek
Cc: Eric Dumazet
Cc: David S. Miller
Cc: Julian Anastasov
Signed-off-by: Cong Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
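A minimal sketch of why NULL and an all-zero array behave differently for the regression described above. The helper names (pick_dest_hw, goes_to_broadcast) are hypothetical and only model the "NULL means broadcast" convention; they are not the kernel's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HW_ALEN 6 /* Ethernet-sized hardware address, for illustration */

/* Hypothetical stand-in for the target-address choice made when an ARP
 * request is built: NULL means "no known address, use broadcast". */
static const unsigned char *pick_dest_hw(const unsigned char *dest_hw,
                                         const unsigned char *dev_broadcast)
{
    return dest_hw != NULL ? dest_hw : dev_broadcast;
}

/* Returns 1 if the frame would be addressed to the broadcast address. */
static int goes_to_broadcast(const unsigned char *dest_hw)
{
    static const unsigned char bcast[HW_ALEN] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    return memcmp(pick_dest_hw(dest_hw, bcast), bcast, HW_ALEN) == 0;
}
```

An all-zero buffer is a valid non-NULL pointer, so in this model it is treated as a unicast target of 00:00:00:00:00:00 rather than as a request to broadcast.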
22 Dec, 2012
1 commit
-
Yan Burman reported the following lockdep warning:
=============================================
[ INFO: possible recursive locking detected ]
3.7.0+ #24 Not tainted
---------------------------------------------
swapper/1/0 is trying to acquire lock:
(&n->lock){++--..}, at: [] __neigh_event_send+0x2e/0x2f0

but task is already holding lock:
(&n->lock){++--..}, at: [] arp_solicit+0x1d4/0x280

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&n->lock);
lock(&n->lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

4 locks held by swapper/1/0:
#0: (((&n->timer))){+.-...}, at: [] call_timer_fn+0x0/0x1c0
#1: (&n->lock){++--..}, at: [] arp_solicit+0x1d4/0x280
#2: (rcu_read_lock_bh){.+....}, at: [] dev_queue_xmit+0x0/0x5d0
#3: (rcu_read_lock_bh){.+....}, at: [] ip_finish_output+0x13e/0x640

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24
Call Trace:
[] validate_chain+0xdcc/0x11f0
[] ? __lock_acquire+0x440/0xc30
[] ? kmem_cache_free+0xe5/0x1c0
[] __lock_acquire+0x440/0xc30
[] ? inet_getpeer+0x40/0x600
[] ? __lock_acquire+0x440/0xc30
[] ? __neigh_event_send+0x2e/0x2f0
[] lock_acquire+0x95/0x140
[] ? __neigh_event_send+0x2e/0x2f0
[] ? __lock_acquire+0x440/0xc30
[] _raw_write_lock_bh+0x3b/0x50
[] ? __neigh_event_send+0x2e/0x2f0
[] __neigh_event_send+0x2e/0x2f0
[] neigh_resolve_output+0x16b/0x270
[] ip_finish_output+0x34d/0x640
[] ? ip_finish_output+0x13e/0x640
[] ? vxlan_xmit+0x556/0xbec [vxlan]
[] ip_output+0x80/0xf0
[] ip_local_out+0x28/0x80
[] vxlan_xmit+0x66a/0xbec [vxlan]
[] ? vxlan_xmit+0x556/0xbec [vxlan]
[] ? skb_gso_segment+0x2b0/0x2b0
[] ? _raw_spin_unlock_irqrestore+0x65/0x80
[] ? dev_queue_xmit_nit+0x207/0x270
[] dev_hard_start_xmit+0x298/0x5d0
[] dev_queue_xmit+0x2f3/0x5d0
[] ? dev_hard_start_xmit+0x5d0/0x5d0
[] arp_xmit+0x58/0x60
[] arp_send+0x3b/0x40
[] arp_solicit+0x204/0x280
[] ? neigh_add+0x310/0x310
[] neigh_probe+0x45/0x70
[] neigh_timer_handler+0x1a0/0x2a0
[] call_timer_fn+0x7f/0x1c0
[] ? detach_if_pending+0x120/0x120
[] run_timer_softirq+0x238/0x2b0
[] ? neigh_add+0x310/0x310
[] __do_softirq+0x101/0x280
[] call_softirq+0x1c/0x30
[] do_softirq+0x85/0xc0
[] irq_exit+0x9e/0xc0
[] smp_apic_timer_interrupt+0x68/0xa0
[] apic_timer_interrupt+0x6f/0x80
[] ? mwait_idle+0xa4/0x1c0
[] ? mwait_idle+0x9b/0x1c0
[] cpu_idle+0x89/0xe0
[] start_secondary+0x1b2/0x1b6

The bug comes from arp_solicit() releasing the neigh lock only after
arp_send(). In the vxlan case, we eventually need to write-lock a neigh
lock later in the transmit path.

It's a false positive, but we can get rid of it without lockdep
annotations.

We can instead use the neigh_ha_snapshot() helper.
Reported-by: Yan Burman
Signed-off-by: Eric Dumazet
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller
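The snapshot idea in the commit above can be sketched as follows: take a private copy of the hardware address, then send using the copy, so that no neigh lock is held across the transmit path. This is a single-threaded userspace model with hypothetical names (neigh_model, ha_update, ha_snapshot); the kernel's helper relies on its own sequence-lock primitives and memory barriers, which are omitted here.

```c
#include <assert.h>
#include <string.h>

#define HW_ALEN 6

/* Single-threaded model of a neighbour entry. The kernel pairs the
 * sequence counter with memory barriers; this sketch omits them. */
struct neigh_model {
    unsigned int seq;              /* even: stable, odd: writer active */
    unsigned char ha[HW_ALEN];     /* hardware address */
};

/* Writer side: bump the counter before and after the update. */
static void ha_update(struct neigh_model *n, const unsigned char *ha)
{
    n->seq++;                      /* now odd */
    memcpy(n->ha, ha, HW_ALEN);
    n->seq++;                      /* even again */
}

/* Reader side: copy the address, retry if a writer interfered. */
static void ha_snapshot(unsigned char *dst, const struct neigh_model *n)
{
    unsigned int seq;

    do {
        seq = n->seq;
        memcpy(dst, n->ha, HW_ALEN);
    } while ((seq & 1) || seq != n->seq);
}
```

Because the caller works on its own copy afterwards, nothing it does later (including re-entering the neighbour code through a tunnel device) happens under the original entry's lock.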
19 Nov, 2012
1 commit
-
Allow an unprivileged user who has created a user namespace, and then
created a network namespace, to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or ns_capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is a hardware NIC
that has been explicitly moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAGS ioctl to set network device flags.
Allow the SIOCSIFADDR ioctl to set a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to set a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to set a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to set a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to add and delete ipv4 routes.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.

Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.

Allow setting and receiving IPOPT_CIPSO, IPOPT_SEC, IPOPT_SID and
arbitrary ip options.

Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
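A rough userspace model of the difference between capable() and ns_capable() for the change above. The structures and function names are hypothetical simplifications, and the walk up the namespace parents is an assumption about the general shape of the check, not the kernel's exact code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CAP_NET_ADMIN 12
#define MODEL_CAP_NET_RAW   13

struct user_ns_model {
    const struct user_ns_model *parent;  /* NULL for the initial namespace */
    unsigned int owner_uid;              /* uid that created the namespace */
};

struct cred_model {
    const struct user_ns_model *user_ns; /* namespace the task lives in */
    unsigned int euid;
    unsigned long caps;                  /* capabilities held in user_ns */
};

/* Does the task hold `cap` with respect to namespace `ns`? */
static int ns_capable_model(const struct cred_model *c,
                            const struct user_ns_model *ns, int cap)
{
    for (;;) {
        if (ns == c->user_ns)
            return (int)((c->caps >> cap) & 1UL);
        if (ns->parent == NULL)
            return 0;   /* reached the initial namespace without a match */
        if (ns->parent == c->user_ns && ns->owner_uid == c->euid)
            return 1;   /* the creator of a child namespace is privileged in it */
        ns = ns->parent;
    }
}

/* The old-style check: always relative to the initial namespace. */
static int capable_model(const struct cred_model *c,
                         const struct user_ns_model *init_ns, int cap)
{
    return ns_capable_model(c, init_ns, cap);
}
```

In this model a task that lives in a child user namespace with a full capability set fails the check against the initial namespace but passes it against its own namespace, which is the effect the commit describes for a network namespace owned by that user namespace.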
19 Sep, 2012
1 commit
-
Since the route cache was deleted (89aef8921bfbac22f), delay is no
longer used. Remove it.

Signed-off-by: Nicolas Dichtel
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
27 Jul, 2012
1 commit
-
With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.
Reported-by: Alexander Duyck
Signed-off-by: David S. Miller
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
21 Jul, 2012
2 commits
-
In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.

The new interpretation is:

1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr
2) rt_gateway != 0, destination requires a nexthop gateway

Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.

Signed-off-by: David S. Miller
Tested-by: Vijay Subramanian
-
The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.

Signed-off-by: David S. Miller
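The rt_gateway interpretation from the first of these two commits can be sketched as below. The struct and function names are illustrative stand-ins with a simplified signature, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the routing entry; only the field discussed here. */
struct rtable_model {
    uint32_t rt_gateway;   /* 0 means the destination is on-link */
};

/* Pick the next hop: the gateway if one is set, else the destination. */
static uint32_t rt_nexthop_model(const struct rtable_model *rt, uint32_t daddr)
{
    if (rt->rt_gateway != 0)
        return rt->rt_gateway;
    return daddr;
}
```

Keeping this choice in one helper means callers no longer need to know how an on-link route is encoded.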
28 Jun, 2012
2 commits
-
This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7.

This change has several unwanted side effects:

1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
thus never create a real cached route.

2) All TCP traffic will use DST_NOCACHE and never use the routing
cache at all.

Signed-off-by: David S. Miller
-
DDoS synflood attacks hit the IP route cache badly.

On typical machines, this cache is allowed to hold up to 8 million dst
entries, 256 bytes each, for a total of 2GB of memory.

rt_garbage_collect() triggers and tries to clean things up.
Eventually the route cache is disabled, but the machine is under fire and
might OOM and crash.

This patch exploits the new TCP early demux to set a nocache
boolean in case the incoming TCP frame is for a not yet ESTABLISHED or
TIMEWAIT socket.

This 'nocache' boolean is then used, in case the dst entry is not found in
the route cache, to create an unhashed dst entry (DST_NOCACHE).

SYN-cookie ACKs sent use a similar mechanism (ipv4: tcp: dont cache
output dst for syncookies), so after this patch, a machine is able to
absorb a DDoS synflood attack without polluting its IP route cache.

Signed-off-by: Eric Dumazet
Cc: Hans Schillstrom
Signed-off-by: David S. Miller
13 Jun, 2012
1 commit
-
Routing of 127/8 is traditionally forbidden; we consider
packets from that address block martian when routing and do
not process corresponding ARP requests.

This is a sane default but renders a huge address space
practically unusable.

The RFC states that no address within the 127/8 block should
ever appear on any network anywhere, but it does not forbid
the use of such addresses outside of the loopback device in
particular, for example to address a pool of virtual guests
behind a load balancer.

This patch adds a new interface option 'route_localnet'
enabling routing of the 127/8 address block and processing
of ARP requests on a specific interface.

Note that for the feature to work, the default local route
covering 127/8 dev lo needs to be removed.

Example:
$ sysctl -w net.ipv4.conf.eth0.route_localnet=1
$ ip route del 127.0.0.0/8 dev lo table local
$ ip addr add 127.1.0.1/16 dev eth0
$ ip route flush cache

V2: Fix invalid check to auto flush cache (thanks davem)
Signed-off-by: Thomas Graf
Acked-by: Neil Horman
Signed-off-by: David S. Miller
16 May, 2012
3 commits
-
Use the current debugging style and enable dynamic_debug.
Signed-off-by: Joe Perches
Signed-off-by: David S. Miller
-
We are going to delete the Token ring support. This removes any
special processing in the core networking for token ring (aside
from net/tr.c itself), leaving the drivers and remaining tokenring
support present but inert.

The mass removal of the drivers and net/tr.c will be in a separate
commit, so that the history of these files that we still care
about won't have the giant deletion tied into their history.

Signed-off-by: Paul Gortmaker
16 Apr, 2012
1 commit
-
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
29 Mar, 2012
1 commit
-
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells
17 Mar, 2012
1 commit
-
I found recently that the arp_process function, which handles all of our
received arp frames, is using the IPV4_DEVCONF_ALL macro to check the state
of the arp_accept flag. This seems wrong, as it implies that either none or
all of the network interfaces accept gratuitous arps. This patch corrects
that, allowing per-interface arp_accept configuration to deviate from the
all setting. Note this also brings us into line with the way the arp_filter
setting is handled during arp_process execution.

Tested this myself on my home network, and confirmed it works as expected.
Signed-off-by: Neil Horman
CC: "David S. Miller"
Signed-off-by: David S. Miller
11 Feb, 2012
1 commit
-
Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed
the behavior of arp proxy to send arp replies back out on the interface
the request came in on, even if the private VLAN feature is disabled.

Previously we checked rt->dst.dev != skb->dev in both scenarios: when
proxy arp is enabled for the netdevice and also when individual proxy
neighbour entries have been added.

This patch adds the check back for the pneigh_lookup() scenario.
Signed-off-by: Thomas Graf
Acked-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
29 Dec, 2011
1 commit
-
In order to perform a proper universal hash on a vector of integers,
we have to use different universal hashes on each vector element.

Which means we need 4 different hash randoms for ipv6.
Signed-off-by: David S. Miller
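A small sketch of the point made in the commit above, assuming a multiply-and-add scheme over four 32-bit key words. The function name and the exact combination are illustrative assumptions, not the kernel's hash function.

```c
#include <assert.h>
#include <stdint.h>

#define NEIGH_KEY_WORDS 4   /* an IPv6 address is four 32-bit words */

/* Multiply each key word by its own random value and sum the results.
 * hash_bits must be between 1 and 32. */
static uint32_t vector_hash(const uint32_t key[NEIGH_KEY_WORDS],
                            const uint32_t rnd[NEIGH_KEY_WORDS],
                            unsigned int hash_bits)
{
    uint32_t sum = 0;
    int i;

    assert(hash_bits >= 1 && hash_bits <= 32);
    for (i = 0; i < NEIGH_KEY_WORDS; i++)
        sum += key[i] * rnd[i];        /* wraps modulo 2^32 */

    return sum >> (32 - hash_bits);    /* keep the top hash_bits bits */
}
```

With one shared random value, keys that are permutations of each other (for example {1, 2, 0, 0} and {2, 1, 0, 0}) always collide, because the sum only depends on the total of the words. Distinct per-word values remove that systematic collision.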
06 Dec, 2011
1 commit
-
Use the "IS_ENABLED(CONFIG_FOO)" macro instead of
"defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)".

Signed-off-by: Igor Maravic
Signed-off-by: David S. Miller
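To illustrate the equivalence, here is a simplified userspace stand-in for IS_ENABLED(). It is modelled on the placeholder trick used for this kind of macro and is not copied from the kernel header; the CONFIG_DEMO_* options and helper macro names are hypothetical.

```c
#include <assert.h>

/* If `val` expands to 1, ARG_PLACEHOLDER_1 inserts an extra leading
 * argument, which shifts a literal 1 into the "second argument" slot. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED(x) IS_DEFINED_2(x)

/* True if the option is built in (CONFIG_FOO) or modular (CONFIG_FOO_MODULE). */
#define IS_ENABLED(option) (IS_DEFINED(option) || IS_DEFINED(option##_MODULE))

#define CONFIG_DEMO_BUILTIN 1           /* as if configured =y */
#define CONFIG_DEMO_AS_MODULE_MODULE 1  /* as if configured =m */
/* CONFIG_DEMO_ABSENT is intentionally left undefined. */

static int demo_builtin_enabled(void) { return IS_ENABLED(CONFIG_DEMO_BUILTIN); }
static int demo_module_enabled(void)  { return IS_ENABLED(CONFIG_DEMO_AS_MODULE); }
static int demo_absent_enabled(void)  { return IS_ENABLED(CONFIG_DEMO_ABSENT); }
```

Unlike the defined() form, the result is an ordinary integer expression, so it can be used both in #if and in plain C conditions.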
01 Dec, 2011
2 commits
-
Instead of instantiating an entire new neigh_table instance
just for ATM handling, use the neigh device private facility.

Signed-off-by: David S. Miller
-
Let the core self-size the neigh entry based upon the key length.
Signed-off-by: David S. Miller
19 Nov, 2011
1 commit
-
ipv4: Remove all uses of LL_ALLOCATED_SPACE

The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.

This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv4
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.

This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
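A worked arithmetic sketch of the tail room problem described above. The 16-byte rounding and the helper names are assumptions chosen to model the two macros; they are not the kernel definitions.

```c
#include <assert.h>

#define HH_MOD 16u  /* alignment unit assumed for this sketch */

struct dev_model {
    unsigned int hard_header_len;
    unsigned int needed_headroom;
    unsigned int needed_tailroom;
};

/* Round down to a multiple of HH_MOD, then add one full unit. */
static unsigned int round_model(unsigned int len)
{
    return (len & ~(HH_MOD - 1)) + HH_MOD;
}

/* Head room actually reserved at the front of the buffer. */
static unsigned int ll_reserved(const struct dev_model *d)
{
    return round_model(d->hard_header_len + d->needed_headroom);
}

/* Old scheme: alignment applied to head room and tail room together. */
static unsigned int tail_left_old(const struct dev_model *d)
{
    unsigned int allocated = round_model(d->hard_header_len +
                                         d->needed_headroom +
                                         d->needed_tailroom);
    return allocated - ll_reserved(d);
}

/* New scheme: aligned head room plus the tail room as a separate term. */
static unsigned int tail_left_new(const struct dev_model *d)
{
    unsigned int allocated = ll_reserved(d) + d->needed_tailroom;
    return allocated - ll_reserved(d);
}
```

For a device with a 14-byte header, no extra head room and 1 byte of needed tail room, the old scheme in this model allocates 16 bytes and reserves all 16 as head room, leaving 0 bytes of tail room. The new scheme leaves exactly the 1 byte requested.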
14 Nov, 2011
1 commit
-
On Wednesday, 09 November 2011 at 16:21 -0500, David Miller wrote:
> From: David Miller
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> > ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit. The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.

Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.

[PATCH V5 net-next] neigh: new unresolved queue limits
unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.

$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms

Signed-off-by: David S. Miller
18 Jul, 2011
1 commit
-
This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.

We will also be able to make dst entries neigh-less.
Signed-off-by: David S. Miller
17 Jul, 2011
2 commits
-
It is always dev_queue_xmit().
Signed-off-by: David S. Miller
-
It's always dev_queue_xmit().
Signed-off-by: David S. Miller
13 Jul, 2011
1 commit
-
Get rid of all of the useless and costly indirection
by doing the neigh hash table lookup directly inside
of the neighbour binding.

Rename from arp_bind_neighbour to rt_bind_neighbour.

Use new helpers {__,}ipv4_neigh_lookup()

In rt_bind_neighbour() get rid of useless tests which
are never true in the context this function is called,
namely dev is never NULL and the dst->neighbour is
always NULL.

Signed-off-by: David S. Miller
11 Jul, 2011
1 commit
-
We need to make sure the multiplier is odd.
Signed-off-by: David S. Miller
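The commit above does not say why the multiplier must be odd. One standard reason, sketched here as an assumption, is that multiplication by an odd value modulo 2^32 is a permutation of the 32-bit integers, whereas an even multiplier maps distinct keys to the same product. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Force the low bit on so the multiplier is odd. */
static uint32_t force_odd(uint32_t rnd)
{
    return rnd | 1u;
}

/* Returns 1 if key * mult (mod 2^32) is distinct for all keys in [0, nkeys). */
static int products_distinct(uint32_t mult, uint32_t nkeys)
{
    uint32_t a, b;

    for (a = 0; a < nkeys; a++)
        for (b = a + 1; b < nkeys; b++)
            if (a * mult == b * mult)
                return 0;
    return 1;
}
```

With the even multiplier 0x10000000, keys 0 and 16 already produce the same product; after forcing the low bit on, no two keys in the range collide before the hash bits are extracted.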
30 Mar, 2011
1 commit
-
My commit 6d55cb91a0020ac0 (gre: fix hard header destination
address checking) broke multicast.

The reason is that ip_gre used to get ipgre_header() calls with
zero destination if we have NOARP or multicast destination. Instead
the actual target was decided at ipgre_tunnel_xmit() time based on
per-protocol dissection.

Instead of allowing the "abuse" of ->header() calls with invalid
destination, this creates multicast mappings for ip_gre. This also
fixes "ip neigh show nud noarp" to display the proper multicast
mappings used by the gre device.

Reported-by: Doug Kehn
Signed-off-by: Timo Teräs
Acked-by: Doug Kehn
Signed-off-by: David S. Miller
13 Mar, 2011
1 commit
-
The idea here is that this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller
03 Mar, 2011
1 commit
-
Instead of on the stack.
Signed-off-by: David S. Miller
25 Jan, 2011
1 commit
-
Commit 941666c2e3e0 "net: RCU conversion of dev_getbyhwaddr() and
arp_ioctl()" introduced a regression, reported by Jamie Heilman.

"arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert
in pneigh_lookup().

Removing the RTNL requirement from arp_ioctl() was a mistake; just revert
that part.

Reported-by: Jamie Heilman
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Jan, 2011
1 commit
-
IPv4 over firewire needs to be able to remove ARP entries
from the ARP cache that belong to nodes that are removed, because
IPv4 over firewire uses ARP packets for private information
about nodes.

This information becomes invalid as soon as a node drops
off the bus, and when it reconnects, it's only possible
to start talking to it after it has responded to an ARP packet.
But the ARP cache prevents such packets from being sent.

Signed-off-by: Maxim Levitsky
Signed-off-by: David S. Miller
09 Dec, 2010
1 commit
-
On Sunday, 05 December 2010 at 09:19 +0100, Eric Dumazet wrote:
> Hmm..
>
> If somebody can explain why RTNL is held in arp_ioctl() (and therefore
> in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so
> that your patch can be applied.
>
> Right now it is not good, because RTNL won't necessarily be held when you
> are going to call arp_invalidate() ?

While doing this analysis, I found a refcount bug in llc, I'll send a
patch for net-2.6

Meanwhile, here is the patch for net-next-2.6

Your patch then can be applied after mine.

Thanks

[PATCH] net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()

dev_getbyhwaddr() was called under RTNL.

Rename it to dev_getbyhwaddr_rcu() and change all its callers to now use
RCU locking instead of RTNL.

Change arp_ioctl() to use RCU instead of RTNL locking.

Note: this fixes a dev refcount bug in llc
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
07 Dec, 2010
1 commit
-
Only when dont_send is 0, arp_filter() is consulted, so we can simply
assign the return value of arp_filter() to dont_send instead.

Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
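The simplification in the commit above can be checked with a tiny model. It assumes the earlier code OR-ed the filter result into dont_send; that earlier shape and the function names below are assumptions made for illustration.

```c
#include <assert.h>

/* Earlier shape (assumed): OR the filter result into the flag. */
static int dont_send_before(int dont_send, int filter_on, int filter_result)
{
    if (!dont_send && filter_on)
        dont_send |= filter_result;
    return dont_send;
}

/* Simplified shape: the flag is known to be 0 here, so plain assignment. */
static int dont_send_after(int dont_send, int filter_on, int filter_result)
{
    if (!dont_send && filter_on)
        dont_send = filter_result;
    return dont_send;
}
```

Inside the branch dont_send is always 0, so OR-ing and assigning give the same value for every input.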
18 Nov, 2010
1 commit
-
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao
Signed-off-by: David S. Miller
12 Oct, 2010
1 commit
-
Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
dirtying neighbour in stress situation (many different flows / dsts)

Dirtying takes place because of read_lock(&n->lock) and n->used writes.

Switching to a seqlock, and writing n->used only on jiffies changes
permits less dirtying.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
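The "write n->used only on jiffies changes" part of the commit above can be sketched as a conditional store. The struct, the write counter and the function name are hypothetical and exist only to make the effect observable.

```c
#include <assert.h>

struct neigh_used_model {
    unsigned long used;    /* last-use timestamp, in ticks */
    unsigned int writes;   /* how often the field was actually stored to */
};

/* Store the timestamp only when it differs, so repeated lookups within
 * the same tick leave the memory (and its cache line) untouched. */
static void mark_used(struct neigh_used_model *n, unsigned long now)
{
    if (n->used != now) {
        n->used = now;
        n->writes++;
    }
}
```

Under load, many packets hit the same entry within one tick, so most calls become read-only and stop bouncing the cache line between CPUs.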
06 Oct, 2010
1 commit
-
David

This is the first step for RCU conversion of neigh code.

Next patches will convert hash_buckets[] and "struct neighbour" to RCU
protected objects.

Thanks

[PATCH net-next] net neigh: RCU conversion of neigh hash table

Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
neigh_table", a new structure is defined:

struct neigh_hash_table {
	struct neighbour **hash_buckets;
	unsigned int hash_mask;
	__u32 hash_rnd;
	struct rcu_head rcu;
};

And "struct neigh_table" has an RCU protected pointer to such a
neigh_hash_table.

This means the signature of the (*hash)() function changed: we need to add a
third parameter with the actual hash_rnd value, since this is no
longer a neigh_table field.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
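A userspace sketch of the signature change described above: the hash callback receives hash_rnd as an explicit third argument, taken from whichever hash table instance is current. RCU is omitted, the device argument is reduced to an interface index, and demo_hash is a made-up mixing function; all of these are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct neighbour;  /* opaque in this sketch */

/* Model of the per-table state named in the commit (no rcu_head here). */
struct neigh_hash_table_model {
    struct neighbour **hash_buckets;
    unsigned int hash_mask;
    uint32_t hash_rnd;
};

/* Hash callback with the random seed passed in as the third parameter. */
typedef uint32_t (*neigh_hash_fn)(const void *pkey, int ifindex,
                                  uint32_t hash_rnd);

/* Hypothetical hash over a 32-bit key; not the kernel's function. */
static uint32_t demo_hash(const void *pkey, int ifindex, uint32_t hash_rnd)
{
    uint32_t key = *(const uint32_t *)pkey;

    return (key ^ (uint32_t)ifindex) * (hash_rnd | 1u);
}

/* Bucket selection reads both seed and mask from the same table instance. */
static unsigned int bucket_index(const struct neigh_hash_table_model *nht,
                                 neigh_hash_fn hash,
                                 const void *pkey, int ifindex)
{
    return hash(pkey, ifindex, nht->hash_rnd) & nht->hash_mask;
}
```

Taking the seed and the mask from one table snapshot keeps them consistent even if the table pointer is replaced by a resized table while a lookup is in progress.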
30 Sep, 2010
1 commit
-
arp_broken_ops is only used in arp.c
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
24 Sep, 2010
1 commit
-
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller