29 Dec, 2011
1 commit
-
In order to perform a proper universal hash on a vector of integers,
we have to use different universal hashes on each vector element.
Which means we need 4 different hash randoms for ipv6.
Signed-off-by: David S. Miller
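A minimal userspace sketch of the idea in this commit message, assuming a simple multiply-and-sum scheme. The name `vec_hash_ipv6`, the `hash_rnd[4]` layout and the final shift are illustrative assumptions, not the kernel's actual code.

```c
#include <stdint.h>

#define IPV6_WORDS 4	/* an IPv6 address seen as four 32-bit words */

/*
 * Hypothetical sketch: every word gets its own random multiplier, forced
 * odd with "| 1u". The hash_shift most significant bits of the sum are
 * returned. hash_shift must be in the range 1..32.
 */
static uint32_t vec_hash_ipv6(const uint32_t addr[IPV6_WORDS],
			      const uint32_t hash_rnd[IPV6_WORDS],
			      unsigned int hash_shift)
{
	uint32_t sum = 0;

	for (int i = 0; i < IPV6_WORDS; i++)
		sum += addr[i] * (hash_rnd[i] | 1u);

	return sum >> (32 - hash_shift);
}
```

With one shared multiplier, two addresses whose words are merely swapped would always collide, because the sum does not depend on word order. Distinct multipliers per word remove that guaranteed collision.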
20 Dec, 2011
1 commit
-
This reverts commit 5c3ddec73d01a1fae9409c197078cb02c42238c3.
S390 qeth driver actually still uses the setup ops.
Reported-by: Frank Blaschka
Signed-off-by: David S. Miller
14 Dec, 2011
1 commit
-
It's simpler to just keep these things out until there is a real user
of them, so we can see what the needs actually are, rather than keep
these things around as useless overhead.
Signed-off-by: David S. Miller
06 Dec, 2011
1 commit
-
To reflect the fact that a reference is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller
Acked-by: Roland Dreier
01 Dec, 2011
3 commits
-
If the neigh entry has device private state, it will need
constructor/destructor ops.
Signed-off-by: David S. Miller
-
netdev->neigh_priv_len records the private area length.
This will trigger for neigh_table objects which set tbl->entry_size
to zero, and the first instances of this will be forthcoming.
Signed-off-by: David S. Miller
-
We are going to alloc for device specific private areas for
neighbour entries, and in order to do that we have to move
away from the fixed allocation size enforced by using
neigh_table->kmem_cachep.
As a nice side effect we can now use kfree_rcu().
Signed-off-by: David S. Miller
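A userspace sketch of the kind of variable-size allocation described above, assuming the private area is simply appended to the entry. `entry_sketch` and `entry_alloc` are hypothetical names, and plain `calloc()`/`free()` stand in for the kernel allocator and `kfree_rcu()`.

```c
#include <stdlib.h>

/* Hypothetical entry with a device-specific private area at its tail. */
struct entry_sketch {
	int key;
	size_t priv_len;	/* length of the private area in bytes */
	unsigned char priv[];	/* flexible array member, sized at alloc time */
};

/*
 * Allocate one zeroed entry plus priv_len bytes of private state.
 * A fixed-size object cache cannot do this, since the size varies per device.
 */
static struct entry_sketch *entry_alloc(int key, size_t priv_len)
{
	struct entry_sketch *e = calloc(1, sizeof(*e) + priv_len);

	if (!e)
		return NULL;
	e->key = key;
	e->priv_len = priv_len;
	return e;
}
```

The caller releases the whole object, private area included, with a single `free(e)`.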
27 Nov, 2011
1 commit
-
Conflicts:
net/ipv4/inet_diag.c
26 Nov, 2011
1 commit
-
Skip entries from foreign network namespaces.
Signed-off-by: Jorge Boncompte [DTI2]
Signed-off-by: David S. Miller
14 Nov, 2011
1 commit
-
On Wednesday, 9 November 2011 at 16:21 -0500, David Miller wrote:
> From: David Miller
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> > ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit. The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.
Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.
[PATCH V5 net-next] neigh: new unresolved queue limits
unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.
$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
Signed-off-by: David S. Miller
02 Nov, 2011
1 commit
-
Whatever situations make this state legitimate when SMP
also would be legitimate when !SMP and e.g. preemption is
enabled.
This is dubious enough that we should just delete it entirely. If we
want to add debugging for neigh timer races, better more thorough
mechanisms are needed.
Signed-off-by: David S. Miller
20 Oct, 2011
1 commit
-
When using dst_get_neighbour() to get a neighbour, we need
rcu_read_lock() protection, since dst_get_neighbour() uses
rcu_dereference().
The bug was reported by Ari Savolainen:
[ 105.612095]
[ 105.612096] ===================================================
[ 105.612100] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 105.612101] ---------------------------------------------------
[ 105.612103] include/net/dst.h:91 invoked rcu_dereference_check() without protection!
[ 105.612105]
[ 105.612106] other info that might help us debug this:
[ 105.612106]
[ 105.612108]
[ 105.612108] rcu_scheduler_active = 1, debug_locks = 0
[ 105.612110] 1 lock held by dnsmasq/2618:
[ 105.612111] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
[ 105.612120]
[ 105.612121] stack backtrace:
[ 105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41
[ 105.612125] Call Trace:
[ 105.612129] [] lockdep_rcu_dereference+0xbb/0xc0
[ 105.612132] [] neigh_update+0x4f9/0x5f0
[ 105.612135] [] ? neigh_lookup+0xe1/0x220
[ 105.612139] [] arp_req_set+0xb8/0x230
[ 105.612142] [] arp_ioctl+0x1bf/0x310
[ 105.612146] [] ? lock_hrtimer_base.isra.26+0x30/0x60
[ 105.612150] [] inet_ioctl+0x85/0x90
[ 105.612154] [] sock_do_ioctl+0x30/0x70
[ 105.612157] [] sock_ioctl+0x73/0x280
[ 105.612162] [] do_vfs_ioctl+0x98/0x570
[ 105.612165] [] ? fget_light+0x340/0x3a0
[ 105.612168] [] sys_ioctl+0x4f/0x80
[ 105.612172] [] system_call_fastpath+0x16/0x1b
Reported-by: Ari Savolainen
Signed-off-by: RongQing
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
22 Sep, 2011
1 commit
-
Conflicts:
MAINTAINERS
drivers/net/Kconfig
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
drivers/net/ethernet/broadcom/tg3.c
drivers/net/wireless/iwlwifi/iwl-pci.c
drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
drivers/net/wireless/rt2x00/rt2800usb.c
drivers/net/wireless/wl12xx/main.c
25 Aug, 2011
1 commit
-
Dave Jones reported a lockdep splat triggered by an arp_process() call
from parp_redo().
Commit faa9dcf793be (arp: RCU changes) is the origin of the bug, since
it assumed arp_process() was called under rcu_read_lock(), which is not
true in this particular path.
Instead of adding rcu_read_lock() in parp_redo(), I chose to add it in
neigh_proxy_process() to take care of IPv6 side too.
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/linux/inetdevice.h:209 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
4 locks held by setfiles/2123:
#0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [] walk_component+0x1ef/0x3e8
#1: (&isec->lock){+.+.+.}, at: [] inode_doinit_with_dentry+0x3f/0x41f
#2: (&tbl->proxy_timer){+.-...}, at: [] run_timer_softirq+0x157/0x372
#3: (class){+.-...}, at: [] neigh_proxy_process+0x36/0x103
stack backtrace:
Pid: 2123, comm: setfiles Tainted: G W 3.1.0-0.rc2.git7.2.fc16.x86_64 #1
Call Trace:
[] lockdep_rcu_dereference+0xa7/0xaf
[] __in_dev_get_rcu+0x55/0x5d
[] arp_process+0x25/0x4d7
[] parp_redo+0xe/0x10
[] neigh_proxy_process+0x9a/0x103
[] run_timer_softirq+0x218/0x372
[] ? run_timer_softirq+0x157/0x372
[] ? neigh_stat_seq_open+0x41/0x41
[] ? mark_held_locks+0x6d/0x95
[] __do_softirq+0x112/0x25a
[] call_softirq+0x1c/0x30
[] do_softirq+0x4b/0xa2
[] irq_exit+0x5d/0xcf
[] smp_apic_timer_interrupt+0x7c/0x8a
[] apic_timer_interrupt+0x73/0x80
[] ? trace_hardirqs_on_caller+0x121/0x158
[] ? __slab_free+0x30/0x24c
[] ? __slab_free+0x2e/0x24c
[] ? inode_doinit_with_dentry+0x2e9/0x41f
[] ? inode_doinit_with_dentry+0x2e9/0x41f
[] ? inode_doinit_with_dentry+0x2e9/0x41f
[] kfree+0x108/0x131
[] inode_doinit_with_dentry+0x2e9/0x41f
[] selinux_d_instantiate+0x1c/0x1e
[] security_d_instantiate+0x21/0x23
[] d_instantiate+0x5c/0x61
[] d_splice_alias+0xbc/0xd2
[] ext4_lookup+0xba/0xeb
[] d_alloc_and_lookup+0x45/0x6b
[] walk_component+0x215/0x3e8
[] lookup_last+0x3b/0x3d
[] path_lookupat+0x82/0x2af
[] ? might_fault+0xa5/0xac
[] ? might_fault+0x5c/0xac
[] ? getname_flags+0x31/0x1ca
[] do_path_lookup+0x28/0x97
[] user_path_at+0x59/0x96
[] ? cp_new_stat+0xf7/0x10d
[] vfs_fstatat+0x44/0x6e
[] vfs_lstat+0x1e/0x20
[] sys_newlstat+0x1a/0x33
[] ? trace_hardirqs_on_caller+0x121/0x158
[] ? trace_hardirqs_on_thunk+0x3a/0x3f
[] system_call_fastpath+0x16/0x1b
Reported-by: Dave Jones
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
12 Aug, 2011
1 commit
-
Remove the artificial HZ latency on arp resolution.
Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), let's
send the ARP message immediately.
Before patch:
# arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms
64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms
64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms
After patch:
$ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms
64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms
64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
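The "up to 10 ms if HZ=100" figure in the commit above is the length of one timer tick. A tiny sketch of that arithmetic, where `jiffy_ms` is a hypothetical helper and not a kernel function:

```c
/*
 * Length of one jiffy in milliseconds for a given tick rate (HZ).
 * Hypothetical helper for illustration only; hz must be non-zero.
 */
static double jiffy_ms(unsigned int hz)
{
	return 1000.0 / (double)hz;
}
```

For example, `jiffy_ms(100)` gives 10.0 and `jiffy_ms(1000)` gives 1.0. This is consistent with the first ping taking 9.91 ms before the patch.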
18 Jul, 2011
2 commits
-
dst_{get,set}_neighbour()
Signed-off-by: David S. Miller
-
This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.
We will also be able to make dst entries neigh-less.
Signed-off-by: David S. Miller
17 Jul, 2011
4 commits
-
It is always dev_queue_xmit().
Signed-off-by: David S. Miller
-
It's just taking on one of two possible values, either
neigh_ops->output or dev_queue_xmit(). And this is purely depending
upon whether nud_state has NUD_CONNECTED set or not.
Signed-off-by: David S. Miller
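A rough sketch of the observation in the commit above: if a stored function pointer only ever takes one of two values, and the choice depends purely on a state bit, it can be computed on demand. All names and the flag value below are hypothetical. Which state selects which function is also an assumption (connected is mapped to direct transmit here).

```c
#define SK_NUD_CONNECTED 0x01u	/* hypothetical flag value, not the kernel's */

typedef int (*output_fn)(const char *frame);

/* Stand-in for the slower neigh_ops->output path. */
static int slow_output(const char *frame)
{
	(void)frame;
	return 1;
}

/* Stand-in for dev_queue_xmit(). */
static int direct_xmit(const char *frame)
{
	(void)frame;
	return 2;
}

/* Derive the output function from the state instead of caching a pointer. */
static output_fn pick_output(unsigned int nud_state)
{
	return (nud_state & SK_NUD_CONNECTED) ? direct_xmit : slow_output;
}
```

A caller would write `pick_output(state)(frame)`, so there is no stored pointer to keep in sync with the state.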
-
It's always dev_queue_xmit().
Signed-off-by: David S. Miller
-
Now that hh_cache entries are embedded inside of neighbour
entries, their lifetimes and accesses are now synchronous
to that of the encompassing neighbour object.
Therefore we don't need to hook up the blackhole op to
hh_output on destroy.
Signed-off-by: David S. Miller
14 Jul, 2011
1 commit
-
Now that there is a one-to-one correspondence between neighbour
and hh_cache entries, we no longer need:
1) dynamic allocation
2) attachment to dst->hh
3) refcounting
Initialization of the hh_cache entry is indicated by hh_len
being non-zero, and such initialization is always done with
the neighbour's lock held as a writer.
Signed-off-by: David S. Miller
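A simplified userspace sketch of the embedding described above. Only `hh_len` and its "non-zero means initialized" meaning come from the commit message. The struct names, the buffer size and the helpers are assumptions, and locking is left out.

```c
#include <string.h>

#define HH_DATA_MAX 32	/* hypothetical buffer size */

struct hh_cache_sketch {
	unsigned int hh_len;			/* 0 means "not initialized yet" */
	unsigned char hh_data[HH_DATA_MAX];	/* cached link-layer header */
};

struct neigh_sketch {
	/* Embedded, so it shares the neighbour's lifetime: no refcount needed. */
	struct hh_cache_sketch hh;
};

static int hh_initialized(const struct neigh_sketch *n)
{
	return n->hh.hh_len != 0;
}

/*
 * Fill the cached header. Returns 0 on success, -1 if the length is invalid.
 * The caller is assumed to hold the neighbour's lock as a writer.
 */
static int hh_fill(struct neigh_sketch *n, const unsigned char *hdr,
		   unsigned int len)
{
	if (len == 0 || len > HH_DATA_MAX)
		return -1;
	memcpy(n->hh.hh_data, hdr, len);
	n->hh.hh_len = len;	/* set last: non-zero marks the entry as valid */
	return 0;
}
```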
13 Jul, 2011
2 commits
-
This never, ever, happens.
Neighbour entries are always tied to one address family, and therefore
one set of dst_ops, and therefore one dst_ops->protocol "hh_type"
value.
This capability was blindly imported by Alexey Kuznetsov when he wrote
the neighbour layer.
Signed-off-by: David S. Miller
-
Signed-off-by: David S. Miller
11 Jul, 2011
2 commits
-
We need to make sure the multiplier is odd.
Signed-off-by: David S. Miller
-
And mask the hash function result by simply shifting
down the "->hash_shift" most significant bits.
Currently which bits we use is arbitrary since jhash
produces entropy evenly across the whole hash function
result.
But soon we'll be using universal hashing functions,
and in those cases more entropy exists in the higher
bits than the lower bits, because they use multiplies.
Signed-off-by: David S. Miller
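The two commits above (odd multiplier, use of the most significant bits) can be illustrated with a small multiplicative hash. This is a sketch under assumptions: the function names are hypothetical and 0x9E3779B1 is just an arbitrary odd constant chosen for the example.

```c
#include <stdint.h>

/*
 * Multiply by an odd constant and keep the hash_shift HIGH bits.
 * "| 1u" forces the multiplier odd, so multiplication modulo 2^32 stays
 * a bijection and no key information is discarded. hash_shift: 1..31.
 */
static uint32_t hash_top_bits(uint32_t key, uint32_t rnd,
			      unsigned int hash_shift)
{
	return (key * (rnd | 1u)) >> (32 - hash_shift);
}

/* Same product, but keeping the hash_shift LOW bits, for comparison. */
static uint32_t hash_low_bits(uint32_t key, uint32_t rnd,
			      unsigned int hash_shift)
{
	return (key * (rnd | 1u)) & ((1u << hash_shift) - 1u);
}
```

The low k bits of a product depend only on the low k bits of the key. Keys 0x10000 and 0x20000 therefore both land in bucket 0 with `hash_low_bits(..., 8)`, while `hash_top_bits(..., 0x9E3779B1, 8)` separates them (0x79 and 0xF3).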
10 Jun, 2011
1 commit
-
The message size allocated for rtnl ifinfo dumps was limited to
a single page. This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.
Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: Greg Rose
Signed-off-by: Jeff Kirsher
21 Jan, 2011
1 commit
-
fix some minor issues and sparse (__rcu) warnings
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
20 Dec, 2010
1 commit
-
These macros are never used, so remove them.
Signed-off-by: Shan Wei
Signed-off-by: David S. Miller
21 Oct, 2010
1 commit
-
flush_scheduled_work() is going away. Prepare for it.
Signed-off-by: Tejun Heo
Signed-off-by: David S. Miller
12 Oct, 2010
2 commits
-
Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
dirtying neighbour in stress situation (many different flows / dsts)
Dirtying takes place because of read_lock(&n->lock) and n->used writes.
Switching to a seqlock, and writing n->used only on jiffies changes
permits less dirtying.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
When a new dst is used to send a frame, neigh_resolve_output() tries to
associate a struct hh_cache to this dst, calling neigh_hh_init() with
the neigh rwlock write locked.
Most of the time, hh_cache is already known and linked into neighbour,
so we find it and increment its refcount.
This patch changes the logic so that we call neigh_hh_init() with
neighbour lock read locked only, so that fast path can be run in
parallel by concurrent cpus.
This brings part of the speedup we got with commit c7d4426a98a5f
(introduce DST_NOCACHE flag) for non cached dsts, even for cached ones,
removing one of the contention points that routers hit on multiqueue
enabled machines.
Further improvements would need to use a seqlock instead of an rwlock to
protect neigh->ha[], to not dirty neigh too often and remove two atomic
ops.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
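Both commits of 12 Oct, 2010 refer to protecting neigh->ha[] with a seqlock so that readers do not dirty the neighbour's cache line. The following is a userspace sketch of the sequence-counter idea using C11 atomics. It is not the kernel's seqlock API: all names are hypothetical, a single writer is assumed, and the payload is stored as relaxed atomics to keep the example well-defined in C11.

```c
#include <stdatomic.h>

#define HA_LEN 6

struct ha_seq_sketch {
	atomic_uint seq;			/* even: stable, odd: write in progress */
	_Atomic unsigned char ha[HA_LEN];	/* protected hardware address */
};

static void ha_init(struct ha_seq_sketch *s)
{
	atomic_init(&s->seq, 0);
	for (int i = 0; i < HA_LEN; i++)
		atomic_init(&s->ha[i], 0);
}

/* Writer: bump to odd, update the payload, bump to even. Single writer only. */
static void ha_write(struct ha_seq_sketch *s, const unsigned char *new_ha)
{
	unsigned int seq0 = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq0 + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	for (int i = 0; i < HA_LEN; i++)
		atomic_store_explicit(&s->ha[i], new_ha[i], memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq0 + 2, memory_order_release);
}

/* Reader: writes nothing shared; retries if a write overlapped the copy. */
static void ha_read(struct ha_seq_sketch *s, unsigned char *out)
{
	unsigned int seq0, seq1;

	do {
		seq0 = atomic_load_explicit(&s->seq, memory_order_acquire);
		for (int i = 0; i < HA_LEN; i++)
			out[i] = atomic_load_explicit(&s->ha[i], memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		seq1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((seq0 & 1u) || seq0 != seq1);
}
```

Unlike a `read_lock()`, the read side performs no stores to shared memory, which is the "less dirtying" property the commit message describes.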
07 Oct, 2010
1 commit
-
This is the second step for neighbour RCU conversion.
(first was commit d6bf7817 : RCU conversion of neigh hash table)
neigh_lookup() becomes lockless, but still takes a reference on found
neighbour. (no more read_lock()/read_unlock() on tbl->lock)
struct neighbour gets an additional rcu_head field and is freed after an
RCU grace period.
Future work would need to eventually not take a reference on neighbour
for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to
use a noref bit like we did for skb->_dst.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
06 Oct, 2010
2 commits
-
David
This is the first step for RCU conversion of neigh code.
Next patches will convert hash_buckets[] and "struct neighbour" to RCU
protected objects.
Thanks
[PATCH net-next] net neigh: RCU conversion of neigh hash table
Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
neigh_table", a new structure is defined:
struct neigh_hash_table {
	struct neighbour **hash_buckets;
	unsigned int hash_mask;
	__u32 hash_rnd;
	struct rcu_head rcu;
};
And "struct neigh_table" has an RCU protected pointer to such a
neigh_hash_table.
This means the signature of (*hash)() function changed: We need to add a
third parameter with the actual hash_rnd value, since this is not
anymore a neigh_table field.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
neigh_delete() and neigh_add() don't need to touch device refcount,
we hold RTNL when calling them, so device cannot disappear under us.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
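The first of the two 06 Oct, 2010 commits above says that the (*hash)() callback gained a third parameter carrying hash_rnd, because that value now lives in the separately allocated hash table. Below is a userspace sketch of what a call site might look like. The struct is trimmed down, and `nht_sketch`, `hash_fn`, `xor_hash` and `bucket_index` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Trimmed-down stand-in for struct neigh_hash_table. */
struct nht_sketch {
	unsigned int hash_mask;
	uint32_t hash_rnd;
};

/* The callback takes hash_rnd explicitly as a third parameter. */
typedef uint32_t (*hash_fn)(const void *pkey, const void *dev,
			    uint32_t hash_rnd);

/* Toy hash for illustration only: XOR a 32-bit key with the random value. */
static uint32_t xor_hash(const void *pkey, const void *dev, uint32_t hash_rnd)
{
	uint32_t key;

	(void)dev;
	memcpy(&key, pkey, sizeof(key));
	return key ^ hash_rnd;
}

/* hash_rnd and hash_mask come from the same table snapshot. */
static unsigned int bucket_index(const struct nht_sketch *nht, hash_fn hash,
				 const void *pkey, const void *dev)
{
	return hash(pkey, dev, nht->hash_rnd) & nht->hash_mask;
}
```

Passing hash_rnd in keeps the callback consistent with whichever table the caller is currently looking at, which matters if the table can be replaced while lookups are in flight.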
04 Oct, 2010
1 commit
-
While doing stress tests with IP route cache disabled, and multi queue
devices, I noticed a very high contention on one rwlock used in
neighbour code.
When many cpus are trying to send frames (possibly using a high
performance multiqueue device) to the same neighbour, they fight for the
neigh->lock rwlock in order to call neigh_hh_init(), and fight on
hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())
But we don't need to call neigh_hh_init() for dst that are used only
once. It costs four atomic operations at least, on two contended cache
lines, plus the high contention on neigh->lock rwlock.
Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
inserted in route cache.
With the stress test bench, sending 160000000 frames on one neighbour,
results are:
Before patch:
real 2m28.406s
user 0m11.781s
sys 36m17.964s
After patch:
real 1m26.532s
user 0m12.185s
sys 20m3.903s
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
24 Sep, 2010
1 commit
-
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
15 Jul, 2010
1 commit
-
When configuring DMVPN (GRE + openNHRP) and a GRE remote
address is configured a kernel Oops is observed. The
observed Oops is caused by a NULL header_ops pointer
(neigh->dev->header_ops) in neigh_update_hhs() when
void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
= neigh->dev->header_ops->cache_update;
is executed. The dev associated with the NULL header_ops is
the GRE interface. This patch guards against the
possibility that header_ops is NULL.
This Oops was first observed in kernel version 2.6.26.8.
Signed-off-by: Doug Kehn
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
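A userspace sketch of the kind of guard the commit above describes. The struct and function names are simplified stand-ins, not the kernel definitions, and the callback signature is reduced to one argument.

```c
#include <stddef.h>

struct hh_cache_sk { int updated; };

struct header_ops_sk {
	void (*cache_update)(struct hh_cache_sk *hh);
};

struct net_device_sk {
	const struct header_ops_sk *header_ops;	/* may be NULL, e.g. for a tunnel */
};

/* Example callback used to observe whether the update ran. */
static void mark_updated(struct hh_cache_sk *hh)
{
	hh->updated = 1;
}

/* Only dereference header_ops after checking it, then check the callback. */
static void update_hh(const struct net_device_sk *dev, struct hh_cache_sk *hh)
{
	void (*update)(struct hh_cache_sk *) = NULL;

	if (dev->header_ops)
		update = dev->header_ops->cache_update;

	if (update)
		update(hh);
}
```

With `header_ops` set to NULL the function simply does nothing, instead of faulting on the `->cache_update` load.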
28 May, 2010
1 commit
-
commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot
where an skb is enqueued, with a possibly not refcounted dst entry.
__neigh_event_send() inserts skb into arp_queue, so we must make sure
dst entry is refcounted, or dst entry can be freed by garbage collector
after caller exits from rcu protected section.
Reported-by: Ingo Molnar
Tested-by: Ingo Molnar
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>