29 Dec, 2011

1 commit


20 Dec, 2011

1 commit


14 Dec, 2011

1 commit


06 Dec, 2011

1 commit


01 Dec, 2011

3 commits


27 Nov, 2011

1 commit


26 Nov, 2011

1 commit


14 Nov, 2011

1 commit

  • On Wednesday, 09 November 2011 at 16:21 -0500, David Miller wrote:
    > From: David Miller
    > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
    >
    > > From: Eric Dumazet
    > > Date: Wed, 09 Nov 2011 12:14:09 +0100
    > >
    > >> unres_qlen is the number of frames we are able to queue per unresolved
    > >> neighbour. Its default value (3) was never changed and is responsible
    > >> for strange drops, especially if IP fragments are used, or multiple
    > >> sessions start in parallel. Even a single tcp flow can hit this limit.
    > > ...
    > >
    > > Ok, I've applied this, let's see what happens :-)
    >
    > Early answer, build fails.
    >
    > Please test build this patch with DECNET enabled and resubmit. The
    > decnet neigh layer still refers to the removed ->queue_len member.
    >
    > Thanks.

    Ouch, this was fixed on one machine yesterday, but not the other one I
    used this morning, sorry.

    [PATCH V5 net-next] neigh: new unresolved queue limits

    unres_qlen is the number of frames we are able to queue per unresolved
    neighbour. Its default value (3) was never changed and is responsible
    for strange drops, especially if IP fragments are used, or multiple
    sessions start in parallel. Even a single tcp flow can hit this limit.

    $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
    PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
    8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
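A rough user-space sketch of why a three-frame limit hurts the 8000-byte ping above. It assumes a 1500-byte MTU (1480 payload bytes per fragment) and a drop-oldest policy when the queue is full; both are assumptions for illustration, and all `toy_` names are hypothetical rather than kernel symbols.

```c
#include <stddef.h>

#define TOY_UNRES_QLEN 3   /* default frame limit named in the text */

struct toy_arp_queue {
    int frames[TOY_UNRES_QLEN];   /* frame ids, oldest first */
    size_t len;                   /* frames currently queued */
    unsigned long discards;       /* frames dropped so far */
};

/* Queue one frame; when the queue is full, drop the oldest first
 * (assumed policy, not taken from the text above). */
static void toy_enqueue(struct toy_arp_queue *q, int frame_id)
{
    if (q->len == TOY_UNRES_QLEN) {
        for (size_t i = 1; i < q->len; i++)
            q->frames[i - 1] = q->frames[i];
        q->len--;
        q->discards++;
    }
    q->frames[q->len++] = frame_id;
}

/* Fragments needed to carry `payload` bytes at `per_frag` bytes each. */
static unsigned int toy_fragments(unsigned int payload, unsigned int per_frag)
{
    return (payload + per_frag - 1) / per_frag;
}
```

Under these assumptions the 8008-byte ICMP message needs six fragments, so half of the first datagram is gone before the neighbour resolves, which would fit the missing icmp_seq=1 reply.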

    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Nov, 2011

1 commit

  • Whatever situations make this state legitimate when SMP
    would also be legitimate when !SMP and, e.g., preemption is
    enabled.

    This is dubious enough that we should just delete it entirely. If we
    want to add debugging for neigh timer races, better, more thorough
    mechanisms are needed.

    Signed-off-by: David S. Miller

    David S. Miller
     

20 Oct, 2011

1 commit

  • When dst_get_neighbour() is used to get a neighbour, the caller must
    hold rcu_read_lock(), since dst_get_neighbour() uses
    rcu_dereference().

    The bug was reported by Ari Savolainen
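A toy user-space model of the rule being enforced, not the kernel API: a checked dereference counts a "splat" whenever it runs outside a read-side section. Every `toy_` name is a hypothetical stand-in, and the depth counter only imitates the kind of check lockdep performs.

```c
#include <stddef.h>

static int toy_rcu_depth;    /* nesting depth of read-side sections */
static int toy_rcu_splats;   /* unprotected dereferences seen so far */

static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

/* Stand-in for a checked dereference: complain when unprotected. */
static void *toy_rcu_dereference(void *p)
{
    if (toy_rcu_depth == 0)
        toy_rcu_splats++;
    return p;
}

struct toy_dst {
    void *neighbour;
};

/* Accessor that relies on its caller holding the read-side lock. */
static void *toy_dst_get_neighbour(struct toy_dst *dst)
{
    return toy_rcu_dereference(dst->neighbour);
}

/* Caller pattern after the fix: wrap the accessor in lock/unlock. */
static int toy_neigh_is_set(struct toy_dst *dst)
{
    int set;

    toy_rcu_read_lock();
    set = toy_dst_get_neighbour(dst) != NULL;
    toy_rcu_read_unlock();
    return set;
}
```

Calling `toy_dst_get_neighbour()` directly bumps the splat counter, while going through `toy_neigh_is_set()` does not.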

    [ 105.612095]
    [ 105.612096] ===================================================
    [ 105.612100] [ INFO: suspicious rcu_dereference_check() usage. ]
    [ 105.612101] ---------------------------------------------------
    [ 105.612103] include/net/dst.h:91 invoked rcu_dereference_check() without protection!
    [ 105.612105]
    [ 105.612106] other info that might help us debug this:
    [ 105.612106]
    [ 105.612108]
    [ 105.612108] rcu_scheduler_active = 1, debug_locks = 0
    [ 105.612110] 1 lock held by dnsmasq/2618:
    [ 105.612111] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
    [ 105.612120]
    [ 105.612121] stack backtrace:
    [ 105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41
    [ 105.612125] Call Trace:
    [ 105.612129] [] lockdep_rcu_dereference+0xbb/0xc0
    [ 105.612132] [] neigh_update+0x4f9/0x5f0
    [ 105.612135] [] ? neigh_lookup+0xe1/0x220
    [ 105.612139] [] arp_req_set+0xb8/0x230
    [ 105.612142] [] arp_ioctl+0x1bf/0x310
    [ 105.612146] [] ? lock_hrtimer_base.isra.26+0x30/0x60
    [ 105.612150] [] inet_ioctl+0x85/0x90
    [ 105.612154] [] sock_do_ioctl+0x30/0x70
    [ 105.612157] [] sock_ioctl+0x73/0x280
    [ 105.612162] [] do_vfs_ioctl+0x98/0x570
    [ 105.612165] [] ? fget_light+0x340/0x3a0
    [ 105.612168] [] sys_ioctl+0x4f/0x80
    [ 105.612172] [] system_call_fastpath+0x16/0x1b

    Reported-by: Ari Savolainen
    Signed-off-by: RongQing
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    roy.qing.li@gmail.com
     

22 Sep, 2011

1 commit

  • Conflicts:
    MAINTAINERS
    drivers/net/Kconfig
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
    drivers/net/ethernet/broadcom/tg3.c
    drivers/net/wireless/iwlwifi/iwl-pci.c
    drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
    drivers/net/wireless/rt2x00/rt2800usb.c
    drivers/net/wireless/wl12xx/main.c

    David S. Miller
     

25 Aug, 2011

1 commit

  • Dave Jones reported a lockdep splat triggered by an arp_process() call
    from parp_redo().

    Commit faa9dcf793be (arp: RCU changes) is the origin of the bug, since
    it assumed arp_process() was called under rcu_read_lock(), which is not
    true in this particular path.

    Instead of adding rcu_read_lock() in parp_redo(), I chose to add it in
    neigh_proxy_process() to take care of IPv6 side too.

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    include/linux/inetdevice.h:209 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    4 locks held by setfiles/2123:
    #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [] walk_component+0x1ef/0x3e8
    #1: (&isec->lock){+.+.+.}, at: [] inode_doinit_with_dentry+0x3f/0x41f
    #2: (&tbl->proxy_timer){+.-...}, at: [] run_timer_softirq+0x157/0x372
    #3: (class){+.-...}, at: [] neigh_proxy_process+0x36/0x103

    stack backtrace:
    Pid: 2123, comm: setfiles Tainted: G W 3.1.0-0.rc2.git7.2.fc16.x86_64 #1
    Call Trace:
    [] lockdep_rcu_dereference+0xa7/0xaf
    [] __in_dev_get_rcu+0x55/0x5d
    [] arp_process+0x25/0x4d7
    [] parp_redo+0xe/0x10
    [] neigh_proxy_process+0x9a/0x103
    [] run_timer_softirq+0x218/0x372
    [] ? run_timer_softirq+0x157/0x372
    [] ? neigh_stat_seq_open+0x41/0x41
    [] ? mark_held_locks+0x6d/0x95
    [] __do_softirq+0x112/0x25a
    [] call_softirq+0x1c/0x30
    [] do_softirq+0x4b/0xa2
    [] irq_exit+0x5d/0xcf
    [] smp_apic_timer_interrupt+0x7c/0x8a
    [] apic_timer_interrupt+0x73/0x80
    [] ? trace_hardirqs_on_caller+0x121/0x158
    [] ? __slab_free+0x30/0x24c
    [] ? __slab_free+0x2e/0x24c
    [] ? inode_doinit_with_dentry+0x2e9/0x41f
    [] ? inode_doinit_with_dentry+0x2e9/0x41f
    [] ? inode_doinit_with_dentry+0x2e9/0x41f
    [] kfree+0x108/0x131
    [] inode_doinit_with_dentry+0x2e9/0x41f
    [] selinux_d_instantiate+0x1c/0x1e
    [] security_d_instantiate+0x21/0x23
    [] d_instantiate+0x5c/0x61
    [] d_splice_alias+0xbc/0xd2
    [] ext4_lookup+0xba/0xeb
    [] d_alloc_and_lookup+0x45/0x6b
    [] walk_component+0x215/0x3e8
    [] lookup_last+0x3b/0x3d
    [] path_lookupat+0x82/0x2af
    [] ? might_fault+0xa5/0xac
    [] ? might_fault+0x5c/0xac
    [] ? getname_flags+0x31/0x1ca
    [] do_path_lookup+0x28/0x97
    [] user_path_at+0x59/0x96
    [] ? cp_new_stat+0xf7/0x10d
    [] vfs_fstatat+0x44/0x6e
    [] vfs_lstat+0x1e/0x20
    [] sys_newlstat+0x1a/0x33
    [] ? trace_hardirqs_on_caller+0x121/0x158
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b

    Reported-by: Dave Jones
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Aug, 2011

1 commit

  • Remove the artificial HZ latency on arp resolution.

    Instead of firing a timer in one jiffy (up to 10 ms if HZ=100), let's
    send the ARP message immediately.

    Before patch :

    # arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
    PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
    64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=9.91 ms
    64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.065 ms
    64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.061 ms

    After patch :

    $ arp -d 192.168.20.108 ; ping -c 3 192.168.20.108
    PING 192.168.20.108 (192.168.20.108) 56(84) bytes of data.
    64 bytes from 192.168.20.108: icmp_seq=1 ttl=64 time=0.152 ms
    64 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.064 ms
    64 bytes from 192.168.20.108: icmp_seq=3 ttl=64 time=0.074 ms
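The 9.91 ms first reply before the patch is close to one jiffy at HZ=100. A minimal sketch of that arithmetic, with a hypothetical helper name that is not a kernel function:

```c
/* Length of `n` jiffies in microseconds for a given tick rate `hz`.
 * Hypothetical helper for illustration only. */
static unsigned long toy_jiffies_to_usecs(unsigned long n, unsigned int hz)
{
    return n * (1000000UL / hz);
}
```

For example, `toy_jiffies_to_usecs(1, 100)` gives 10000 microseconds (10 ms), while the same one-jiffy timer at HZ=1000 would cost at most 1 ms.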

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

18 Jul, 2011

2 commits


17 Jul, 2011

4 commits


14 Jul, 2011

1 commit

  • Now that there is a one-to-one correspondence between neighbour
    and hh_cache entries, we no longer need:

    1) dynamic allocation
    2) attachment to dst->hh
    3) refcounting

    Initialization of the hh_cache entry is indicated by hh_len
    being non-zero, and such initialization is always done with
    the neighbour's lock held as a writer.
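A user-space sketch of the layout described above: the cache entry is embedded in the neighbour (no allocation, no refcount) and a non-zero hh_len marks it as initialized. The `toy_` types and the spinlock standing in for the neighbour's write lock are assumptions for illustration, not the kernel definitions.

```c
#include <stdatomic.h>
#include <string.h>

#define TOY_HH_DATA_MAX 32

struct toy_hh_cache {
    unsigned short hh_len;                   /* 0 means "not initialized" */
    unsigned char hh_data[TOY_HH_DATA_MAX];  /* cached link-layer header */
};

struct toy_neighbour {
    atomic_flag lock;          /* stand-in for the neighbour's write lock */
    struct toy_hh_cache hh;    /* embedded: one entry per neighbour */
};

static void toy_neigh_setup(struct toy_neighbour *n)
{
    atomic_flag_clear(&n->lock);
    memset(&n->hh, 0, sizeof(n->hh));
}

/* Returns 1 if this call initialized the entry, 0 if it already was,
 * -1 on a bad length. Initialization happens only under the lock. */
static int toy_hh_init(struct toy_neighbour *n,
                       const unsigned char *hdr, unsigned short len)
{
    int did_init = 0;

    if (len == 0 || len > TOY_HH_DATA_MAX)
        return -1;

    while (atomic_flag_test_and_set(&n->lock))
        ;   /* spin until the "write lock" is ours */
    if (n->hh.hh_len == 0) {
        memcpy(n->hh.hh_data, hdr, len);
        n->hh.hh_len = len;   /* set last: marks the entry as valid */
        did_init = 1;
    }
    atomic_flag_clear(&n->lock);
    return did_init;
}
```

A second call for the same neighbour finds hh_len already set and leaves the cached header untouched.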

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Jul, 2011

2 commits


11 Jul, 2011

2 commits


10 Jun, 2011

1 commit

  • The message size allocated for rtnl ifinfo dumps was limited to
    a single page. This is not enough for additional interface info
    available with devices that support SR-IOV and caused a bug in
    which VF info would not be displayed if more than approximately
    40 VFs were created per interface.

    Implement a new function pointer for the rtnl_register service that will
    calculate the amount of data required for the ifinfo dump and allocate
    enough data to satisfy the request.

    Signed-off-by: Greg Rose
    Signed-off-by: Jeff Kirsher

    Greg Rose
     

21 Jan, 2011

1 commit


20 Dec, 2010

1 commit


21 Oct, 2010

1 commit


12 Oct, 2010

2 commits

  • Add a seqlock in struct neighbour to protect neigh->ha[], and avoid
    dirtying neighbour in stress situation (many different flows / dsts)

    Dirtying takes place because of read_lock(&n->lock) and n->used writes.

    Switching to a seqlock, and writing n->used only on jiffies changes
    permits less dirtying.
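A minimal user-space sketch of the two ideas, written with C11 atomics rather than the kernel's seqlock primitives. Readers store nothing to shared memory and simply retry if a writer interfered, and the timestamp is stored only when its value changes. The `toy_` names are hypothetical, and writers are assumed to be serialized by some outer lock.

```c
#include <stdatomic.h>

#define TOY_ALEN 6

struct toy_neigh {
    atomic_uint ha_seq;                  /* even: stable, odd: write in progress */
    _Atomic unsigned char ha[TOY_ALEN];  /* hardware address */
    unsigned long used;                  /* last-use timestamp (jiffies-like) */
};

/* Writer side. Assumes writers are serialized by an outer lock. */
static void toy_write_ha(struct toy_neigh *n, const unsigned char *ha)
{
    unsigned int s = atomic_load_explicit(&n->ha_seq, memory_order_relaxed);

    atomic_store_explicit(&n->ha_seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    for (int i = 0; i < TOY_ALEN; i++)
        atomic_store_explicit(&n->ha[i], ha[i], memory_order_relaxed);
    atomic_store_explicit(&n->ha_seq, s + 2, memory_order_release);
}

/* Reader side: no shared stores, so the cache line is not dirtied. */
static void toy_read_ha(struct toy_neigh *n, unsigned char *out)
{
    unsigned int s;

    do {
        s = atomic_load_explicit(&n->ha_seq, memory_order_acquire);
        for (int i = 0; i < TOY_ALEN; i++)
            out[i] = atomic_load_explicit(&n->ha[i], memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((s & 1) ||
             atomic_load_explicit(&n->ha_seq, memory_order_relaxed) != s);
}

/* Store the timestamp only when it changed; returns 1 if a store happened. */
static int toy_touch_used(struct toy_neigh *n, unsigned long now)
{
    if (n->used == now)
        return 0;
    n->used = now;
    return 1;
}
```

With many flows hitting the same neighbour within one tick, `toy_touch_used()` performs at most one store per tick instead of one per packet.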

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When a new dst is used to send a frame, neigh_resolve_output() tries to
    associate a struct hh_cache to this dst, calling neigh_hh_init() with
    the neigh rwlock write locked.

    Most of the time, hh_cache is already known and linked into neighbour,
    so we find it and increment its refcount.

    This patch changes the logic so that we call neigh_hh_init() with
    neighbour lock read locked only, so that fast path can be run in
    parallel by concurrent cpus.

    This brings part of the speedup we got with commit c7d4426a98a5f
    (introduce DST_NOCACHE flag) for non cached dsts, even for cached ones,
    removing one of the contention points that routers hit on multiqueue
    enabled machines.

    Further improvements would need to use a seqlock instead of an rwlock to
    protect neigh->ha[], to not dirty neigh too often and remove two atomic
    ops.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

07 Oct, 2010

1 commit

  • This is the second step for neighbour RCU conversion.

    (first was commit d6bf7817 : RCU conversion of neigh hash table)

    neigh_lookup() becomes lockless, but still takes a reference on the found
    neighbour (no more read_lock()/read_unlock() on tbl->lock).

    struct neighbour gets an additional rcu_head field and is freed after an
    RCU grace period.

    Future work would need to eventually not take a reference on neighbour
    for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to
    use a noref bit like we did for skb->_dst.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 Oct, 2010

2 commits

  • David

    This is the first step for RCU conversion of neigh code.

    Next patches will convert hash_buckets[] and "struct neighbour" to RCU
    protected objects.

    Thanks

    [PATCH net-next] net neigh: RCU conversion of neigh hash table

    Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
    neigh_table", a new structure is defined :

    struct neigh_hash_table {
        struct neighbour **hash_buckets;
        unsigned int hash_mask;
        __u32 hash_rnd;
        struct rcu_head rcu;
    };

    And "struct neigh_table" has an RCU protected pointer to such a
    neigh_hash_table.

    This means the signature of the (*hash)() function changed: we need to
    add a third parameter with the actual hash_rnd value, since this is no
    longer a neigh_table field.
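A small sketch of what passing hash_rnd as a third parameter looks like from the caller's side. The `toy_` types, the opaque `dev` pointer, and the mixing function are made up for illustration; only the idea that the caller reads hash_rnd from the hash table structure and hands it to the hash callback comes from the text above.

```c
#include <stdint.h>
#include <string.h>

struct toy_hash_table {
    uint32_t hash_mask;   /* number of buckets minus one */
    uint32_t hash_rnd;    /* per-table random seed */
};

/* Hash callback: the seed arrives as an argument, not via the table. */
typedef uint32_t (*toy_hash_fn)(const void *pkey, const void *dev,
                                uint32_t hash_rnd);

/* Made-up mixing function over a 4-byte key; `dev` is ignored here. */
static uint32_t toy_arp_hash(const void *pkey, const void *dev,
                             uint32_t hash_rnd)
{
    uint32_t key, h;

    (void)dev;
    memcpy(&key, pkey, sizeof(key));
    h = (key ^ hash_rnd) * 2654435761u;
    return h ^ (h >> 16);
}

/* The caller fetches hash_rnd from the table it is currently using. */
static uint32_t toy_bucket(const struct toy_hash_table *nht, toy_hash_fn hash,
                           const void *pkey, const void *dev)
{
    return hash(pkey, dev, nht->hash_rnd) & nht->hash_mask;
}
```

Because the seed travels with the table, a replacement table published later could carry a different hash_rnd without the callback having to reach into shared state.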

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • neigh_delete() and neigh_add() don't need to touch the device refcount:
    we hold RTNL when calling them, so the device cannot disappear under us.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Oct, 2010

1 commit

  • While doing stress tests with IP route cache disabled, and multi queue
    devices, I noticed a very high contention on one rwlock used in
    neighbour code.

    When many cpus are trying to send frames (possibly using a high
    performance multiqueue device) to the same neighbour, they fight for the
    neigh->lock rwlock in order to call neigh_hh_init(), and fight on
    hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())

    But we don't need to call neigh_hh_init() for dsts that are used only
    once. It costs four atomic operations at least, on two contended cache
    lines, plus the high contention on neigh->lock rwlock.

    Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
    inserted in route cache.

    With the stress test bench, sending 160000000 frames on one neighbour,
    results are :

    Before patch:

    real 2m28.406s
    user 0m11.781s
    sys 36m17.964s

    After patch:

    real 1m26.532s
    user 0m12.185s
    sys 20m3.903s

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Sep, 2010

1 commit


15 Jul, 2010

1 commit

  • When configuring DMVPN (GRE + openNHRP) with a GRE remote
    address configured, a kernel Oops is observed. The
    observed Oops is caused by a NULL header_ops pointer
    (neigh->dev->header_ops) in neigh_update_hhs() when

    void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
    = neigh->dev->header_ops->cache_update;

    is executed. The dev associated with the NULL header_ops is
    the GRE interface. This patch guards against the
    possibility that header_ops is NULL.

    This Oops was first observed in kernel version 2.6.26.8.
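A user-space sketch of the kind of guard described, using simplified `toy_` stand-ins for the kernel structures. The exact shape of the check in the real patch is not shown in this log, so treat this as one plausible form.

```c
#include <stddef.h>
#include <string.h>

struct toy_hh_cache {
    unsigned char data[16];
};

struct toy_net_device;

struct toy_header_ops {
    void (*cache_update)(struct toy_hh_cache *hh,
                         const struct toy_net_device *dev,
                         const unsigned char *haddr);
};

struct toy_net_device {
    const struct toy_header_ops *header_ops;   /* may be NULL, as on GRE */
};

/* Example callback for a device that does have header_ops. */
static void toy_eth_cache_update(struct toy_hh_cache *hh,
                                 const struct toy_net_device *dev,
                                 const unsigned char *haddr)
{
    (void)dev;
    memcpy(hh->data, haddr, 6);
}

/* Returns 1 if the cached header was updated, 0 if the update was skipped. */
static int toy_update_hh(struct toy_hh_cache *hh,
                         const struct toy_net_device *dev,
                         const unsigned char *haddr)
{
    void (*update)(struct toy_hh_cache *, const struct toy_net_device *,
                   const unsigned char *) = NULL;

    /* Guard: only follow header_ops when the device provides one. */
    if (dev->header_ops)
        update = dev->header_ops->cache_update;
    if (!update)
        return 0;

    update(hh, dev, haddr);
    return 1;
}
```

A device with `header_ops == NULL` then takes the early return instead of dereferencing a NULL pointer.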

    Signed-off-by: Doug Kehn
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Doug Kehn
     

28 May, 2010

1 commit

  • commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot
    where an skb is enqueued, with a possibly not refcounted dst entry.

    __neigh_event_send() inserts skb into arp_queue, so we must make sure
    dst entry is refcounted, or dst entry can be freed by garbage collector
    after caller exits from rcu protected section.

    Reported-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo