03 Aug, 2010

1 commit

  • This reverts commit 15e83ed78864d0625e87a85f09b297c0919a4797.

    As explained by Johannes Berg, the optimization made here is
    invalid. Or, at best, incomplete.

    Not only destructor invocation, but conntrack entry releasing
    must be executed outside of hw IRQ context.

    So just checking "skb->destructor" is insufficient.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Jul, 2010

1 commit


25 Jun, 2010

1 commit


16 Jun, 2010

5 commits

  • This patch adds the functions __netpoll_setup/__netpoll_cleanup,
    which are designed to be called recursively through ndo_netpoll_setup.

    They must be called with RTNL held, and the caller must initialise
    np->dev and ensure that it has a valid reference count.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • This patch adds ndo_netpoll_setup as the initialisation primitive
    to complement ndo_netpoll_cleanup.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • As it stands, netpoll_setup and netpoll_cleanup have no locking
    protection whatsoever. So chaos ensues if two entities try to
    perform them on the same device.

    This patch adds RTNL to the equation. The code has been rearranged so
    that bits that do not need RTNL protection are now moved to the top of
    netpoll_setup.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The use of RCU in netpoll is incorrect in a number of places:

    1) The initial setting is lacking a write barrier.
    2) The synchronize_rcu is in the wrong place.
    3) Read barriers are missing.
    4) Some places are even missing rcu_read_lock.
    5) npinfo is zeroed after freeing.

    This patch fixes those issues. As most users are in BH context,
    this also converts the RCU usage to the BH variant.

    Signed-off-by: Herbert Xu
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Herbert Xu
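
The ordering issues listed in this commit (publish only after initialisation, read through a barrier, unpublish before freeing) can be illustrated outside the kernel. The following is a minimal userspace sketch using C11 atomics as a stand-in for RCU; `npinfo_sketch`, `dev_npinfo` and the three functions are hypothetical names, not the kernel's code, and the grace-period wait that real RCU requires is only marked by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct netpoll_info. */
struct npinfo_sketch {
    int rx_flags;
};

/* Hypothetical stand-in for the device's npinfo pointer. */
static _Atomic(struct npinfo_sketch *) dev_npinfo;

/* Writer: initialise the object fully, then publish it with release
 * ordering, so a reader that sees the pointer also sees rx_flags. */
static int publish_npinfo(int rx_flags)
{
    struct npinfo_sketch *p;

    /* This sketch supports a single published object at a time. */
    assert(atomic_load_explicit(&dev_npinfo, memory_order_acquire) == NULL);

    p = malloc(sizeof(*p));
    if (p == NULL)
        return -1;
    p->rx_flags = rx_flags;
    atomic_store_explicit(&dev_npinfo, p, memory_order_release);
    return 0;
}

/* Reader: load the pointer with acquire ordering before dereferencing. */
static int read_rx_flags(int fallback)
{
    struct npinfo_sketch *p =
        atomic_load_explicit(&dev_npinfo, memory_order_acquire);

    return p != NULL ? p->rx_flags : fallback;
}

/* Teardown: unpublish first, free second. A real RCU user must wait for
 * a grace period between the two steps; this single-threaded sketch has
 * no concurrent readers, so the wait is omitted. */
static void retire_npinfo(void)
{
    struct npinfo_sketch *p = atomic_exchange_explicit(
        &dev_npinfo, (struct npinfo_sketch *)NULL, memory_order_acq_rel);

    free(p);
}
```

Clearing the pointer before the free is the point of item 5: a reader must never be able to reach memory that has already been released.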
     
  • Since we have to NULL npinfo regardless of whether there is an
    ndo_netpoll_cleanup, it makes sense to do this unconditionally
    in netpoll_cleanup rather than having every driver do it
    itself.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     

31 May, 2010

1 commit

  • netpoll does some interesting work in zap_completion_queue(), but this
    predates skb orphaning before delivering packets to the device.

    It now makes sense to add a test in dev_kfree_skb_irq() to not queue an
    skb if it is already orphaned, and to remove netpoll's
    zap_completion_queue() as a bonus.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

06 May, 2010

1 commit

  • This whole patchset is for adding netpoll support to bridge and bonding
    devices. I already tested it for bridge, bonding, bridge over bonding,
    and bonding over bridge. It looks fine now.

    To make bridge and bonding support netpoll, we need to adjust
    some netpoll generic code. This patch does the following things:

    1) introduce two new priv_flags for struct net_device:
    IFF_IN_NETPOLL, which indicates that we are processing a netpoll;
    IFF_DISABLE_NETPOLL, which is used to disable netpoll support for a
    device at run-time;

    2) introduce one new method for netdev_ops:
    ->ndo_netpoll_cleanup() is used to clean up netpoll when a device is
    removed.

    3) introduce netpoll_poll_dev() which takes a struct net_device * parameter;
    export netpoll_send_skb() and netpoll_poll_dev() which will be used later;

    4) hide a pointer to struct netpoll in struct netpoll_info, ditto.

    5) introduce ->real_dev for struct netpoll.

    6) introduce a new status NETDEV_BONDING_DESLAVE, which is used to disable
    netconsole before releasing a slave, to avoid deadlocks.

    Cc: David Miller
    Cc: Neil Horman
    Signed-off-by: WANG Cong
    Signed-off-by: David S. Miller

    WANG Cong
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

23 Mar, 2010

1 commit

  • v2: update according to Frans' comments.

    Currently, if we leave spaces before dst port,
    netconsole will silently accept it as 0. Warn about this.

    Also, when spaces appear in other places, make them
    visible in error messages.

    Signed-off-by: WANG Cong
    Cc: David Miller
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Amerigo Wang
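
The failure mode described here (a port string with a leading space being read as 0) is easy to reproduce with permissive number parsing. Below is a minimal sketch of a strict parser that rejects such input instead; `parse_port_sketch` is a hypothetical helper written for illustration, it handles an isolated port token rather than the full netconsole option string, and it is not the code from this patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal port in [0, 65535]. Returns 0 and stores the value on
 * success, -1 on any malformed input (empty string, leading or embedded
 * whitespace, sign characters, trailing junk, out-of-range values). */
static int parse_port_sketch(const char *s, unsigned int *port)
{
    char *end;
    unsigned long v;

    assert(s != NULL && port != NULL);

    /* strtoul() would skip leading whitespace and accept a sign, so the
     * first character is checked explicitly. */
    if (!isdigit((unsigned char)s[0]))
        return -1;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > 65535)
        return -1;

    *port = (unsigned int)v;
    return 0;
}
```

A caller can then print the offending string in quotes when the function fails, which makes stray spaces visible in the error message.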
     

17 Mar, 2010

1 commit

  • Stanse found that one error path in netpoll_setup dereferences npinfo
    even though it is NULL. Avoid that by adding a new label and jumping
    to it instead.

    Signed-off-by: Jiri Slaby
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Acked-by: chavey@google.com
    Acked-by: Matt Mackall
    Signed-off-by: David S. Miller

    Jiri Slaby
     

14 Jan, 2010

1 commit


02 Sep, 2009

1 commit


24 Aug, 2009

1 commit


10 Jul, 2009

1 commit


09 Jul, 2009

2 commits

  • Using early netconsole with the gianfar driver, this error pops up:

    netconsole: timeout waiting for carrier

    It appears that net/core/netpoll.c:netpoll_setup() is using
    cond_resched() in a loop waiting for a carrier.

    The thing is that cond_resched() is a no-op when system_state !=
    SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
    scheduled, therefore link detection doesn't work.

    I believe that the main problem is in cond_resched()[1], but despite
    how the cond_resched() story ends, it might be a good idea to call
    msleep(1) instead of cond_resched(), as suggested by Andrew Morton.

    [1] http://lkml.org/lkml/2009/7/7/463

    Signed-off-by: Anton Vorontsov
    Signed-off-by: David S. Miller

    Anton Vorontsov
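
The difference between yielding and actually sleeping in a wait loop can be shown with a small userspace analogue. This is a sketch under stated assumptions: `wait_for_carrier_sketch` and `fake_link_up_after` are hypothetical names, `nanosleep()` stands in for the kernel's msleep(1), and the timeout is counted in loop iterations of roughly one millisecond each.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Callback that reports whether the (simulated) link has carrier. */
typedef bool (*carrier_fn)(void *ctx);

/* Poll for carrier, sleeping about 1 ms between checks so that other
 * work (in the kernel case, the PHY state machine) gets a chance to run.
 * Returns true if carrier was seen before the timeout expired. */
static bool wait_for_carrier_sketch(carrier_fn carrier_ok, void *ctx,
                                    int timeout_ms)
{
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };

    assert(carrier_ok != NULL);

    for (int waited = 0; waited < timeout_ms; waited++) {
        if (carrier_ok(ctx))
            return true;
        nanosleep(&one_ms, NULL);
    }
    return carrier_ok(ctx);   /* one last check at the deadline */
}

/* Hypothetical fake link: reports "down" for *ctx polls, then "up". */
static bool fake_link_up_after(void *ctx)
{
    int *polls_left = ctx;

    if (*polls_left > 0) {
        (*polls_left)--;
        return false;
    }
    return true;
}
```

The loop makes progress only because each iteration gives up the CPU for a real interval; a yield that turns into a no-op would spin through the whole timeout without the link state ever changing.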
     
  • Some PHYs require longer timeouts for carrier detection, and the
    auto-negotiation process may take an indefinite amount of time.

    It may be inconvenient to force longer timeouts for sane PHYs,
    so let's introduce a kernel command line option.

    Since we're using module_param(), the option can also be
    changed at runtime.

    Signed-off-by: Anton Vorontsov
    Signed-off-by: David S. Miller

    Anton Vorontsov
     

15 Jun, 2009

1 commit


26 May, 2009

1 commit

  • We would like to get rid of netdev->trans_start = jiffies; which almost
    all net drivers have to use in their start_xmit() function, and use
    txq->trans_start instead.

    This can be done generically in core network, as suggested by David.

    Some devices (particularly loopback) don't need the trans_start update,
    because they don't have a transmit watchdog. We could add a new device
    flag, or rely on the fact that txq->trans_start can be updated if
    txq->xmit_lock_owner is different from -1. Use a helper function to hide
    our choice.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
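
The helper described in this commit can be pictured as follows. This is a minimal userspace sketch of the idea only: `txq_sketch` and `txq_trans_update_sketch` are hypothetical names, the timestamp is passed in as a parameter instead of reading jiffies, and the struct carries just the two fields the message mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-queue TX state. */
struct txq_sketch {
    int xmit_lock_owner;        /* -1 means no CPU holds the xmit lock */
    unsigned long trans_start;  /* time of the last transmit */
};

/* Record the transmit time only when the queue's xmit lock is held,
 * which is how devices without a transmit watchdog are skipped. */
static void txq_trans_update_sketch(struct txq_sketch *txq,
                                    unsigned long now)
{
    assert(txq != NULL);

    if (txq->xmit_lock_owner != -1)
        txq->trans_start = now;
}
```

Hiding the test inside one helper means the rule for skipping the update can later be changed (for example to a device flag) without touching the callers.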
     

22 May, 2009

2 commits

  • Reported by Stephen Rothwell.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Patch to add the ability to detect drops in hardware interfaces via dropwatch.
    Adds a tracepoint to net_rx_action to signal every time a napi instance is
    polled. The dropmon code then periodically checks to see if the rx_frames
    counter has changed, and if so, adds a drop notification to the netlink
    protocol, using the reserved all-0's vector to indicate that the drop
    location was in hardware, rather than somewhere in the code.

    Signed-off-by: Neil Horman

    include/linux/net_dropmon.h | 8 ++
    include/trace/napi.h | 11 +++
    net/core/dev.c | 5 +
    net/core/drop_monitor.c | 124 ++++++++++++++++++++++++++++++++++++++++++--
    net/core/net-traces.c | 4 +
    net/core/netpoll.c | 2
    6 files changed, 149 insertions(+), 5 deletions(-)
    Signed-off-by: David S. Miller

    Neil Horman
     

18 May, 2009

1 commit


29 Mar, 2009

1 commit


16 Dec, 2008

1 commit


10 Dec, 2008

1 commit

  • A few months back a race was discussed between the netpoll napi service
    path, and the fast path through net_rx_action:
    http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470

    A patch was submitted for that bug, but I think we missed a case.

    Consider the following scenario:

    INITIAL STATE
    CPU0 has one napi_struct A on its poll_list
    CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
    napi_struct A that CPU0 has on its list

    CPU0                            CPU1
    net_rx_action                   poll_napi
    !list_empty (returns true)      locks poll_lock for A
                                    poll_one_napi
                                    napi->poll
                                    netif_rx_complete
                                    __napi_complete
                                    (removes A from poll_list)
    list_entry(list->next)

    In the above scenario, net_rx_action assumes that the per-cpu poll_list is
    exclusive to that cpu. netpoll of course violates that, and because the netpoll
    path can dequeue from the poll list, it's possible for CPU0 to detect a non-empty
    list at the top of the while loop in net_rx_action, but have it become empty by
    the time it calls list_entry. Since the poll_list isn't surrounded by any other
    structure, the returned data from that list_entry call in this situation is
    garbage, and any number of crashes can result based on what exactly that garbage
    is.

    Given that it's not feasible for performance reasons to place exclusive locks
    around each cpu's poll list to provide that mutual exclusion, I think the best
    solution is to modify the netpoll path in such a way that we continue to guarantee
    that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've
    implemented the patch below. It adds an additional bit to the state field in
    the napi_struct. When executing napi->poll from the netpoll_path, this bit will
    be set. When a driver calls netif_rx_complete, if that bit is set, it will not
    remove the napi_struct from the poll_list. That work will be saved for the next
    iteration of net_rx_action.

    I've tested this and it seems to work well. About the biggest drawback I can
    see to it is the fact that it might result in an extra loop through
    net_rx_action in the event that the device is actually contended for (i.e. the
    netpoll path actually performs all the needed work on the device, and the call
    to net_rx_action winds up doing nothing except removing the napi_struct from
    the poll_list). However, I think this is probably a small price to pay, given
    that the alternative is a crash.

    Signed-off-by: Neil Horman
    Signed-off-by: David S. Miller

    Neil Horman
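
The state-bit scheme in this commit can be modelled in a few lines. The sketch below is a single-threaded userspace illustration with hypothetical names (`napi_sketch`, `SKETCH_STATE_NPSVC` and the three functions); it shows only the rule that a completion issued while the netpoll-service bit is set leaves the entry on the poll list, and it does not model the locking or the per-cpu list itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits for the sketch. */
enum {
    SKETCH_STATE_SCHED = 1 << 0,  /* entry is scheduled for polling */
    SKETCH_STATE_NPSVC = 1 << 1   /* poll is being run from netpoll */
};

/* Hypothetical stand-in for a napi_struct and its list membership. */
struct napi_sketch {
    unsigned int state;
    bool on_poll_list;
};

/* Completion as called from a driver's poll routine: it unlinks the
 * entry unless the poll was started from the netpoll path. */
static void sketch_complete(struct napi_sketch *n)
{
    assert(n != NULL);

    if (n->state & SKETCH_STATE_NPSVC)
        return;  /* leave it queued; the softirq path unlinks it later */

    n->on_poll_list = false;
    n->state &= ~(unsigned int)SKETCH_STATE_SCHED;
}

/* Netpoll path: mark the entry, run the poll, clear the mark. */
static void sketch_netpoll_poll(struct napi_sketch *n)
{
    n->state |= SKETCH_STATE_NPSVC;
    sketch_complete(n);  /* stands for napi->poll() finishing its work */
    n->state &= ~(unsigned int)SKETCH_STATE_NPSVC;
}

/* Softirq path: the owner of the list polls and may unlink. */
static void sketch_softirq_poll(struct napi_sketch *n)
{
    sketch_complete(n);
}
```

Because only the softirq path ever unlinks, the list stays exclusive to its owning cpu; the cost is the extra pass through the softirq path that the message mentions.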
     

21 Nov, 2008

1 commit

  • This patch moves neigh_setup and hard_start_xmit into the network device ops
    structure. For bisection, fix all the previously converted drivers as well.
    Bonding driver took the biggest hit on this.

    Added a prefetch of the hard_start_xmit in the fast path to try and reduce
    any impact this would have.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

20 Nov, 2008

2 commits

  • This patch changes the network device internal API to move administrative
    operations out of the network device structure and into a separate structure.

    This patch involves some hackery to maintain compatibility between the
    new and old model, so all 300+ drivers don't have to be changed at once.
    For drivers that aren't converted yet, the netdevice_ops virt function list
    still resides in the net_device structure. For old protocols, the new
    net_device_ops are copied out to the old net_device pointers.

    After the transition is completed the nag message can be changed to
    a WARN_ON, and the compatibility code can be made configurable.

    Some function pointers aren't moved:
    * destructor can't be in net_device_ops because
    it may need to be referenced after the module is unloaded.
    * neighbor setup is manipulated in a couple of places that need special
    consideration
    * hard_start_xmit is in the fast path for transmit.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • The first argument to csum_partial is const void *, so
    casts to char/u8 * are not necessary.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

28 Oct, 2008

1 commit

  • This converts pretty much everything to print_mac. There were
    a few things that had conflicts which I have just dropped for
    now, no harm done.

    I've built an allyesconfig with this and looked at the files
    that weren't built very carefully, but it's a huge patch.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

01 Aug, 2008

1 commit

  • When support for multiple TX queues was added, the
    netif_tx_lock() routines were converted to iterate over
    all TX queues and grab each queue's spinlock.

    This causes heartburn for lockdep and it's not a healthy
    thing to do with lots of TX queues anyways.

    So modify this to use a top-level lock and a "frozen"
    state for the individual TX queues.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Jul, 2008

1 commit

  • This effectively "flips the switch" by making the core networking
    and multiqueue-aware drivers use the new TX multiqueue structures.

    Non-multiqueue drivers need no changes. The interfaces they use such
    as netif_stop_queue() degenerate into an operation on TX queue zero.
    So everything "just works" for them.

    Code that really wants to do "X" to all TX queues now invokes a
    routine that does so, such as netif_tx_wake_all_queues(),
    netif_tx_stop_all_queues(), etc.

    pktgen and netpoll required a little bit more surgery than the others.

    In particular the pktgen changes, whilst functional, could be largely
    improved. The initial check in pktgen_xmit() will sometimes check the
    wrong queue, which is mostly harmless. The thing to do is probably to
    invoke fill_packet() earlier.

    The bulk of the netpoll changes is to make the code operate solely on
    the TX queue indicated by the SKB queue mapping.

    Setting of the SKB queue mapping is entirely confined inside of
    net/core/dev.c:dev_pick_tx(). If we end up needing any kind of
    special semantics (drops, for example) it will be implemented here.

    Finally, we now have a "real_num_tx_queues" which is where the driver
    indicates how many TX queues are actually active.

    With IGB changes from Jeff Kirsher.

    Signed-off-by: David S. Miller

    David S. Miller
     

13 May, 2008

1 commit

  • This patch adds needed_headroom/needed_tailroom members to struct
    net_device and updates many places that allocate skbs to use them. Not
    all of them can be converted though, and I'm sure I missed some (I
    mostly grepped for LL_RESERVED_SPACE).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

21 Mar, 2008

2 commits


06 Mar, 2008

1 commit


05 Mar, 2008

1 commit

  • Based upon a report by Andrew Morton and code analysis done
    by Jarek Poplawski.

    This reverts 33f807ba0d9259e7c75c7a2ce8bd2787e5b540c7 ("[NETPOLL]:
    Kill NETPOLL_RX_DROP, set but never tested.") and
    c7b6ea24b43afb5749cb704e143df19d70e23dea ("[NETPOLL]: Don't need
    rx_flags.").

    The rx_flags did get tested for zero vs. non-zero and therefore we do
    need those tests and that code which sets NETPOLL_RX_DROP et al.

    Signed-off-by: David S. Miller

    David S. Miller
     

04 Mar, 2008

1 commit

  • There are some places that calculate the ARP header length. These
    calculations are correct, but
    a) some operate with "magic" constants,
    b) enlarge the code length (sometimes at the cost of coding style),
    c) are not informative at first glance.

    The proposal is to introduce a helper that includes all the good
    sides of these calculations.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
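
The calculation such a helper wraps is short: the fixed ARP header, followed by a hardware address and an IPv4 address for the sender and again for the target. The following is a minimal userspace sketch; `arphdr_sketch` and `arp_hdr_len_sketch` are hypothetical names that mirror the standard ARP layout for IPv4, and they are not the helper added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of an ARP header (standard layout, 8 bytes). */
struct arphdr_sketch {
    uint16_t ar_hrd;  /* hardware address format */
    uint16_t ar_pro;  /* protocol address format */
    uint8_t  ar_hln;  /* hardware address length */
    uint8_t  ar_pln;  /* protocol address length */
    uint16_t ar_op;   /* opcode */
};

/* Total ARP header length for IPv4 over a link whose hardware addresses
 * are hw_addr_len bytes: the fixed part plus sender and target pairs of
 * (hardware address, 4-byte IPv4 address). */
static size_t arp_hdr_len_sketch(size_t hw_addr_len)
{
    assert(hw_addr_len <= 255);  /* ar_hln is a single byte */

    return sizeof(struct arphdr_sketch) +
           2 * (hw_addr_len + sizeof(uint32_t));
}
```

For Ethernet (6-byte addresses) this gives 8 + 2 * (6 + 4) = 28 bytes, the value that otherwise tends to appear in code as a magic constant.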
     

29 Jan, 2008

1 commit