10 Dec, 2013

30 commits

  • Instead of open-coding a PHY reset through the MII BMCR register, use
    phy_init_hw() which does this for us and ensures that PHY device fixups
    are also applied. We also remove a call to ethernet_phy_reset() which is
    now unnecessary since phy_attach() calls phy_attach_direct(), which in
    turn calls phy_init_hw().

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Instead of open-coding a PHY reset through the MII BMCR register, use
    phy_init_hw() which does that for us and will also make sure that PHY
    fixups are applied if required. We also remove a call to phy_reset()
    due to the following sequence of calls in the driver:

    phy_scan()
    -> phy_connect()
    -> phy_connect_direct()
    -> phy_attach_direct()
    -> phy_init_hw()

    and we only have a call to phy_init() after phy_scan().

    Signed-off-by: Florian Fainelli
    Tested-by: Sebastian Hesselbarth
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Quite a lot of drivers touch a PHY device's MII_BMCR register to
    reset the PHY without taking care of:

    1) ensuring that BMCR_RESET is cleared after a given timeout
    2) the PHY state machine resuming to the proper state and re-applying
    potentially changed settings such as auto-negotiation

    Introduce phy_poll_reset(), which polls MII_BMCR until the BMCR_RESET
    bit is cleared, or returns a timeout error code if the bit is still
    set after a given timeout.

    In order to make sure the PHY is in a correct state, phy_init_hw() first
    issues a software reset through MII_BMCR and then applies any fixups.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
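
    The commit text does not show phy_poll_reset() itself. Below is a rough
    userspace sketch of the idea it describes: set BMCR_RESET, then re-read
    MII_BMCR until the bit self-clears or a retry budget runs out. The PHY is
    simulated. struct mock_phy, mock_read_bmcr(), poll_reset() and
    soft_reset() are hypothetical names, not kernel API. BMCR_RESET is bit 15
    of the clause 22 basic mode control register.

```c
#include <errno.h>
#include <stdint.h>

#define BMCR_RESET 0x8000 /* bit 15 of MII_BMCR */

/* Hypothetical mock PHY: the reset bit self-clears after a few reads. */
struct mock_phy {
	uint16_t bmcr;
	int reads_until_clear;
};

static uint16_t mock_read_bmcr(struct mock_phy *phy)
{
	if ((phy->bmcr & BMCR_RESET) && phy->reads_until_clear-- <= 0)
		phy->bmcr &= (uint16_t)~BMCR_RESET;
	return phy->bmcr;
}

/*
 * Poll BMCR until BMCR_RESET clears, giving up after max_polls reads.
 * A real driver would also sleep between reads; that is omitted here.
 */
static int poll_reset(struct mock_phy *phy, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (!(mock_read_bmcr(phy) & BMCR_RESET))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Issue a software reset, then wait for the PHY to finish it. */
static int soft_reset(struct mock_phy *phy, int max_polls)
{
	phy->bmcr |= BMCR_RESET;
	return poll_reset(phy, max_polls);
}
```

    In this model, a PHY that clears the bit after a few reads makes
    soft_reset(&phy, 12) return 0. A PHY that never clears it within the
    budget yields -ETIMEDOUT instead of hanging the caller.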
     
  • The PHY is already reset during driver probing, and this manual reset
    after calling phy_start() will wipe out board-specific PHY fixups and
    driver specific configuration initialization. Remove that explicit PHY
    reset.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • In case the greth driver is bound to anything but the Generic PHY
    driver or the PHY has a special read_status callback implemented,
    unexpected things will happen. Make sure that we use
    phy_read_status() which does the proper abstraction of calling the
    driver specific read_status() callback for a given PHY.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
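
    To illustrate why the indirection matters, here is a trimmed userspace
    model. Our understanding is that phy_read_status() dispatches to the
    bound driver's read_status() callback. The structures below are
    hypothetical stand-ins for the kernel's, and the speeds are invented. A
    driver that hard-codes the generic routine reports the wrong status for
    a PHY whose driver overrides read_status().

```c
struct phy_device;

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct phy_driver {
	const char *name;
	int (*read_status)(struct phy_device *phydev);
};

struct phy_device {
	const struct phy_driver *drv;
	int link;
	int speed;
};

/* Generic path: pretend the standard registers report 100 Mb/s. */
static int generic_read_status(struct phy_device *phydev)
{
	phydev->link = 1;
	phydev->speed = 100;
	return 0;
}

/* A PHY whose real speed comes from a vendor-specific register. */
static int vendor_read_status(struct phy_device *phydev)
{
	phydev->link = 1;
	phydev->speed = 1000;
	return 0;
}

/* Dispatch through the bound driver instead of calling the generic code. */
static int read_status(struct phy_device *phydev)
{
	return phydev->drv->read_status(phydev);
}

static const struct phy_driver generic_drv = { "generic", generic_read_status };
static const struct phy_driver vendor_drv = { "vendor", vendor_read_status };
```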
     
  • Use phy_init_hw() instead of open-coding it in phy_mii_ioctl(); this
    improves consistency and makes sure that we will not duplicate the same
    routine somewhere else.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • The PHY library already reads the MII_STAT1000 and MII_LPA registers in
    genphy_read_status(), so extend it to also populate the PHY device link
    partner advertised features such that we can feed this back into ethtool
    when asked for it in phy_ethtool_gset().

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
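
    The translation this commit describes maps MII_LPA and MII_STAT1000 bits
    to ethtool's ADVERTISED_* link modes. The sketch below shows that mapping
    in userspace. The constant values are copied from the Linux uapi headers
    <linux/mii.h> and <linux/ethtool.h>. The function lpa_to_ethtool() is a
    hypothetical name, not the kernel helper, and pause bits are ignored.

```c
#include <stdint.h>

/* Link partner ability bits (values as in <linux/mii.h>). */
#define LPA_10HALF   0x0020
#define LPA_10FULL   0x0040
#define LPA_100HALF  0x0080
#define LPA_100FULL  0x0100
#define LPA_LPACK    0x4000
#define LPA_1000HALF 0x0400 /* reported in MII_STAT1000 */
#define LPA_1000FULL 0x0800 /* reported in MII_STAT1000 */

/* ethtool link mode bits (values as in <linux/ethtool.h>). */
#define ADVERTISED_10baseT_Half   (1u << 0)
#define ADVERTISED_10baseT_Full   (1u << 1)
#define ADVERTISED_100baseT_Half  (1u << 2)
#define ADVERTISED_100baseT_Full  (1u << 3)
#define ADVERTISED_1000baseT_Half (1u << 4)
#define ADVERTISED_1000baseT_Full (1u << 5)
#define ADVERTISED_Autoneg        (1u << 6)

/* Build the link partner's advertised modes from the two registers. */
static uint32_t lpa_to_ethtool(uint16_t lpa, uint16_t stat1000)
{
	uint32_t modes = 0;

	if (lpa & LPA_LPACK)
		modes |= ADVERTISED_Autoneg;
	if (lpa & LPA_10HALF)
		modes |= ADVERTISED_10baseT_Half;
	if (lpa & LPA_10FULL)
		modes |= ADVERTISED_10baseT_Full;
	if (lpa & LPA_100HALF)
		modes |= ADVERTISED_100baseT_Half;
	if (lpa & LPA_100FULL)
		modes |= ADVERTISED_100baseT_Full;
	if (stat1000 & LPA_1000HALF)
		modes |= ADVERTISED_1000baseT_Half;
	if (stat1000 & LPA_1000FULL)
		modes |= ADVERTISED_1000baseT_Full;
	return modes;
}
```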
     
  • Checking the related code shows that ret can never be greater than len
    or total_len, so remove some useless code in both of the functions above.

    Signed-off-by: Zhi Yong Wu
    Signed-off-by: David S. Miller

    Zhi Yong Wu
     
  • Checking the related code shows that ret can never be greater than len
    or total_len, so remove some useless code in both of the functions above.

    Signed-off-by: Zhi Yong Wu
    Signed-off-by: David S. Miller

    Zhi Yong Wu
     
  • The way that flow control works without this patch is that in start_xmit(),
    the code uses xenvif_count_skb_slots() to predict how many slots
    xenvif_gop_skb() will consume and then adds this to a 'req_cons_peek'
    counter which it then uses to determine if the shared ring has that amount
    of space available by checking whether 'req_prod' has passed that value.
    If the ring doesn't have space the tx queue is stopped.
    xenvif_gop_skb() will then consume slots and update 'req_cons' and issue
    responses, updating 'rsp_prod' as it goes. The frontend will consume those
    responses and post new requests by updating req_prod. So req_prod chases
    req_cons which chases rsp_prod, and can never exceed that value. Thus if
    xenvif_count_skb_slots() ever returns a number of slots greater than
    xenvif_gop_skb() uses, req_cons_peek will get to a value that req_prod cannot
    possibly achieve (since it's limited by the 'real' req_cons) and, if this
    happens enough times, req_cons_peek gets more than a ring size ahead of
    req_cons and the tx queue then remains stopped forever waiting for an
    unachievable amount of space to become available in the ring.

    Having two routines trying to calculate the same value is always going to be
    fragile, so this patch does away with that. All we essentially need to do is
    make sure that we have 'enough stuff' on our internal queue without letting
    it build up uncontrollably. So start_xmit() makes a cheap optimistic check
    of how much space is needed for an skb and only turns the queue off if that
    is unachievable. net_rx_action() is the place where we could do with an
    accurate prediction but, since that has proven tricky to calculate, a cheap
    worst-case (but not too bad) estimate is all we really need since the only
    thing we *must* prevent is xenvif_gop_skb() consuming more slots than are
    available.

    Without this patch I can trivially stall netback permanently by just doing
    a large guest-to-guest file copy between two Windows Server 2008R2 VMs on a
    single host.

    Patch tested with frontends in:
    - Windows Server 2008R2
    - CentOS 6.0
    - Debian Squeeze
    - Debian Wheezy
    - SLES11

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Cc: Ian Campbell
    Cc: David Vrabel
    Cc: Annie Li
    Cc: Konrad Rzeszutek Wilk
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
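
    A toy simulation can make the drift described above concrete. This is
    an assumption-laden model, not netback code. The ring size, struct
    rx_ring, ring_has_space() and simulate() are hypothetical, signed
    counters replace the real unsigned ring indices, and the frontend is
    assumed to repost every consumed slot instantly. When the prediction
    matches actual consumption, the gap between req_prod and req_cons_peek
    stays constant. When it overshoots by one slot per packet, the gap
    shrinks by one per packet until the queue stops.

```c
#include <stdbool.h>

/* Hypothetical tiny ring so the effect appears after a few packets. */
#define RING_SIZE 8

/*
 * Simplified model of the counters in the commit message. Real netback
 * uses free-running unsigned RING_IDX values; signed longs keep this simple.
 */
struct rx_ring {
	long req_prod;      /* advanced by the frontend posting requests */
	long req_cons;      /* advanced by slots actually consumed */
	long req_cons_peek; /* advanced by slots predicted in start_xmit() */
};

/* Old-style admission check against the predicted position. */
static bool ring_has_space(const struct rx_ring *r, long needed)
{
	return r->req_prod - r->req_cons_peek >= needed;
}

/*
 * Send up to max_packets, each predicted to need 'predicted' slots but
 * actually consuming 'actual'. Returns how many were sent before the
 * queue stopped (max_packets if it never stopped).
 */
static int simulate(long predicted, long actual, int max_packets)
{
	struct rx_ring r = { .req_prod = RING_SIZE };

	for (int i = 0; i < max_packets; i++) {
		if (!ring_has_space(&r, predicted))
			return i; /* in this model nothing advances req_cons again */
		r.req_cons_peek += predicted;
		r.req_cons += actual;
		/* The frontend reposts at once, but can never get more than
		 * RING_SIZE requests ahead of req_cons. */
		r.req_prod = r.req_cons + RING_SIZE;
	}
	return max_packets;
}
```

    With an 8-slot ring, simulate(2, 2, 1000) sends all 1000 packets, while
    simulate(3, 2, 1000) stops after 6.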
     
  • struct 'tipc_bearer' is a generic representation of the underlying
    media type, and exists in a one-to-one relationship to each interface
    TIPC is using. The struct contains a 'blocked' flag that mirrors the
    operational and execution state of the represented interface, and is
    updated through notification calls from the latter. The users of
    tipc_bearer are checking this flag before each attempt to send a
    packet via the interface.

    This state mirroring serves no purpose in the current code base. TIPC
    links will not discover a media failure any faster through this
    mechanism, and in reality the flag only adds overhead at packet
    sending and reception.

    Furthermore, the fact that the flag needs to be protected by a spinlock
    aggregated into tipc_bearer has turned out to cause a serious and
    completely unnecessary deadlock problem.

            CPU0                          CPU1
            ----                          ----
    Time 0: bearer_disable()              link_timeout()
    Time 1: spin_lock_bh(&b_ptr->lock)    tipc_link_push_queue()
    Time 2: tipc_link_delete()            tipc_bearer_blocked(b_ptr)
    Time 3: k_cancel_timer(&req->timer)   spin_lock_bh(&b_ptr->lock)
    Time 4: del_timer_sync(&req->timer)

    I.e., del_timer_sync() on CPU0 never returns, because the timer handler
    on CPU1 is waiting for the bearer lock.

    We eliminate the 'blocked' flag from struct tipc_bearer, along with all
    tests on this flag. This not only resolves the deadlock, but also
    simplifies and speeds up the data path execution of TIPC. It also fits
    well into our ongoing effort to make the locking policy simpler and
    more manageable.

    An effect of this change is that we can get rid of functions such as
    tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
    We replace the latter with a new function, tipc_reset_bearer(), which
    resets all links associated with the bearer immediately after an
    interface goes down.

    A user might notice one slight change in link behaviour after this
    change. When an interface goes down (e.g. through a NETDEV_DOWN
    event), all attached links will be reset immediately, instead of
    leaving it to each link to detect the failure through a timer-driven
    mechanism. We consider this an improvement, and see no obvious risks
    with the new behavior.

    Signed-off-by: Erik Hugne
    Reviewed-by: Ying Xue
    Reviewed-by: Paul Gortmaker
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Erik Hugne
     
  • use pr_<level> instead of printk(LEVEL)

    Suggested-by: Joe Perches
    Signed-off-by: Wang Weidong
    Signed-off-by: David S. Miller

    wangweidong
     
  • This patch introduces a PACKET_QDISC_BYPASS socket option that
    allows using an xmit() function similar to pktgen's instead of
    taking the dev_queue_xmit() path. This can be very useful, for
    example, when PF_PACKET applications are required in a
    pktgen-like scenario but need to provide a full, flexible packet
    payload.

    By default, nothing changes in behaviour for normal PF_PACKET
    TX users, so everything stays as is for applications. New users,
    however, can now set PACKET_QDISC_BYPASS if needed to i) prevent
    their own packets from reentering packet_rcv() and ii) push the
    frame directly to the driver.

    In doing so we can increase pps (here with 64-byte packets) for
    PF_PACKET a bit:

    CPUs   QDISC_BYPASS     qdisc path   qdisc path[**]
    ----   ------------   ------------   --------------
       1      1,509,628      1,208,708        1,247,436
       2      3,198,659      2,536,012        1,605,779
       3      4,787,992      3,788,740        1,735,610
       4      6,173,956      4,907,799        1,909,114
       5      7,495,676      5,956,499        2,014,422
       6      9,001,496      7,145,064        2,155,261
       7     10,229,776      8,190,596        2,220,619
       8     11,040,732      9,188,544        2,241,879
       9     12,009,076     10,275,936        2,068,447
      10     11,380,052     11,265,337        1,578,689
      11     11,672,676     11,845,344        1,297,412
    [...]
      20     11,363,192     11,014,933        1,245,081

    (all throughput figures in pps)

    [**]: qdisc path with packet_rcv(), which is probably how most
    people use it (hopefully no longer, where it is not needed)

    The test was done using a modified trafgen, sending a simple
    static 64-byte packet on all CPUs. The trick in the fast
    "qdisc path" case is to avoid reentering packet_rcv() by
    setting the RAW socket protocol to zero, like:
    socket(PF_PACKET, SOCK_RAW, 0);

    Tradeoffs are documented in this patch as well: clearly, if
    queues are busy, we will drop more packets, tc disciplines are
    ignored, and these packets are no longer visible to taps. For
    a pktgen-like scenario, we argue that this is acceptable.

    The pointer to the xmit function has been placed in the packet
    socket structure hole between cached_dev and prot_hook, which
    is hot anyway as we're working on cached_dev in each send path.

    Done jointly with Jesper Dangaard Brouer.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Daniel Borkmann
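
    From userspace, opting in looks roughly like the sketch below. It opens
    a raw packet socket with protocol 0, as described above, and enables
    the new option with setsockopt() at the SOL_PACKET level. The fallback
    values 20 and 263 are what we believe <linux/if_packet.h> and glibc
    define, provided only for older headers. open_bypass_socket() is a
    hypothetical helper. Creating the socket needs CAP_NET_RAW, so the
    helper returns a negative errno instead of failing hard.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Normally provided by <linux/if_packet.h> and <sys/socket.h>. */
#ifndef PACKET_QDISC_BYPASS
#define PACKET_QDISC_BYPASS 20
#endif
#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/*
 * Open a raw packet socket with protocol 0 (so our own frames are not fed
 * back into packet_rcv()) and ask the kernel to bypass the qdisc layer.
 * Returns the fd, or a negative errno (e.g. -EPERM without CAP_NET_RAW).
 */
static int open_bypass_socket(void)
{
	int one = 1;
	int fd = socket(PF_PACKET, SOCK_RAW, 0);

	if (fd < 0)
		return -errno;
	if (setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
		       &one, sizeof(one)) < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}
```

    On kernels without this option, the setsockopt() call is expected to
    fail (typically with ENOPROTOOPT), so callers can fall back to the
    normal qdisc path.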
     
  • As we need it elsewhere, move the inline helper function
    skb_needs_linearize() over to the skbuff.h include file. While
    at it, also convert the return type to 'bool' instead of 'int'
    and add proper kernel-doc.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Daniel Borkmann
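
    As a userspace model of the decision such a helper makes, consider the
    sketch below. It follows our reading of the logic: a non-linear packet
    must be linearized if it carries a frag_list the device cannot chain,
    or page frags without scatter-gather support. It also shows the bool
    return and a kernel-doc style comment. struct fake_skb, FEAT_SG,
    FEAT_FRAGLIST and needs_linearize() are hypothetical stand-ins, not the
    real sk_buff or NETIF_F_* definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the device feature bits consulted. */
#define FEAT_SG       (1u << 0) /* device supports scatter-gather */
#define FEAT_FRAGLIST (1u << 1) /* device supports frag_list chains */

struct fake_skb {
	unsigned int data_len; /* bytes held outside the linear head */
	bool has_frag_list;
	unsigned int nr_frags;
};

/**
 * needs_linearize - check whether a packet must be linearized before xmit
 * @skb: packet to check
 * @features: offload features supported by the outgoing device
 *
 * Returns true if the packet is non-linear and uses a layout (frag_list
 * or page frags) that the device cannot handle.
 */
static bool needs_linearize(const struct fake_skb *skb, unsigned int features)
{
	bool nonlinear = skb->data_len != 0;

	return nonlinear &&
	       ((skb->has_frag_list && !(features & FEAT_FRAGLIST)) ||
		(skb->nr_frags && !(features & FEAT_SG)));
}
```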
     
  • Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
    Daniel's direct transmit changes depend upon.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit e40526cb20b5 introduced a cached dev pointer, which gets
    hooked into register_prot_hook() and __unregister_prot_hook() to
    update the device used for the send path.

    We need to fix this up, as otherwise this will not work with
    sockets created with protocol = 0, plus with sll_protocol = 0
    passed via sockaddr_ll when doing the bind.

    So instead, assign the pointer directly. The compiler can inline
    these helper functions automagically.

    While at it, also mark the cached dev fast path as likely(),
    and document this variant of socket creation, as it does not seem
    to be widely used (it seems not even the author of TX_RING was aware
    of it in his reference example [1]). Tested with reproducer
    from e40526cb20b5.

    [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example

    Fixes: e40526cb20b5 ("packet: fix use after free race in send path when dev is released")
    Signed-off-by: Daniel Borkmann
    Tested-by: Salam Noureddine
    Tested-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Commit 6da7c8fcbcbd ("qdisc: allow setting default queuing discipline")
    added the ability to change the default qdisc from pfifo_fast to, say, fq.

    But as most modern ethernet devices are multiqueue, we can't really
    see all the statistics from "tc -s qdisc show", as the default root
    qdisc is mq.

    This patch adds the calls to qdisc_list_add() to mq and mqprio.

    Signed-off-by: Eric Dumazet
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Jeff Kirsher says:

    ====================
    Intel Wired LAN Driver Updates

    This series contains updates to i40e only.

    Jacob provides an i40e patch to get 1588 working correctly by separating
    TSYNVALID and TSYNINDX fields in the receive descriptor.

    Jesse provides several i40e patches, first to correct the checking
    of the multi-bit state. The hash is reported correctly in the RSS
    field if and only if the filter status is 3. Other values of the
    filter status mean different things and we should not depend on a
    bitwise result. Then provides a patch to enable a couple of
    workarounds based on revision ID that allow the driver to work
    more fully on early hardware.

    Shannon provides several i40e patches as well. First sets the media
    type in the hardware structure based on the external connection type.
    Then provides a patch to only setup the rings that will be used. Lastly
    provides a fix where the TESTING state was still set when exiting the
    ethtool diagnostics.

    Kevin Scott provides one i40e patch to add a new flag to i40e_add_veb()
    which allows the driver to request the hardware to filter on layer 2
    parameters.

    Anjali provides four i40e patches, first refactors the reset code in
    order to re-size queues and vectors while the interface is still up.
    Then provides a patch to enable all PCTYPEs except FCoE for RSS. Adds
    a message to notify the user of how many VFs are initialized on each
    port. Lastly adds a new variable to track the number of PF instances;
    this is a global counter on purpose so that each PF loaded has a
    unique ID.

    Catherine bumps the driver version.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
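
    The point about the multi-bit filter status can be shown with a small
    sketch. A bitwise test treats any non-zero value as "hash valid", while
    the fix compares the extracted field against exactly 3. The field
    position FLTSTAT_SHIFT and all names here are hypothetical; the real
    i40e descriptor layout is not given in this text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: a 2-bit filter-status field at bits 12..13. */
#define FLTSTAT_SHIFT    12
#define FLTSTAT_MASK     (0x3u << FLTSTAT_SHIFT)
#define FLTSTAT_RSS_HASH 3u /* the only value meaning "RSS hash valid" */

/* Wrong: statuses 1 and 2 would also be taken as a valid hash. */
static bool rss_hash_valid_bitwise(uint32_t status)
{
	return (status & FLTSTAT_MASK) != 0;
}

/* Right: extract the multi-bit field and compare the exact value. */
static bool rss_hash_valid(uint32_t status)
{
	return ((status & FLTSTAT_MASK) >> FLTSTAT_SHIFT) == FLTSTAT_RSS_HASH;
}
```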
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     
  • The driver core clears the driver data to NULL after device_release
    or on probe failure. Thus, there is no need to manually clear the
    device driver data to NULL.

    Signed-off-by: Jingoo Han
    Signed-off-by: David S. Miller

    Jingoo Han
     

07 Dec, 2013

10 commits

  • Track the number of physical functions (PFs) found; this is a global counter
    on purpose so that each PF loaded has a unique ID.

    Change-Id: I74d618520afbce4a774d0235449e3b5f97ff6d4a
    Signed-off-by: Anjali Singhai Jain
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Anjali Singhai Jain
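
    The idea of a driver-wide counter that hands each probed PF a unique
    instance ID can be sketched as below. The names pf_count, struct
    fake_pf and pf_assign_instance() are hypothetical. The use of C11
    atomics is our assumption, made so that concurrent probes could not
    receive the same ID; the commit does not say how the driver
    synchronizes its counter.

```c
#include <stdatomic.h>

/* Hypothetical: one counter shared by every PF the driver probes. */
static atomic_uint pf_count;

struct fake_pf {
	unsigned int instance; /* unique per loaded PF */
};

/* Called once per PF at probe time; returns IDs 0, 1, 2, ... in order. */
static void pf_assign_instance(struct fake_pf *pf)
{
	pf->instance = atomic_fetch_add(&pf_count, 1);
}
```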
     
  • Print a message to notify the user of how many VFs are initialized on each
    port.

    Change-Id: I29ac2acc478ee4e588fd6ffcc35133d4c6607ca9
    Signed-off-by: Anjali Singhai Jain
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Anjali Singhai Jain
     
  • Put the print and reset statements in the actual test functions to make
    them more self-contained, and only run the reset for tests that need it.

    Change-Id: Ic70f49b11bf8bae82e59d8fd25b46215c90c4510
    Signed-off-by: Shannon Nelson
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Shannon Nelson
     
  • Fix a bug where the TESTING state was still set when
    exiting the ethtool diagnostics.

    Change-Id: Ic47950d2e86a67167d1d282256d477cecd86d820
    Signed-off-by: Shannon Nelson
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Shannon Nelson
     
  • The VSI may be allocated more queues (alloc_queue_pairs) than actually
    are to be used (num_queue_pairs), so only allocate rings for the queues
    to be used. The numbers will likely be the same for most VSIs, but can
    be different based on how TCs are assigned and enabled.

    Change-Id: Ie40f7ad0affbc4b45d6f049bcf02ee2fa24edc74
    Signed-off-by: Shannon Nelson
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Shannon Nelson
     
  • RSS can steer packets based on recognition of all
    sorts of different headers. Enable some more of them.

    Change-Id: I2264dedae66fb0bceca6fb6e772e050e3ca8efc8
    Signed-off-by: Anjali Singhai Jain
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Anjali Singhai Jain
     
  • In order to re-size queues and vectors while the interface is
    still up, we need to be able to call functions to free and
    re-allocate without bringing down the VSI.

    We also need to reset the existing setup, update the
    configuration and then rebuild again. This requires us to have
    the reset flow broken down into two parts.

    Change-Id: I374dd25aabf769decda69b676491c7b7730a4635
    Signed-off-by: Anjali Singhai Jain
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Anjali Singhai Jain
     
  • Update the driver version to 0.3.12-k

    Signed-off-by: Catherine Sullivan
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Catherine Sullivan
     
  • Whitespace fixes

    Change-Id: I95f4d02e4a2a92d6b6fca3ae2b7865c4b916a9bb
    Signed-off-by: Jeff Kirsher
    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala

    Jeff Kirsher
     
  • Enable a couple of workarounds based on revision ID that allow the
    driver to work more fully on early hardware.

    Signed-off-by: Jesse Brandeburg
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Jesse Brandeburg