11 Dec, 2012

6 commits


09 Dec, 2012

11 commits

  • Use the device model to get just the name, rather than using the
    ethtool API to get all driver information.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • In cfusbl_device_notify(), the usbnet and usbdev variables are
    initialised before the driver name has been checked. In case the
    device's driver is not cdc_ncm, this may result in reading beyond the
    end of the netdev private area. Move the initialisation below the
    driver name check.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
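
The ordering issue in the commit above can be shown outside the kernel. The following is a minimal user-space sketch, not the caif code: `struct netdev`, `struct ncm_priv` and `ncm_port_or_neg()` are hypothetical stand-ins. The point is only that the private area is interpreted after the driver name has been checked.

```c
#include <string.h>

/* Hypothetical stand-ins for a net device and one driver's private data. */
struct netdev { const char *driver; void *priv; };
struct ncm_priv { int usb_port; };

/* Returns the port number for cdc_ncm devices, -1 for anything else. */
static int ncm_port_or_neg(const struct netdev *dev)
{
    const struct ncm_priv *p;

    /* Check the driver name first ... */
    if (strcmp(dev->driver, "cdc_ncm") != 0)
        return -1;

    /* ... and only then treat the private area as ncm_priv. Doing this
     * before the check would read memory laid out by another driver. */
    p = dev->priv;
    return p->usb_port;
}
```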
     
  • This patch implements the ethtool_{set|get}_channels method of virtio-net to
    allow the user to change the number of queues on demand while the device is
    running.

    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     
  • This patch adds the multiqueue (VIRTIO_NET_F_MQ) support to virtio_net
    driver. VIRTIO_NET_F_MQ capable device could allow the driver to do packet
    transmission and reception through multiple queue pairs and does the packet
    steering to get better performance. By default, only one queue pair is used; the
    user can change the number of queue pairs with ethtool in the next patch.

    When multiple queue pairs are used and the number of queue pairs is equal to the
    number of vcpus, the driver does the following optimizations to implement per-cpu
    virt queue pairs:

    - select the txq based on the smp processor id.
    - smp affinity hint to the cpu that owns the queue pairs.

    This could be used with the flow steering support of the device to guarantee that
    the packets of a single flow are handled by the same cpu.

    Signed-off-by: Krishna Kumar
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
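
The per-cpu queue selection described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the driver's code: `pick_txq()` is a hypothetical name, and the modulo fallback for the case where the counts differ is an assumption made here to keep the example complete.

```c
#include <assert.h>

/* Hypothetical helper: choose a transmit queue for the CPU doing the send. */
static unsigned int pick_txq(unsigned int cpu, unsigned int num_cpus,
                             unsigned int num_queue_pairs)
{
    assert(num_queue_pairs > 0);
    assert(cpu < num_cpus);

    /* One queue pair per vcpu: each CPU owns the queue with its own id. */
    if (num_queue_pairs == num_cpus)
        return cpu;

    /* Assumed fallback: fold the CPU ids onto the available queues. */
    return cpu % num_queue_pairs;
}
```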
     
  • To support multiqueue transmitq/receiveq, the first step is to separate queue
    related structure from virtnet_info. This patch introduces the send_queue and
    receive_queue structures and uses pointers to them as the parameter in
    functions handling sending/receiving.

    Signed-off-by: Krishna Kumar
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     
  • This patch adds capability in vxlan to identify received
    checksummed inner packets and signal them to the upper layers of
    the stack. The driver needs to set the skb->encapsulation bit
    and also set the skb->ip_summed to CHECKSUM_UNNECESSARY.

    Signed-off-by: Joseph Gasparakis
    Signed-off-by: David S. Miller

    Joseph Gasparakis
     
  • Allow VXLAN to make use of Tx checksum offloading and Tx scatter-gather.
    The advantage to these two changes is that it also allows the VXLAN to
    make use of GSO.

    Signed-off-by: Joseph Gasparakis
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Joseph Gasparakis
     
  • This change allows the VXLAN to enable Tx checksum offloading even on
    devices that do not support encapsulated checksum offloads. The
    advantage to this is that it allows for the lower device to change due
    to routing table changes without impacting features on the VXLAN itself.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch adds support in the kernel for offloading in the NIC Tx and Rx
    checksumming for encapsulated packets (such as VXLAN and IP GRE).

    For Tx encapsulation offload, the driver will need to set the right bits
    in netdev->hw_enc_features. The protocol driver will have to set the
    skb->encapsulation bit and populate the inner headers, so the NIC driver will
    use those inner headers to calculate the csum in hardware.

    For Rx encapsulation offload, the driver will again need to set the
    skb->encapsulation flag and set skb->ip_summed to CHECKSUM_UNNECESSARY.
    In that case the protocol driver should push the decapsulated packet up
    to the stack, again with CHECKSUM_UNNECESSARY. In either case, the protocol
    driver should set the skb->encapsulation flag back to zero. Finally the
    protocol driver should have NETIF_F_RXCSUM flag set in its features.

    Signed-off-by: Joseph Gasparakis
    Signed-off-by: Peter P Waskiewicz Jr
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Joseph Gasparakis
     
  • GigaMAC registers have been reported left uninitialized in several
    situations:
    - after cold boot from power-off state
    - after S3 resume

    Tweaking rtl_hw_phy_config takes care of both.

    This patch removes an excess entry (",") at the end of the exgmac_reg
    array as well.

    Signed-off-by: Francois Romieu
    Signed-off-by: Wang YanQing
    Cc: Hayes Wang
    Signed-off-by: David S. Miller

    françois romieu
     
  • Paul Gortmaker says:

    ====================
    Changes since v1:
    -get rid of essentially unused variable spotted by
    Neil Horman (patch #2)

    -drop patch #3; defer it for 3.9 content, so Neil,
    Jon and Ying can discuss its specifics at their
    leisure while net-next is closed. (It had no
    direct dependencies to the rest of the series, and
    was just an optimization)

    -fix indentation of accept() code directly in place
    vs. forking it out to a separate function (was patch
    #10, now patch #9).

    Rebuilt and re-ran tests just to ensure nothing odd happened.

    Original v1 text follows, updated pull information follows that.

    ---------

    Here is another batch of TIPC changes. The most interesting
    thing is probably the non-blocking socket connect - I'm told
    there were several users looking forward to seeing this.

    Also there were some resource limitation changes that had
    the right intent back in 2005, but were now apparently causing
    needless limitations to people's real use cases; those have
    been relaxed/removed.

    There is a lockdep splat fix, but no need for a stable backport,
    since it is virtually impossible to trigger in mainline; you
    have to essentially modify code to force the probabilities
    in your favour to see it.

    The rest can largely be categorized as general cleanup of things
    seen in the process of getting the above changes done.

    Tested between 64 and 32 bit nodes with the test suite. I've
    also compile tested all the individual commits on the chain.

    I'd originally figured on this queue not being ready for 3.8, but
    the extended stabilization window of 3.7 has changed that. On
    the other hand, this can still be 3.9 material, if that simply
    works better for folks - no problem for me to defer it to 2013.
    If anyone spots any problems then I'll definitely defer it,
    rather than rush a last minute respin.
    ===================

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Dec, 2012

23 commits

  • In TIPC's accept() routine, there is a large block of code relating
    to initialization of a new socket, all within an if condition checking
    if the allocation succeeded.

    Here, we simply flip the check of the if, so that the main execution
    path stays at the same indentation level, which improves readability.
    If the allocation fails, we jump to an already existing exit label.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
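
The restructuring described above is the usual guard-clause pattern. A minimal sketch, using a hypothetical `struct conn` and `malloc()` in place of the real socket allocation:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct conn { int id; int ready; };  /* hypothetical stand-in for the new socket */

/* Before: the whole setup path is nested under the success check. */
static int accept_nested(struct conn **out, int fail_alloc)
{
    int res = -ENOMEM;
    struct conn *c = fail_alloc ? NULL : malloc(sizeof(*c));

    assert(out != NULL);
    if (c) {
        c->id = 1;
        c->ready = 1;
        *out = c;
        res = 0;
    }
    return res;
}

/* After: test for failure first and jump to the shared exit label, so
 * the main path stays at the outer indentation level. */
static int accept_flat(struct conn **out, int fail_alloc)
{
    int res;
    struct conn *c = fail_alloc ? NULL : malloc(sizeof(*c));

    assert(out != NULL);
    if (!c) {
        res = -ENOMEM;
        goto exit;
    }
    c->id = 1;
    c->ready = 1;
    *out = c;
    res = 0;
exit:
    return res;
}
```

Both variants behave the same; only the shape of the control flow differs.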
     
  • TIPC accept() call grabs the socket lock on a newly allocated
    socket while holding the socket lock on an old socket. But lockdep
    worries that this might be a recursive lock attempt:

    [ INFO: possible recursive locking detected ]
    ---------------------------------------------
    kworker/u:0/6 is trying to acquire lock:
    (sk_lock-AF_TIPC){+.+.+.}, at: [] accept+0x15c/0x310 [tipc]

    but task is already holding lock:
    (sk_lock-AF_TIPC){+.+.+.}, at: [] accept+0x28/0x310 [tipc]

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(sk_lock-AF_TIPC);
    lock(sk_lock-AF_TIPC);

    *** DEADLOCK ***

    May be due to missing lock nesting notation
    [...]

    Tell lockdep that this locking is safe by using lock_sock_nested().
    This is similar to what was done in commit 5131a184a3458d9 for
    SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").

    Also note that this isn't something that is seen normally,
    as it was uncovered with some experimental work-in-progress
    code not yet ready for mainline. So no need for stable
    backports or similar of this commit.

    Signed-off-by: Ying Xue
    Signed-off-by: Paul Gortmaker

    Ying Xue
     
  • As connection setup is now completed asynchronously in BH context,
    in the function filter_connect(), the corresponding code in recv_msg()
    becomes redundant.

    Signed-off-by: Ying Xue
    Signed-off-by: Jon Maloy
    Signed-off-by: Paul Gortmaker

    Ying Xue
     
  • TIPC has so far only supported blocking connect(), meaning that a call
    to connect() doesn't return until either the connection is fully
    established, or an error occurs. This has proved insufficient for many
    users, so we now introduce non-blocking connect(), analogous to how
    this is done in TCP and other protocols.

    With this feature, if a connection cannot be established instantly,
    connect() will return the error code "-EINPROGRESS".
    If the user later calls connect() again, the return code will be
    "-EALREADY" if the connection is still being set up, or "-EISCONN"
    if it has been established.

    The user must have explicitly set the socket to be non-blocking
    (SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
    for some reason they had set this already (the socket would anyway
    remain blocking in current TIPC) this change should be completely
    backwards compatible.

    It is also now possible to call select() or poll() to wait for the
    completion of a connection.

    An effect of the above is that the actual completion of a connection
    may now be performed asynchronously, independent of the calls from
    user space. Therefore, we now execute this code in BH context, in
    the function filter_rcv(), which is executed upon reception of
    messages in the socket.

    Signed-off-by: Ying Xue
    Signed-off-by: Jon Maloy
    [PG: minor refactoring for improved connect/disconnect function names]
    Signed-off-by: Paul Gortmaker

    Ying Xue
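
From user space, the non-blocking sequence described above follows the usual sockets pattern. The sketch below is protocol-agnostic and uses only standard POSIX calls; `connect_with_timeout()` is a hypothetical helper name, and reading the final status through SO_ERROR is the generic convention assumed here rather than something the commit message states.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: start a non-blocking connect() and wait for it to
 * finish with poll(). Returns 0 on success or a negative errno value. */
static int connect_with_timeout(int fd, const struct sockaddr *addr,
                                socklen_t len, int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -errno;

    if (connect(fd, addr, len) == 0)
        return 0;                /* connected immediately */
    if (errno != EINPROGRESS)
        return -errno;           /* hard failure */

    /* Setup continues in the background; the socket becomes writable
     * once the attempt has either succeeded or failed. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -errno;
    if (n == 0)
        return -ETIMEDOUT;

    /* SO_ERROR carries the final result of the asynchronous connect. */
    int err = 0;
    socklen_t errlen = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -errno;
    return -err;
}
```

A caller would create the socket, fill in the address for its protocol family and pass both in; select() can be used in place of poll() in the same way.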
     
  • Handling of connection-related message reception is currently scattered
    around at different places in the code. This makes it harder to verify
    that things are handled correctly in all possible scenarios.
    So we consolidate the existing processing of connection-oriented
    message reception in a single routine. In the process, we convert the
    chain of if/else into a switch/case for improved readability.

    A cast on the socket_state in the switch is needed to avoid compile
    warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
    ‘4294967295’ not in enumerated type". This happens because existing
    tipc code pseudo extends the default linux socket state values with:

    #define SS_LISTENING -1 /* socket is listening */
    #define SS_READY -2 /* socket is connectionless */

    It may make sense to add these as _positive_ values to the existing
    socket state enum list someday, vs. these already existing defines.

    Signed-off-by: Ying Xue
    Signed-off-by: Jon Maloy
    [PG: add cast to fix warning; remove returns from middle of switch]
    Signed-off-by: Paul Gortmaker

    Ying Xue
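
The cast mentioned above can be reproduced in a few lines. In this sketch the enum mirrors the shape of the kernel's socket state list (all values non-negative) and the two defines are the ones quoted in the commit message; `state_name()` is a hypothetical function used only for illustration.

```c
/* Mirrors the shape of the generic socket state enum: no negative values. */
typedef enum {
    SS_FREE = 0,
    SS_UNCONNECTED,
    SS_CONNECTING,
    SS_CONNECTED,
    SS_DISCONNECTING
} socket_state;

/* TIPC-style pseudo extensions, as quoted in the commit message. */
#define SS_LISTENING -1 /* socket is listening */
#define SS_READY -2 /* socket is connectionless */

static const char *state_name(socket_state state)
{
    /* Switching on the enum itself lets the compiler warn that the
     * negative case labels are not enumerators; switching on an int
     * avoids that. */
    switch ((int)state) {
    case SS_LISTENING:
        return "listening";
    case SS_READY:
        return "ready";
    case SS_CONNECTING:
        return "connecting";
    case SS_CONNECTED:
        return "connected";
    default:
        return "other";
    }
}
```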
     
  • Currently we have tipc_disconnect and tipc_disconnect_port. It is
    not clear from the names alone, what they do or how they differ.
    It turns out that tipc_disconnect just deals with the port locking
    and then calls tipc_disconnect_port which does all the work.

    If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
    then we will be following typical linux convention, where:

    __tipc_disconnect: "raw" function that does all the work.

    tipc_disconnect: wrapper that deals with locking and then calls
    the real core __tipc_disconnect function

    With this, the difference is immediately evident, and locking
    violations are more apt to be spotted by chance while working on,
    or even just while reading the code.

    On the connect side of things, we currently only have the single
    "tipc_connect2port" function. It does both the locking at enter/exit,
    and the core of the work. Pending changes will make it desirable to
    have the connect be a two part locking wrapper + worker function,
    just like the disconnect is already.

    Here, we make the connect look just like the updated disconnect case,
    for the above reason, and for consistency. In the process, we also
    get rid of the "2port" suffix that was on the original name, since
    it adds no descriptive value.

    On close examination, one might notice that the above connect
    changes implicitly move the call to tipc_link_get_max_pkt() to be
    within the scope of tipc_port_lock() protected region; when it was
    not previously. We don't see any issues with this, and it is in
    keeping with __tipc_connect doing the work and tipc_connect just
    handling the locking.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
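
The wrapper/worker naming convention described above can be sketched in user space as follows. Everything here is a toy stand-in: `struct port` uses a flag instead of a real lock, and the function names are hypothetical. Note that identifiers starting with a double underscore are a kernel convention and are formally reserved in ordinary user-space C.

```c
#include <assert.h>

/* Toy stand-in: a port with a lock flag instead of a real spinlock. */
struct port { int locked; int connected; };

static void port_lock(struct port *p)   { assert(!p->locked); p->locked = 1; }
static void port_unlock(struct port *p) { assert(p->locked);  p->locked = 0; }

/* "Raw" worker: does the actual work; the caller must hold the lock. */
static int __port_disconnect(struct port *p)
{
    assert(p->locked);
    if (!p->connected)
        return -1;
    p->connected = 0;
    return 0;
}

/* Wrapper: takes the lock, calls the worker, then drops the lock. */
static int port_disconnect(struct port *p)
{
    int res;

    port_lock(p);
    res = __port_disconnect(p);
    port_unlock(p);
    return res;
}
```

With this split, a call to the double-underscore variant outside a locked region stands out when reading the code.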
     
  • The sk_recv_queue upper limit for connectionless sockets has empirically
    turned out to be too low. When we double the current limit we get far
    fewer rejected messages and no noticeable negative side-effects.

    Signed-off-by: Jon Maloy
    Signed-off-by: Paul Gortmaker

    Jon Maloy
     
  • Since commit 2c60db037034 ('net: provide a default dev->ethtool_ops')
    all devices have a non-null ethtool_ops. Test only
    dev->ethtool_ops->get_link in both places where we care.

    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Ben Hutchings
     
  • V5: fix two bugs pointed out by Thomas
    remove seq check for now, mark it as TODO

    V4: remove some useless #include
    some coding style fix

    V3: drop debugging printk's
    update selinux perm table as well

    V2: drop patch 1/2, export ifindex directly
    Redesign netlink attributes
    Improve netlink seq check
    Handle IPv6 addr as well

    This patch exports bridge multicast database via netlink
    message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
    We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).

    (Thanks to Thomas for patient reviews)

    Cc: Herbert Xu
    Cc: Stephen Hemminger
    Cc: "David S. Miller"
    Cc: Thomas Graf
    Cc: Jesper Dangaard Brouer
    Signed-off-by: Cong Wang
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Cong Wang
     
  • 'secluded' is used to describe places, not suitable here.

    Suggested-by: Ben Hutchings
    Signed-off-by: Shan Wei
    Signed-off-by: David S. Miller

    Shan Wei
     
  • Correct a mistake made in the previous commit due to reckless
    copy-and-pasting.

    Signed-off-by: Patrick Trantham
    Signed-off-by: David S. Miller

    Patrick Trantham
     
  • The __dev* removal patches for the network drivers ended up messing up
    the function prototypes for a bunch of drivers. This patch fixes all of
    them back up to be properly aligned.

    Bonus is that this almost removes 100 lines of code, always a nice
    surprise.

    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Greg Kroah-Hartman
     
  • As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
    global atomic counter for the sum of sk_recv_queue sizes across all
    tipc sockets. When incremented, the counter is compared to an upper
    threshold value, and if this is reached, the message is rejected
    with error code TIPC_OVERLOAD.

    This check was originally meant to protect the node against
    buffer exhaustion and general CPU overload. However, all experience
    indicates that the feature not only is redundant on Linux, but even
    harmful. Users run into the limit very often, causing disturbances
    for their applications, while removing it seems to have no negative
    effects at all. We have also seen that overall performance is
    boosted significantly when this bottleneck is removed.

    Furthermore, we don't see any other network protocols maintaining
    such a mechanism, something strengthening our conviction that this
    control can be eliminated.

    As a result, the atomic variable tipc_queue_size is now unused
    and so it can be deleted. There is a getsockopt call that used
    to allow reading it; we retain that but just return zero for
    maximum compatibility.

    Signed-off-by: Ying Xue
    Signed-off-by: Jon Maloy
    Cc: Neil Horman
    [PG: phase out tipc_queue_size as pointed out by Neil Horman]
    Signed-off-by: Paul Gortmaker

    Ying Xue
     
  • peer.transport_addr_list is currently only protected by sk_lock
    which is impractical to acquire for procfs dumping purposes.

    This patch adds RCU protection allowing for the procfs readers to
    enter RCU read-side critical sections.

    Modification of the list continues to be serialized via sk_lock.

    V2: Use list_del_rcu() in sctp_association_free() to be safe
    Skip transports marked dead when dumping for procfs

    Cc: Vlad Yasevich
    Cc: Neil Horman
    Signed-off-by: Thomas Graf
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • address_list is protected via the socket lock or RCU. Since we don't want
    to take the socket lock for each assoc we dump in procfs, an RCU read-side
    critical section must be entered.

    V2: Skip local addresses marked as dead

    Cc: Vlad Yasevich
    Cc: Neil Horman
    Signed-off-by: Thomas Graf
    Acked-by: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • John W. Linville says:

    ====================
    This pull request is intended for 3.8...

    This includes a Bluetooth pull. Gustavo says:

    "A few more patches to 3.8, I hope they can still make it to mainline!
    The most important ones are the socket option for the SCO protocol to allow
    accept/refuse new connections from userspace. Other than that I added some
    fixes and Andrei did more AMP work."

    Also, a mac80211 pull. Johannes says:

    "If you think there's any chance this might make it still, please pull my
    mac80211-next tree (per below). This contains a relatively large number
    of fixes to the previous code, as well as a few small features:
    * VHT association in mac80211
    * some new debugfs files
    * P2P GO powersave configuration
    * masked MAC address verification

    The biggest patch is probably the BSS struct changes to use RCU for
    their IE buffers to fix potential races. I've not tagged this for stable
    because it's pretty invasive and nobody has ever seen any bugs in this
    area as far as I know."

    Several other drivers get some attention, including ath9k, brcmfmac,
    brcmsmac, and a number of others. Also, Hauke gives us a series that
    improves watchdog support for the bcma and ssb busses. Finally, Bill
    Pemberton delivers a group of "remove __dev* attributes" for wireless
    drivers -- these generate some "section mismatch" warnings, but Greg
    K-H assures me that they will disappear by the time -rc1 is released.

    This also includes a pull of the wireless tree to avoid merge
    conflicts.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • On error, the error code from tun_flow_init() is lost inside
    tun_set_iff(). This patch fixes this by assigning the tun_flow_init()
    error code to the "err" variable which is returned by
    the tun_set_iff() function on error.

    Signed-off-by: Paul Moore
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Paul Moore
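
The class of bug fixed above is easy to reproduce in isolation. The sketch below does not reproduce the tun code; `flow_init()`, `set_iff_lossy()` and `set_iff_fixed()` are hypothetical names, and the stale `-EINVAL` is an assumed placeholder for whatever value "err" held before the call.

```c
#include <errno.h>

/* Hypothetical stand-in for an init step that reports a specific errno. */
static int flow_init(int fail)
{
    return fail ? -ENOMEM : 0;
}

/* Lossy shape: the callee's result is tested but never stored, so the
 * caller returns whatever "err" happened to hold. */
static int set_iff_lossy(int fail)
{
    int err = -EINVAL;  /* assumed stale value from an earlier step */

    if (flow_init(fail) < 0)
        goto err_out;
    return 0;
err_out:
    return err;
}

/* Fixed shape: store the result in "err" so the real code is returned. */
static int set_iff_fixed(int fail)
{
    int err;

    err = flow_init(fail);
    if (err < 0)
        goto err_out;
    return 0;
err_out:
    return err;
}
```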
     
  • …wireless-next into for-davem

    John W. Linville
     
  • vhost changes for 3.8 from Michael S. Tsirkin

    Signed-off-by: David S. Miller

    David S. Miller
     
  • WARNING: net/sctp/sctp.o(.text+0x72f1): Section mismatch in reference
    from the function sctp_net_init() to the function
    .init.text:sctp_proc_init()
    The function sctp_net_init() references
    the function __init sctp_proc_init().
    This is often because sctp_net_init lacks a __init
    annotation or the annotation of sctp_proc_init is wrong.

    And put __net_init after 'int' for sctp_proc_init - as it is done
    everywhere else in the sctp-stack.

    Signed-off-by: Christoph Paasch
    Acked-by: Neil Horman
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Christoph Paasch
     
  • The get_clock() of the chelsio driver clashes with the s390 one.
    The chelsio helper reads a timespec via ktime just to convert it
    back to ktime. I can see no different outcome from calling
    ktime_get directly.

    Remove the get_clock and use ktime_get directly.

    Signed-off-by: Jan Glauber
    Signed-off-by: David S. Miller

    Jan Glauber
     
  • It is possible that the driver is configured to operate with a certain
    link configuration which differs from the link's configuration during
    boot from SAN - this would cause the driver to flap the link.

    Said flap may be missed by specific switches, causing dcbx convergence
    to be too long and boot sequence to fail. Convergence is longer because
    switch ignores new dcbx packets due to counters mismatch, as only host
    side reset the counters due to the link flap.

    This patch causes the driver to ignore user's initial configuration during
    boot from SAN, and continues with the existing link configuration.

    Signed-off-by: Barak Witkowski
    Signed-off-by: Yuval Mintz
    Signed-off-by: Eilon Greenstein
    Signed-off-by: David S. Miller

    Barak Witkowski
     
  • A SMSC PHY in power down mode can't be used.
    If a SMSC PHY is in this mode in the config_init
    stage, the mode "all capable" is set. So the PHY
    could then be used.

    Signed-off-by: Philippe Reynes
    Signed-off-by: David S. Miller

    trem