25 Apr, 2017

40 commits

  • Christoph Paasch from Apple found another firewall issue for TFO:
    After a successful 3WHS using TFO, the server and client start to
    exchange data. Afterwards, a 10s idle time occurs on this connection.
    After that, the firewall starts to drop every packet on this connection.

    The fix for this issue is to extend existing firewall blackhole detection
    logic in tcp_write_timeout() by removing the mss check.

    Signed-off-by: Wei Wang
    Acked-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Wei Wang
     
  • This counter records the number of times the firewall blackhole issue is
    detected and active TFO is disabled.

    Signed-off-by: Wei Wang
    Acked-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Wei Wang
     
  • Middlebox firewall issues can potentially cause the server's data to be
    blackholed after a successful 3WHS using TFO. The following are the
    related reports from Apple:
    https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
    Slide 31 identifies an issue where the client ACK to the server's data
    sent during a TFO'd handshake is dropped.
    C ---> syn-data ---> S
    C <--- syn/ack ----- S
    C <---- data ------- S
    C ----- ACK -> X     S
    [retry and timeout]

    https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
    Slide 5 shows a similar situation in which the server's data gets dropped
    after 3WHS.
    C ---- syn-data ---> S
    C <--- syn/ack ----- S
    C ---- ack --------> S
    S (accept & write)
    C?  X <- data ------ S

    Acked-by: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Wei Wang
     
  • Saeed Mahameed says:

    ====================
    mlx5-updates-2017-04-22

    Sparse and compiler warnings fixes from Stephen Hemminger.

    From Roi Dayan and Or Gerlitz: add devlink and mlx5 support for controlling
    E-Switch encapsulation mode. This knob enables HW support for applying
    encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • systemd-sysctl is triggering a suspicious RCU usage message when
    net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
    a sysctl config file:

    [ 33.896184] ===============================
    [ 33.899558] [ ERR: suspicious RCU usage. ]
    [ 33.900624] 4.11.0-rc7+ #104 Not tainted
    [ 33.901698] -------------------------------
    [ 33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
    [ 33.905724]
    other info that might help us debug this:

    [ 33.907656]
    rcu_scheduler_active = 2, debug_locks = 0
    [ 33.909288] 1 lock held by systemd-sysctl/143:
    [ 33.910373] #0: (sb_writers#5){.+.+.+}, at: [] file_start_write+0x45/0x48
    [ 33.912407]
    stack backtrace:
    [ 33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
    [ 33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
    [ 33.917870] Call Trace:
    [ 33.918431] dump_stack+0x81/0xb6
    [ 33.919241] lockdep_rcu_suspicious+0x10f/0x118
    [ 33.920263] proc_configure_early_demux+0x65/0x10a
    [ 33.921391] proc_udp_early_demux+0x3a/0x41

    Add RCU locking to proc_configure_early_demux().

    Fixes: dddb64bcb3461 ("net: Add sysctl to toggle early demux for tcp and udp")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    pull request: bluetooth-next 2017-04-22

    Here are some more Bluetooth patches (and one 802.15.4 patch) in the
    bluetooth-next tree targeting the 4.12 kernel. Most of them are pure
    fixes.

    Please let me know if there are any issues pulling. Thanks.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Trivial fix to spelling mistake in dev_err message and rejoin
    line.

    Signed-off-by: Colin Ian King
    Signed-off-by: David S. Miller

    Colin Ian King
     
  • Use offset_in_page() macro instead of open-coding.
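
    For readers unfamiliar with the macro, the following is a minimal
    user-space sketch of what offset_in_page() computes. The EX_ and ex_
    names and the fixed 4 KiB page size are assumptions for illustration
    only; the kernel uses its own PAGE_MASK for the running architecture.

```c
#include <assert.h>

/* Assumed 4 KiB pages; hypothetical stand-ins for PAGE_SIZE/PAGE_MASK. */
#define EX_PAGE_SIZE 4096UL
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/* Same shape as the kernel macro: keep only the bits below the page size. */
#define ex_offset_in_page(p) ((unsigned long)(p) & ~EX_PAGE_MASK)

/* Open-coded form that the macro replaces in drivers. */
unsigned long ex_open_coded_offset(unsigned long addr)
{
	return addr & (EX_PAGE_SIZE - 1);
}
```

    Both forms return the byte offset of an address within its page, so
    the substitution is not expected to change behavior.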

    Signed-off-by: Geliang Tang
    Signed-off-by: David S. Miller

    Geliang Tang
     
  • Michael Chan says:

    ====================
    bnxt_en: Updates for net-next.

    Miscellaneous updates include passing DCBX RoCE VLAN priority to firmware,
    checking one more new firmware flag before allowing DCBX to run on the host,
    adding 100Gbps speed support, adding check to disallow speed settings on
    Multi-host NICs, and a minor fix for reporting VF attributes.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This change restricts the PF in multi-host mode from setting any port
    level PHY configuration. The settings are controlled by firmware in
    Multi-Host mode.

    Signed-off-by: Deepak Khungar
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Deepak Khungar
     
  • Check the additional flag in bnxt_hwrm_func_qcfg() before allowing
    DCBX to be done in host mode.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • Added support for 100G link speed reporting for Broadcom BCM57454
    ASIC in ethtool command.

    Signed-off-by: Deepak Khungar
    Signed-off-by: Ray Jui
    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Deepak Khungar
     
  • The .ndo_get_vf_config() is returning the wrong qos attribute. Fix
    the code that checks and reports the qos and spoofchk attributes. The
    BNXT_VF_QOS and BNXT_VF_LINK_UP flags should not be set by default
    at init time.

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • When the driver gets the RoCE app priority set/delete call through DCBNL,
    the driver will send the information to the firmware to set up the
    priority VLAN tag for RDMA traffic.

    [ New version using the common ETH_P_IBOE constant in if_ether.h ]

    Signed-off-by: Michael Chan
    Signed-off-by: David S. Miller

    Michael Chan
     
  • Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
    which can be used in conjunction with the commit flag
    (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
    conntrack events (IPCT_*) should be delivered via the Netfilter
    netlink multicast groups. Default behavior depends on the system
    configuration, but typically a lot of events are delivered. This can be
    very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
    types of events are of interest.

    Netfilter core init_conntrack() adds the event cache extension, so we
    only need to set the ctmask value. However, if the system is
    configured without support for events, the setting will be skipped due
    to extension not being found.
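
    As an illustration of the event mask, the sketch below builds a value
    that selects only the "new" and "destroy" conntrack events. The helper
    name is hypothetical, and packing the value into an
    OVS_CT_ATTR_EVENTMASK netlink attribute is not shown; the example
    assumes each IPCT_* constant is a bit position, as the description
    above suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/netfilter/nf_conntrack_common.h>

/* Hypothetical helper: one bit per IPCT_* event of interest. */
uint32_t ex_ct_eventmask_new_destroy(void)
{
	return (1u << IPCT_NEW) | (1u << IPCT_DESTROY);
}

/* Returns nonzero if the given IPCT_* event is selected by the mask. */
int ex_ct_event_selected(uint32_t mask, unsigned int event)
{
	return (mask & (1u << event)) != 0;
}
```

    With such a mask, frequent updates (for example mark changes) would not
    be selected, which is the kind of reduction in
    NFNLGRP_CONNTRACK_UPDATE traffic the commit aims at.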

    Signed-off-by: Jarno Rajahalme
    Reviewed-by: Greg Rose
    Acked-by: Joe Stringer
    Signed-off-by: David S. Miller

    Jarno Rajahalme
     
  • Fix typo in a comment.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Greg Rose
    Signed-off-by: David S. Miller

    Jarno Rajahalme
     
  • Nathan Fontenot says:

    ====================
    ibmvnic: Additional updates and bug fixes

    This set of patches is an additional set of updates and bug fixes to
    the ibmvnic driver which applies on top of the previous set of updates
    sent out on 4/19.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When an error is encountered during transmit we need to free the
    skb instead of returning TX_BUSY.

    Signed-off-by: Thomas Falcon
    Signed-off-by: David S. Miller

    Thomas Falcon
     
  • Validate that the napi structs exist before trying to disable them
    at driver close.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: David S. Miller

    Nathan Fontenot
     
  • Create a common routine for setting the link state for the vnic adapter.
    This update moves the sending of the crq and waiting for the link state
    response to a common place. The new routine also adds handling of
    resending the crq in cases of getting a partial success response.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: David S. Miller

    Nathan Fontenot
     
  • We should be initializing the stats token in the same place we
    initialize the other resources for the driver.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: David S. Miller

    Nathan Fontenot
     
  • When handling a fatal error in the driver, there can be additional
    error information provided by the vios. This information is not
    always present, so only retrieve the additional error information
    when present.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: David S. Miller

    Nathan Fontenot
     
  • This patch addresses a modification in the PAPR+ specification which now
    defines a previously reserved value for vNIC capabilities. It indicates
    whether the system firmware performs VLAN header stripping on all VLAN
    tagged received frames. If it does, the ibmvnic driver is expected to
    be responsible for inserting the VLAN header.

    Reported-by: Manvanthara B. Puttashankar
    Signed-off-by: Murilo Fossa Vicentini
    Signed-off-by: David S. Miller

    Murilo Fossa Vicentini
     
  • Along with 5 TX queues, 5 RX queues are allocated at the beginning of
    device probe. However, only the real number of TX queues is set. Configure
    the real number of RX queues as well.

    Signed-off-by: Thomas Falcon
    Signed-off-by: David S. Miller

    Thomas Falcon
     
  • Mike Maloney says:

    ====================
    packet: Add option to create new fanout group with unique id.

    Fanout uses a per net global namespace. A process that intends to create a
    new fanout group can accidentally join an existing group. It is
    not possible to detect this.

    Add a socket option to specify on the first call to
    setsockopt(..., PACKET_FANOUT, ...) to ensure that a new group is created.
    Also add tests.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Create two groups with PACKET_FANOUT_FLAG_UNIQUEID, add a socket to one.
    Ensure that the groups can only be joined if all options are consistent
    with the original except for this flag.

    Signed-off-by: Mike Maloney
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Mike Maloney
     
  • Fanout uses a per net global namespace. A process that intends to create
    a new fanout group can accidentally join an existing group. It is not
    possible to detect this.

    Add socket option PACKET_FANOUT_FLAG_UNIQUEID. When specified, the
    supplied fanout group id must be set to 0, and the kernel chooses an id
    that is not already in use. The flag is ephemeral: other sockets can be
    added to this group using setsockopt without specifying this flag.
    The current getsockopt(..., PACKET_FANOUT, ...) can be used to retrieve
    the new group id.
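
    A minimal user-space sketch of this flow, assuming an already-open
    AF_PACKET socket. The ex_ function names are hypothetical, and the
    fallback #define is only for headers that predate this change.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef PACKET_FANOUT_FLAG_UNIQUEID
#define PACKET_FANOUT_FLAG_UNIQUEID 0x2000 /* fallback for older headers */
#endif

/* PACKET_FANOUT value: group id in the low 16 bits, type|flags above. */
int ex_fanout_arg(uint16_t type_flags, uint16_t group_id)
{
	return (int)(((uint32_t)type_flags << 16) | group_id);
}

/* Create a new group on fd; returns the kernel-chosen id, or -1. */
int ex_fanout_create_unique(int fd, uint16_t type)
{
	int arg = ex_fanout_arg(type | PACKET_FANOUT_FLAG_UNIQUEID, 0);
	socklen_t len = sizeof(arg);

	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0)
		return -1;
	if (getsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, &len) < 0)
		return -1;
	return arg & 0xffff; /* low 16 bits carry the group id */
}

/* Join an existing group: same type, the retrieved id, no UNIQUEID flag. */
int ex_fanout_join(int fd, uint16_t type, uint16_t group_id)
{
	int arg = ex_fanout_arg(type, group_id);

	return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

    The first socket would call ex_fanout_create_unique(), and every later
    socket would call ex_fanout_join() with the id it returned.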

    We assume that there are not a lot of fanout groups and that this is not
    a high frequency call.

    The method assigns ids starting at zero and increases until it finds an
    unused id. It keeps track of the last assigned id, and uses it as a
    starting point to find new ids.
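
    The search described above can be sketched in user space as follows.
    This is an illustration under assumptions, not the kernel code: the
    ex_ names are hypothetical, and a flat array stands in for the kernel's
    list of existing fanout groups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_GROUPS 65536 /* group ids are 16 bits wide */

static bool ex_id_in_use[EX_MAX_GROUPS]; /* stand-in for the group list */
static uint16_t ex_next_id;              /* search resumes from here */

/* Scan from the last assigned position, wrapping, until a free id is found. */
bool ex_find_new_id(uint16_t *new_id)
{
	uint16_t id = ex_next_id;

	do {
		if (!ex_id_in_use[id]) {
			ex_id_in_use[id] = true;
			*new_id = id;
			ex_next_id = (uint16_t)(id + 1);
			return true;
		}
		id++; /* uint16_t wraps from 65535 back to 0 */
	} while (id != ex_next_id);

	return false; /* every id is taken */
}
```

    Resuming from the last assigned id avoids rescanning the low ids on
    every call, which fits the stated assumption that groups are few and
    the call is infrequent.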

    Signed-off-by: Mike Maloney
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Mike Maloney
     
  • sock_fanout_open no longer sets the size of packet_socket ring, so stop
    passing the parameter.

    Signed-off-by: Mike Maloney
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Mike Maloney
     
  • Some boards [1] leave the PHYs in an invalid state
    during system power-up or reset, causing reliability
    issues that manifest as the PHY not being detected
    or the link not being functional. To fix this, these PHYs need to be
    RESET via a GPIO connected to the PHY's RESET pin.

    Some boards have a single GPIO controlling the PHY RESET pin of all
    PHYs on the bus whereas some others have separate GPIOs controlling
    individual PHY RESETs.

    In both cases, the RESET de-assertion cannot be done in the PHY driver
    as the PHY will not probe till its reset is de-asserted.
    So do the RESET de-assertion in the MDIO bus driver.

    [1] - am572x-idk, am571x-idk, am437x-idk

    Signed-off-by: Roger Quadros
    Signed-off-by: David S. Miller

    Roger Quadros
     
  • Stefan Hajnoczi says:

    ====================
    VSOCK: vsockmon virtual device to monitor AF_VSOCK sockets.

    v5:
    * Change vsock_deliver_tap() API to avoid unnecessary skb creation
    [Jorgen]
    * Fix skb leak when no taps are registered [Jorgen]
    * s/cpu_to_le16(pkt->hdr.op)/le16_to_cpu(pkt->hdr.op)/ [Michael]
    * Add af_vsock_tap.c and vsockmon.[ch] to MAINTAINERS
    * checkpatch.pl and sparse fixes

    v4:
    * Add explicit reserved padding field to struct af_vsockmon_hdr and
    drop __attribute__((packed)) [Michael, DaveM]
    * Call synchronize_net() before module_put() [Michael]

    v3:
    * Hook virtio_transport.c (guest driver), not just drivers/vhost/vsock.c (host
    driver)
    * Fix DEFAULT_MTU macro definition [Zhu Yanjun]
    * Rename af_vsockmon_hdr->t field ->transport for clarity
    * Update .ndo_get_stats64() return type since it has changed
    * Include missing header in af_vsock_tap.c

    This is a continuation of Gerard Garcia's work on the vsockmon packet capture
    interface for AF_VSOCK. Packet capture is an essential feature for network
    communication. Gerard began addressing this feature gap in his Google Summer
    of Code 2016 project. I have cleaned up, rebased, and retested the v2 series
    he posted previously.

    The design follows the nlmon packet capture interface closely. This is because
    vsock has the same problem as netlink: there is no netdev on which packets can
    be captured. The nlmon driver is a synthetic netdev purely for the purpose of
    enabling packet capture. We follow the same approach here with vsockmon.

    See include/uapi/linux/vsockmon.h in this series for details on the packet
    layout.

    How to try it:

    1. Build tcpdump with vsockmon patches:

    $ git clone -b vsock https://github.com/stefanha/libpcap
    $ (cd libpcap && ./configure && make)
    $ git clone -b vsock https://github.com/stefanha/tcpdump
    $ (cd tcpdump && ./configure && make)

    2. Build nc-vsock (a netcat-like tool):

    $ git clone https://github.com/stefanha/nc-vsock
    $ (cd nc-vsock && make)

    3. Launch a virtual machine:

    # modprobe vhost_vsock
    # qemu-system-x86_64 -M accel=kvm -m 1024 -cpu host \
    -drive if=virtio,file=test.img,format=raw \
    -device vhost-vsock-pci,guest-cid=3

    (Assumes guest is running a kernel with this patch)

    4. Capture AF_VSOCK traffic in guest and/or host:

    # modprobe vsockmon
    # ip link add type vsockmon
    # ip link set vsockmon0 up
    # tcpdump -i vsockmon0 -vvv

    5. Communicate!

    (host)$ nc-vsock -l 1234
    (guest)$ nc-vsock 2 1234
    ====================
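
    For reference, the client side of step 5 could look roughly like the
    sketch below. It is not taken from nc-vsock; the ex_ names are
    hypothetical, and it assumes CID 2 addresses the host, as in the
    "nc-vsock 2 1234" command above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill an AF_VSOCK address for the given context id (CID) and port. */
void ex_vsock_addr(struct sockaddr_vm *addr, unsigned int cid,
		   unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}

/* Connect a stream socket to cid:port; returns the fd, or -1 on error. */
int ex_vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	ex_vsock_addr(&addr, cid, port);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

    Traffic on a socket opened this way in the guest is the kind of
    traffic the tcpdump capture in step 4 is meant to show.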

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The virtio drivers deal with struct virtio_vsock_pkt. Add
    virtio_transport_deliver_tap_pkt(pkt) for handing packets to the
    vsockmon device.

    We call virtio_transport_deliver_tap_pkt(pkt) from
    net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of
    common code. This is because the drivers may drop packets before
    handing them to common code - we still want to capture them.

    Signed-off-by: Gerard Garcia
    Signed-off-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Gerard Garcia
     
  • Add vsockmon virtual network device that receives packets from the vsock
    transports and exposes them to user space.

    Based on the nlmon device.

    Signed-off-by: Gerard Garcia
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: David S. Miller

    Gerard Garcia
     
  • Add tap functions that can be used by the vsock transports to
    deliver packets to vsockmon virtual network devices.

    Signed-off-by: Gerard Garcia
    Signed-off-by: Stefan Hajnoczi
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Gerard Garcia
     
  • …ub/scm/linux/kernel/git/kvalo/wireless-drivers-next

    Kalle Valo says:

    ====================
    wireless-drivers-next patches for 4.12

    Quite a lot of patches for rtlwifi and iwlwifi this time, but changes
    also for other active wireless drivers.

    Major changes:

    ath9k

    * add support for Dell Wireless 1601 PCI device

    * add debugfs file to manually override noise floor

    ath10k

    * bump up FW API to 6 for a new QCA6174 firmware branch

    wil6210

    * support 8 kB RX buffers

    iwlwifi

    * work to support A000 devices continues

    * add support for FW API 30

    * add Geographical and Dynamic Specific Absorption Rate (SAR) support

    * support a few new PCI device IDs

    rtlwifi

    * work on adding Bluetooth coexistence support, not finished yet
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Sudarsana Reddy Kalluru says:

    ====================
    qed*: Dcbx/dcbnl enhancements.

    The series has set of enhancements for dcbx/dcbnl implementation of
    qed/qede drivers.
    - Patches (1) & (3) capture the semantic and debug changes.
    - Patch (2) adds the driver support for populating RoCEv2 dcb data.
    - Patch (4) adds the required support for reading/configuring the
    IEEE selection field (SF).
    - Patch (5) adds the support for configuring the static dcbx mode.

    Please consider applying this to 'net-next' branch.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The patch adds driver support for static/local dcbx mode. In this mode
    the adapter brings up the dcbx link with locally configured parameters
    instead of performing the dcbx negotiation with the peer. The feature
    is useful when the peer device/switch doesn't support dcbx.

    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    sudarsana.kalluru@cavium.com
     
  • Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    sudarsana.kalluru@cavium.com
     
  • Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    sudarsana.kalluru@cavium.com
     
  • In the older firmware there was no distinction between RoCE and RoCEv2,
    whereas the newer firmware (8.15.3.0) allows us to configure each
    independently. The driver needs to populate the RoCEv2 data in its
    specific structure.

    Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    sudarsana.kalluru@cavium.com
     
  • Signed-off-by: Sudarsana Reddy Kalluru
    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    sudarsana.kalluru@cavium.com