13 Sep, 2016

1 commit


12 Sep, 2016

11 commits

  • Pull networking fixes from David Miller:
    "Mostly small sets of driver fixes scattered all over the place.

    1) Mediatek driver fixes from Sean Wang. Forward port not written
    correctly during TX map, missed handling of EPROBE_DEFER, and
    mistaken use of put_page() instead of skb_free_frag().

    2) Fix socket double-free in KCM code, from WANG Cong.

    3) QED driver fixes from Sudarsana Reddy Kalluru, including a fix for
    using the dcbx buffers before initializing them.

    4) Mellanox Switch driver fixes from Jiri Pirko, including a fix for
    double fib removals and an error handling fix in
    mlxsw_sp_module_init().

    5) Fix kernel panic when enabling LLDP in i40e driver, from Dave
    Ertman.

    6) Fix padding of TSO packets in thunderx driver, from Sunil Goutham.

    7) TCP's rcv_wup not initialized properly when using fastopen, from
    Neal Cardwell.

    8) Don't use uninitialized flow keys in flow dissector, from Gao
    Feng.

    9) Use after free in l2tp module unload, from Sabrina Dubroca.

    10) Fix interrupt registry ordering issues in smsc911x driver, from
    Jeremy Linton.

    11) Fix crashes in bonding having to do with enslaving and rx_handler,
    from Mahesh Bandewar.

    12) AF_UNIX deadlock fixes from Linus.

    13) In mlx5 driver, don't read skb->xmit_more after it might have been
    freed from the TX reclaim path. From Tariq Toukan.

    14) Fix a bug from 2015 in TCP YeAH where the congestion window does
    not increase, from Artem Germanov.

    15) Don't pad frames on receive in NFP driver, from Jakub Kicinski.

    16) Fix chunk fragmenting in SCTP wrt. GSO, from Marcelo Ricardo
    Leitner.

    17) Fix deletion of VRF routes, from Mark Tomlinson.

    18) Fix device refcount leak when DAD fails in ipv6, from Wei Yongjun"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
    net/mlx4_en: Fix panic on xmit while port is down
    net/mlx4_en: Fixes for DCBX
    net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_state()
    net/mlx4_en: Fix the return value of mlx4_en_dcbnl_set_all()
    net: ethernet: renesas: sh_eth: add POST registers for rz
    drivers: net: phy: mdio-xgene: Add hardware dependency
    dwc_eth_qos: do not register semi-initialized device
    sctp: identify chunks that need to be fragmented at IP level
    mlxsw: spectrum: Set port type before setting its address
    mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init
    nfp: don't pad frames on receive
    nfp: drop support for old firmware ABIs
    nfp: remove linux/version.h includes
    tcp: cwnd does not increase in TCP YeAH
    net/mlx5e: Fix parsing of vlan packets when updating lro header
    net/mlx5e: Fix global PFC counters replication
    net/mlx5e: Prevent casting overflow
    net/mlx5e: Move an_disable_cap bit to a new position
    net/mlx5e: Fix xmit_more counter race issue
    tcp: fastopen: avoid negative sk_forward_alloc
    ...

    Linus Torvalds
     
  • Linus Torvalds
     
  • Tariq Toukan says:

    ====================
    mlx4 fixes

    This patchset contains several bug fixes from the team to the
    mlx4 Eth driver.

    Series generated against net commit:
    c2f57fb97da5 "drivers: net: phy: mdio-xgene: Add hardware dependency"

    v2:
    * excluded some cleanup patches.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When port is down, tx drop counter update is not needed.
    Updating the counter in this case can cause a kernel
    panic as when the port is down, ring can be NULL.

    Fixes: 63a664b7e92b ("net/mlx4_en: fix tx_dropped bug")
    Signed-off-by: Moshe Shemesh
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Moshe Shemesh
     
  • This patch adds a capability check before enabling DCBX.
    In addition, it re-organizes the relevant data structures,
    and fixes a typo in a define.

    Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Tariq Toukan
     
  • mlx4_en_dcbnl_set_state() returns u8, but the return value from
    mlx4_en_setup_tc() can be negative on failure, so fix that.

    Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
    Signed-off-by: Kamal Heib
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Kamal Heib
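
To illustrate why a negative error code and a u8 return type do not mix, here is a minimal userspace sketch. The names setup_tc_stub(), set_state_buggy() and set_state_fixed() are hypothetical stand-ins, not the driver's functions, and the "fixed" variant is only one plausible shape for such a fix.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a helper that fails with a negative errno. */
static int setup_tc_stub(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Buggy shape: the negative int is implicitly converted to an 8-bit
 * unsigned value, so the caller sees a large positive number instead
 * of an error code. */
static uint8_t set_state_buggy(int fail)
{
	return setup_tc_stub(fail);
}

/* One possible fix (an assumption): collapse any failure to 1 and
 * report success as 0, both of which fit in a u8. */
static uint8_t set_state_fixed(int fail)
{
	if (setup_tc_stub(fail))
		return 1;
	return 0;
}
```

With this sketch, set_state_buggy(1) yields -EINVAL reduced modulo 256, which no caller can compare against a negative errno, while set_state_fixed(1) yields a well-defined 1.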
     
  • mlx4_en_dcbnl_set_all() returns u8, so return value can't be negative in
    case of failure.

    Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
    Signed-off-by: Kamal Heib
    Signed-off-by: Rana Shahout
    Reported-by: Dan Carpenter
    Signed-off-by: Tariq Toukan
    Signed-off-by: David S. Miller

    Kamal Heib
     
  • While migrating the bcm_sf2 driver to use b53_common, we left a small
    piece untouched where we kept our local copy of the per-port
    port_vlan_ctl bitmask value. This value is now maintained by b53_device
    so we need to use it instead of our local (and now stale) copy of it.

    Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • Commit aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci
    driver") removed the dependency on BLK_DEV_NVME, but the code does
    depend on the block layer (which used to be an implicit dependency
    through BLK_DEV_NVME).

    Otherwise you get various errors from the kbuild test robot random
    config testing when that happens to hit a configuration with BLOCK
    device support disabled.

    Cc: Christoph Hellwig
    Cc: Jay Freyensee
    Cc: Sagi Grimberg
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull IIO fixes from Greg KH:
    "Here are a few small IIO fixes for 4.8-rc6.

    Nothing major, full details are in the shortlog, all of these have
    been in linux-next with no reported issues"

    * tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    iio:core: fix IIO_VAL_FRACTIONAL sign handling
    iio: ensure ret is initialized to zero before entering do loop
    iio: accel: kxsd9: Fix scaling bug
    iio: accel: bmc150: reset chip at init time
    iio: fix pressure data output unit in hid-sensor-attributes
    tools:iio:iio_generic_buffer: fix trigger-less mode

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.

    All of these resolve minor issues that have been reported, and all
    have been in linux-next with no reported issues"

    * tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
    xhci: fix null pointer dereference in stop command timeout function
    usb: dwc3: pci: fix build warning on !PM_SLEEP
    usb: gadget: prevent potenial null pointer dereference on skb->len
    usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
    usb: phy: phy-generic: Check clk_prepare_enable() error
    usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
    Revert "usb: dwc3: gadget: always decrement by 1"

    Linus Torvalds
     

11 Sep, 2016

28 commits

  • David Ahern says:

    ====================
    net: Convert vrf to tx hook

    The motivation for this series is that ICMP Unreachable - Fragmentation
    Needed packets are not handled properly for VRFs. Specifically, the
    FIB lookup in __ip_rt_update_pmtu fails so no nexthop exception is
    created with the reduced MTU. As a result connections stall if packets
    larger than the smallest MTU in the path are generated.

    While investigating that problem I also noticed that the MSS for all
    connections in a VRF is based on the VRF device's MTU and not the
    route the packets ultimately go through. VRF currently uses a dst
    to direct packets to the device. The first FIB lookup returns this dst
    and then the lookup in the VRF driver gets the actual output route. A
    side effect of this design is that the VRF dst is cached on sockets
    and then used for calculations like the MSS.

    This series fixes this problem by removing the hook in the FIB lookups
    that returns the dst pointing to the VRF device to the VRF and always
    doing the actual FIB lookup. This allows the real dst to be used
    throughout the stack (for example the MSS). Packets are diverted to
    the VRF device on Tx using an l3mdev hook in the output path similar to
    what is done for Rx. The end result is a simpler implementation for
    VRF with fewer intrusions into the network stack and symmetrical packet
    handling for Rx and Tx paths.

    Comparison of netperf performance for a build without l3mdev (best case
    performance), the old vrf driver and the VRF driver from this series.
    Data are collected using VMs with virtio + vhost. The netperf client
    runs in the VM and netserver runs in the host. 1-byte RR tests are done
    as these packets exaggerate the performance hit due to the extra lookups
    done for l3mdev and VRF.

    Command: netperf -cC -H ${ip} -l 60 -t {TCP,UDP}_RR [-J red]

                         TCP_RR              UDP_RR
                      IPv4     IPv6       IPv4     IPv6
    no l3mdev         29,996   30,601     31,638   24,336
    vrf old           27,417   27,626     29,159   24,801
    vrf new           28,036   28,372     30,110   24,857
    l3mdev, no vrf    29,534   30,465     30,670   24,346

    * Transactions per second as reported by netperf
    * netperf modified to take a bind-to-device argument -- the -J red option

    1. 'no l3mdev' == NET_L3_MASTER_DEV is unset so code is compiled out
    2. 'vrf old' == data for existing implementation
    3. 'vrf new' == data with this series
    4. 'l3mdev, no vrf' == NET_L3_MASTER_DEV is enabled but traffic is not
    going through a VRF

    About the series
    - patch 1 adds the flow update (changing oif or iif to L3 master device
    and setting the flag to skip the oif check) to ipv4 and ipv6 paths just
    before hitting the rules. This catches all code paths in a single spot.

    - patch 2 adds the Tx hook to push the packet to the l3mdev if relevant

    - patch 3 adds some checks so the vrf device can act as a vrf-local
    loopback. These changes were not needed before since the vrf dst was
    returned from the lookup.

    - patches 4 and 5 flip the ipv4 and ipv6 stacks to the tx hook leaving
    the route lookup to be the real one. The dst flip happens at the
    beginning of the L3 output path so the VRFs can have device based
    features such as netfilter, tc and tcpdump.

    - patches 6-11 remove no longer needed l3mdev code

    v2
    - properly handle IPv6 link scope addresses

    - keep the device xmit path and associated dst which is switched in by
    the l3_out hook. packets still need to go through the xmit path in
    case the user puts a qdisc on the vrf device and to allow tc rules.
    version 1 short circuited the tx handling and only covered netfilter
    and tcpdump.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • No longer used

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • No longer used

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • No longer used

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • No longer needed

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • No longer needed

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • A previous patch added l3mdev flow update making these hooks
    redundant. Remove them.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Flip the IPv6 output path to use the l3mdev tx out hook. The VRF dst
    is not returned on the first FIB lookup. Instead, the dst on the
    skb is switched at the beginning of the IPv6 output processing to
    send the packet to the VRF driver on xmit.

    Link scope addresses (linklocal and multicast) need special handling:
    specifically, the oif in the flow struct cannot be changed because we
    want the lookup tied to the enslaved interface, i.e., the source address
    and the returned route MUST point to the interface scope passed in.
    Convert the existing vrf_get_rt6_dst to handle only link scope addresses.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Flip the IPv4 output path to use the l3mdev tx out hook. The VRF dst
    is not returned on the first FIB lookup. Instead, the dst on the
    skb is switched at the beginning of the IPv4 output processing to
    send the packet to the VRF driver on xmit.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Allow an L3 master device to act as the loopback for that L3 domain.
    For IPv4 the device can also have the address 127.0.0.1.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • This patch adds the infrastructure to the output path to pass an skb
    to an l3mdev device if it has a hook registered. This is the Tx parallel
    to l3mdev_ip{6}_rcv in the receive path and is the basis for removing
    the existing hook that returns the vrf dst on the fib lookup.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add l3mdev hook to set FLOWI_FLAG_SKIP_NH_OIF flag and update oif/iif
    in flow struct if its oif or iif points to a device enslaved to an L3
    Master device. Only 1 needs to be converted to match the l3mdev FIB
    rule. This moves the flow adjustment for l3mdev to a single point
    catching all lookups. It is redundant for existing hooks (those are
    removed in later patches) but is needed for missed lookups such as
    PMTU updates.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
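
The flow adjustment described above can be sketched in userspace roughly as follows. Everything here is an assumption made for illustration: struct flow, struct slave_map, master_of(), update_flow() and FLOW_FLAG_SKIP_NH_OIF are hypothetical stand-ins for the kernel's flow struct, device lookup and FLOWI_FLAG_SKIP_NH_OIF, not the patch itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for FLOWI_FLAG_SKIP_NH_OIF. */
#define FLOW_FLAG_SKIP_NH_OIF 0x1u

/* Simplified flow key: output and input interface indexes plus flags. */
struct flow {
	int oif;
	int iif;
	unsigned int flags;
};

/* Maps an enslaved device's ifindex to its L3 master's ifindex. */
struct slave_map {
	int ifindex;
	int master_ifindex;
};

/* Returns the master ifindex, or 0 if the device is not enslaved. */
static int master_of(const struct slave_map *map, size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].ifindex == ifindex)
			return map[i].master_ifindex;
	return 0;
}

/* Convert oif if it is enslaved; otherwise try iif. Only one of the
 * two is converted, which is enough to match the l3mdev rule. */
static void update_flow(const struct slave_map *map, size_t n,
			struct flow *fl)
{
	int master;

	if (fl->oif) {
		master = master_of(map, n, fl->oif);
		if (master) {
			fl->oif = master;
			fl->flags |= FLOW_FLAG_SKIP_NH_OIF;
			return;
		}
	}
	if (fl->iif) {
		master = master_of(map, n, fl->iif);
		if (master) {
			fl->iif = master;
			fl->flags |= FLOW_FLAG_SKIP_NH_OIF;
		}
	}
}
```

Doing this once, just before the rules are consulted, is what lets lookups that bypassed the older per-path hooks (such as the PMTU update case mentioned above) see the same adjusted flow.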
     
  • Adjust the indentation for a call of the macro "DPRINTK" in this function.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
     
  • * The script "checkpatch.pl" can point information out like the following.

    WARNING: Prefer kcalloc over kzalloc with multiply

    Thus fix the affected source code place.

    * Replace the specification of a data type by a pointer dereference
    to make the corresponding size determination a bit safer according to
    the Linux coding style convention.

    * Delete the local variable "size" which became unnecessary with
    this refactoring.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
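
As a userspace analogy for the change described above, the sketch below uses calloc(), which like kcalloc() takes the element count and element size as separate arguments and returns zeroed memory. struct entry and alloc_entries() are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical element type. */
struct entry {
	int id;
	long weight;
};

/* Allocate a zeroed table of `count` entries.
 *
 * sizeof(*tbl) follows the pointer's type, so the size stays correct
 * even if the declaration of `tbl` is changed later. Passing count and
 * size separately also removes the need for a local "size" variable
 * holding the product. */
static struct entry *alloc_entries(size_t count)
{
	struct entry *tbl;

	tbl = calloc(count, sizeof(*tbl));
	return tbl;
}
```

The caller still has to check the result for NULL and release it with free().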
     
  • Replace the specification of a data structure by a pointer dereference
    as the parameter for the operator "sizeof" to make the corresponding size
    determination a bit safer according to the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
     
  • * A multiplication for the size determination of a memory allocation
    indicated that an array data structure should be processed.
    Thus use the corresponding function "kmalloc_array".

    This issue was detected by using the Coccinelle software.

    * Replace the specification of a data type by a pointer dereference
    to make the corresponding size determination a bit safer according to
    the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
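
The reason an array allocator is preferred over an open-coded multiplication is the overflow check on count times size. A minimal userspace sketch of that idea follows; alloc_array() is a hypothetical helper written for this example, not the kernel's kmalloc_array().

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n elements of `size` bytes each.
 *
 * Returns NULL instead of a too-small buffer when n * size would
 * overflow size_t. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

A call such as alloc_array(count, sizeof(*ptr)) combines the overflow check with the pointer-dereference form of sizeof mentioned in the second item.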
     
  • The script "checkpatch.pl" can point out that assignments should usually
    not be performed within condition checks.
    Thus move an assignment for a local variable to a separate statement
    in this function.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
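
A small before-and-after sketch of the kind of change this message describes, using hypothetical userspace functions dup_before() and dup_after() rather than the code touched by the patch:

```c
#include <stdlib.h>
#include <string.h>

/* Before: the assignment is buried inside the condition. */
static char *dup_before(const char *s)
{
	char *copy;

	if (!(copy = malloc(strlen(s) + 1)))
		return NULL;
	strcpy(copy, s);
	return copy;
}

/* After: the assignment is its own statement, and the condition only
 * tests the result. Behavior is unchanged. */
static char *dup_after(const char *s)
{
	char *copy;

	copy = malloc(strlen(s) + 1);
	if (!copy)
		return NULL;
	strcpy(copy, s);
	return copy;
}
```

Both variants behave identically; the second is simply easier to read and harder to confuse with a mistyped comparison.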
     
  • * The script "checkpatch.pl" can point out that assignments should usually
    not be performed within condition checks.
    Thus move an assignment for a local variable to a separate statement
    in this function.

    * Replace the specification of a data structure by a pointer dereference
    as the parameter for the operator "sizeof" to make the corresponding size
    determination a bit safer according to the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
     
  • Replace the specification of a data structure by a reference for a field
    in a local variable as the parameter for the operator "sizeof" to make
    the corresponding size determination a bit safer according to
    the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
     
  • Replace the specification of a data structure by a pointer dereference
    as the parameter for the operator "sizeof" to make the corresponding size
    determination a bit safer according to the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
     
  • * A multiplication for the size determination of a memory allocation
    indicated that an array data structure should be processed.
    Thus use the corresponding function "kmalloc_array".

    This issue was detected by using the Coccinelle software.

    * Replace the specification of a data type by a pointer dereference
    to make the corresponding size determination a bit safer according to
    the Linux coding style convention.

    Signed-off-by: Markus Elfring
    Signed-off-by: David S. Miller

    Markus Elfring
     
  • Willem noticed that we could avoid an rbtree lookup if the
    attempt to coalesce incoming skb to the last skb failed
    for some reason.

    Since most ooo additions are at the tail, this is definitely
    worth adding a test and fast path.

    Suggested-by: Willem de Bruijn
    Signed-off-by: Eric Dumazet
    Cc: Yaogong Wang
    Cc: Yuchung Cheng
    Cc: Neal Cardwell
    Cc: Ilpo Järvinen
    Signed-off-by: David S. Miller

    Eric Dumazet
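
The idea of a tail fast path can be sketched outside the kernel as follows. This is only an analogy under stated assumptions: a sorted array stands in for the rbtree, plain unsigned comparison stands in for TCP sequence arithmetic (wraparound is ignored), and struct ooo_queue and ooo_insert() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 64

struct ooo_queue {
	unsigned int seq[QUEUE_CAP]; /* kept sorted in ascending order */
	size_t len;
	size_t searches;             /* number of times the full search ran */
};

/* Insert a sequence number. Returns 0 on success, -1 if the queue is full. */
static int ooo_insert(struct ooo_queue *q, unsigned int seq)
{
	size_t lo, hi;

	if (q->len == QUEUE_CAP)
		return -1;

	/* Fast path: the new item sorts at or after the current tail,
	 * so it can be appended without searching. */
	if (q->len == 0 || seq >= q->seq[q->len - 1]) {
		q->seq[q->len++] = seq;
		return 0;
	}

	/* Slow path: binary search for the insertion point. */
	q->searches++;
	lo = 0;
	hi = q->len;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (q->seq[mid] <= seq)
			lo = mid + 1;
		else
			hi = mid;
	}
	memmove(&q->seq[lo + 1], &q->seq[lo],
		(q->len - lo) * sizeof(q->seq[0]));
	q->seq[lo] = seq;
	q->len++;
	return 0;
}
```

When most insertions land at the tail, as the message says is typical for out-of-order data, the searches counter stays close to zero.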
     
  • When userspace tries to create datapaths and the module is not loaded,
    it will simply fail. With this patch, the module will be automatically
    loaded.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
     
  • These functions are used by other code in the misc-next tree.

    This reverts commit 30d1de08c87ddde6f73936c3350e7e153988fe02.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • Saeed Mahameed says:

    ====================
    Mellanox 100G mlx5 seamless error recovery

    This series from Mohamad improves the driver load/unload flows
    to seamlessly handle pci errors and device internal errors recovery
    reset flows.

    Current pci and internal error handling is too heavy and is done
    with a full restart of the driver by unregistering mlx5 interfaces
    (mlx5e netdevs and mlx5_ib) which will cause losing all the current
    interfaces and mlx5 core configurations.

    To improve this, we add new callback functions of mlx5 interface
    object (attach/detach) to be called upon reset flows when errors are
    detected rather than calling register and unregister interfaces.

    On their side, interfaces such as mlx5e and mlx5_ib can choose to implement
    those callbacks; if they do not, the old heavy reset will be used for that interface.

    For non-interface mlx5 modules such as sriov and eswitch, we refactored
    and reorganized the code in a way that the software state objects are created
    only once on driver load. Those software state objects are kept upon reset recovery
    flows and only freed once on driver unload. On seamless soft reset flows, only
    hardware resources are released on stop and re-allocated on start according to the
    current soft state.

    In this series only mlx5e interface implements attach/detach callbacks
    so that the netdevice will be kept alive on reset. On detach only hardware resources
    are released and the netdevice will be marked as detached to the stack. Once
    attached again it will re-allocate the hardware resources according to the current
    netdevice state, and all the configurations and the software state will be kept or restored
    after recovery.

    Note: I will be out of office all next week, in case of any updates
    or V2 is required, Tariq will post the new series, I hope it is ok.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Hide the exposed (external) mlx5_dev_list and mlx5_intf_mutex and expose
    an organized modular API to manage and manipulate mlx5 devices list.

    Signed-off-by: Mohamad Haj Yahia
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Mohamad Haj Yahia
     
  • When detaching the mlx5e interface clear all the vlans rules from the
    vlan flow table.
    When attaching it back restore all the active vlans rules to the HW.

    Signed-off-by: Mohamad Haj Yahia
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Mohamad Haj Yahia
     
  • Needed to support seamless and lightweight PCI/Internal error recovery.
    Implement the attach/detach interface callbacks.
    In attach callback we only allocate HW resources.
    In detach callback we only deallocate HW resources.
    All SW/kernel objects initializing/destroying is kept in add/remove
    callbacks.

    Signed-off-by: Mohamad Haj Yahia
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Mohamad Haj Yahia
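
The split between add/remove and attach/detach, with a fallback to the heavy path when the lightweight callbacks are missing, can be sketched like this. All names here (struct device_ctx, struct interface_ops, reset_recover() and the four callbacks) are hypothetical and only model the behavior described in the messages above.

```c
#include <stddef.h>

/* Hypothetical per-device state. */
struct device_ctx {
	int hw_ready;    /* non-zero while hardware resources are allocated */
	int sw_created;  /* how many times software objects were created */
};

/* Hypothetical interface callbacks; attach/detach are optional. */
struct interface_ops {
	void (*add)(struct device_ctx *ctx);    /* SW objects + HW resources */
	void (*remove)(struct device_ctx *ctx); /* tear down both */
	void (*attach)(struct device_ctx *ctx); /* HW resources only */
	void (*detach)(struct device_ctx *ctx); /* HW resources only */
};

static void full_add(struct device_ctx *ctx)
{
	ctx->sw_created++;
	ctx->hw_ready = 1;
}

static void full_remove(struct device_ctx *ctx)
{
	ctx->hw_ready = 0;
}

static void light_attach(struct device_ctx *ctx)
{
	ctx->hw_ready = 1;
}

static void light_detach(struct device_ctx *ctx)
{
	ctx->hw_ready = 0;
}

/* On a reset, prefer the lightweight pair; fall back to the heavy
 * remove/add cycle when the interface does not implement it. */
static void reset_recover(const struct interface_ops *ops,
			  struct device_ctx *ctx)
{
	if (ops->attach && ops->detach) {
		ops->detach(ctx);
		ops->attach(ctx);
	} else {
		ops->remove(ctx);
		ops->add(ctx);
	}
}
```

In this model, an interface that provides attach and detach keeps its software objects across a reset, while one that does not has them recreated by the remove/add cycle.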