21 Jun, 2017

40 commits

  • in my commit b952f4dff2751252db073c27c0f8a16a416a2ddc,
    - *(u8 *)skb_put(skb_out, 1) = (u8)(accm >> 24); \
    + skb_put(skb_out, (u8)(accm >> 24)); \
    it should be skb_put_u8()

    Fixes: b952f4dff275 ("net: manual clean code which call skb_put_[data:zero]")
    Signed-off-by: yuan linyu
    Signed-off-by: David S. Miller

    yuan linyu
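
The mistake above compiles cleanly because skb_put() takes a length while skb_put_u8() takes a value, and both are plain integers. The following is a minimal userspace sketch of that difference. The names toy_skb, toy_skb_put and toy_skb_put_u8 are hypothetical stand-ins written for illustration, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct sk_buff: a fixed buffer plus the used length. */
struct toy_skb {
    uint8_t data[64];
    size_t len;
};

/* Models skb_put(): reserve len bytes at the tail, return a pointer to them. */
static void *toy_skb_put(struct toy_skb *skb, size_t len)
{
    assert(skb->len + len <= sizeof(skb->data));
    void *tail = skb->data + skb->len;
    skb->len += len;
    return tail;
}

/* Models skb_put_u8(): reserve exactly one byte and store val in it. */
static void toy_skb_put_u8(struct toy_skb *skb, uint8_t val)
{
    *(uint8_t *)toy_skb_put(skb, 1) = val;
}
```

Calling toy_skb_put(skb, value) with a byte value in the length position grows the buffer by that many bytes and stores nothing, whereas toy_skb_put_u8(skb, value) appends the single byte that was intended.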
     
  • Saeed Mahameed says:

    ====================
    mlx5-updates-2017-06-20 (mlx5 IPoIB updates)

    This series includes updates to mlx5 IPoIB netdevice driver (mlx5i),

    1. We move ipoib files into separate directory, to allow it to grow
    separately in its own space
    2. Remove HW update carrier logic from IPoIB and VF representors profiles.
    3. Add basic ethtool support. (Rings options/statistics and driver info).
    4. Change MTU support.
    5. Xmit path statistics reporting.
    6. add PTP support.

    For the new ethtool ops, PTP (ioctl) and change_mtu ndos in IPoIB, we didn't add new
    implementation or new logic, we only reused those callbacks from the already existing
    mlx5e (ethernet netdevice profile) and exposed them in IPoIB netdevice/ethtool ops.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Julian Wiedmann says:

    ====================
    s390/net updates, part 2 (v2)

    thanks for the feedback. Here's an updated patchset that honours
    the reverse christmas tree and drops the __packed attribute. Please apply.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When a s390 guest runs on a z/VM host that's part of a SSI cluster,
    it can be migrated to a different host. In this case, the MAC address
    it originally obtained on the old host may be re-assigned to a new
    guest. This would result in address conflicts between the two guests.

    When running as z/VM guest, use the diag26c MAC Service to obtain
    a hypervisor-managed MAC address. The MAC Service is SSI-aware, and
    won't re-assign the address after the guest is migrated to a new host.

    This patch adds support for the z/VM MAC Service on L2 devices.

    Signed-off-by: Julian Wiedmann
    Acked-by: Ursula Braun
    Signed-off-by: David S. Miller

    Julian Wiedmann
     
  • Implement support for the hypervisor diagnose 0x26c
    ('Access Certain System Information').
    It passes a request buffer and a subfunction code, and receives
    a response buffer and a return code.

    Also add the scaffolding for the 'MAC Services' subfunction.
    It may be used by network devices to obtain a hypervisor-managed
    MAC address.

    Signed-off-by: Julian Wiedmann
    Acked-by: Heiko Carstens
    Signed-off-by: David S. Miller

    Julian Wiedmann
     
  • There are two spots in qeth_send_packet() where we don't accurately
    account for transmitted packing buffers in qeth's performance
    statistics:

    1) when flushing the current buffer due to insufficient size,
    and the next buffer is not EMPTY, we need to account for that
    flushed buffer.
    2) when synchronizing with the TX completion code, we reset
    flush_count and thus forget to account for any previously
    flushed buffers.

    Reported-by: Nils Hoppmann
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Julian Wiedmann
     
  • Add IPA return codes for Bridgeport (HiperSockets and OSA) according to
    system level design.

    Signed-off-by: Kittipon Meesompop
    Reviewed-by: Julian Wiedmann
    Reviewed-by: Ursula Braun
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller

    Kittipon Meesompop
     
  • It's a bad thing not to handle errors when updating asoc. The memory
    allocation failure in any of the functions called in sctp_assoc_update()
    would cause sctp to work unexpectedly.

    This patch is to fix it by aborting the asoc and reporting the error when
    any of these functions fails.

    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Xin Long
     
  • local_cork is used to decide whether the asoc outq should be uncorked after
    processing some cmds, and it is set when replying or sending msgs. local_cork
    should always have the same value as the current asoc's q->cork.

    The thing is when changing to a new asoc by cmd SET_ASOC, local_cork may
    not be consistent with the current asoc any more. The cmd seqs can be:

    SCTP_CMD_UPDATE_ASSOC (asoc)
    SCTP_CMD_REPLY (asoc)
    SCTP_CMD_SET_ASOC (new_asoc)
    SCTP_CMD_DELETE_TCB (new_asoc)
    SCTP_CMD_SET_ASOC (asoc)
    SCTP_CMD_REPLY (asoc)

    The 1st REPLY sets both the OLD asoc's q->cork and local_cork to 1, and the
    cmd DELETE_TCB clears the NEW asoc's q->cork and local_cork. After switching
    back to the OLD asoc, q->cork is still 1 while local_cork is 0. The 2nd REPLY
    will not set local_cork because q->cork is already set, so the outq is never
    uncorked and the reply is never sent out.

    To keep local_cork consistent with the current asoc's q->cork, this patch
    uncorks the old asoc if local_cork is set before changing to the new one.

    Note that the above cmd seqs will be used in the next patch when updating
    asoc and handling errors in it.

    Suggested-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     
  • Patch "call inet_add_protocol after register_pernet_subsys in dccp_v4_init"
    fixed a null pointer dereference issue for dccp_ipv4 module.

    The same fix is needed for dccp_ipv6 module.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • Now dccp_ipv4 works as a kernel module. While this module is loading, if
    a dccp packet is received after inet_add_protocol but before
    register_pernet_subsys, in which v4_ctl_sk is initialized, a null pointer
    dereference may be triggered because init_net.dccp.v4_ctl_sk is still NULL.

    Jianlin found this issue when the following call trace occurred:

    [ 171.950177] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
    [ 171.951007] IP: [] dccp_v4_ctl_send_reset+0xc4/0x220 [dccp_ipv4]
    [...]
    [ 171.984629] Call Trace:
    [ 171.984859]
    [ 171.985061]
    [ 171.985213] [] dccp_v4_rcv+0x383/0x3f9 [dccp_ipv4]
    [ 171.985711] [] ip_local_deliver_finish+0xb4/0x1f0
    [ 171.986309] [] ip_local_deliver+0x59/0xd0
    [ 171.986852] [] ? update_curr+0x104/0x190
    [ 171.986956] [] ip_rcv_finish+0x8a/0x350
    [ 171.986956] [] ip_rcv+0x2b6/0x410
    [ 171.986956] [] ? task_cputime+0x44/0x80
    [ 171.986956] [] __netif_receive_skb_core+0x572/0x7c0
    [ 171.986956] [] ? trigger_load_balance+0x61/0x1e0
    [ 171.986956] [] __netif_receive_skb+0x18/0x60
    [ 171.986956] [] process_backlog+0xae/0x180
    [ 171.986956] [] net_rx_action+0x16d/0x380
    [ 171.986956] [] __do_softirq+0xef/0x280
    [ 171.986956] [] call_softirq+0x1c/0x30

    This patch is to move inet_add_protocol after register_pernet_subsys in
    dccp_v4_init, so that v4_ctl_sk is initialized before any incoming dccp
    packets are processed.

    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
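
The race described above comes down to the order of two init steps. Here is a small sketch of that ordering, modelled with two flags. All toy_* names are hypothetical, and the model only captures "is the protocol handler visible" and "is the control socket set up", nothing else about DCCP.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_state {
    bool ctl_sk_ready;     /* models init_net.dccp.v4_ctl_sk being non-NULL */
    bool proto_registered; /* models inet_add_protocol() having run */
};

enum toy_rx { TOY_RX_NOT_DELIVERED, TOY_RX_OK, TOY_RX_NULL_DEREF };

/* Models register_pernet_subsys(): the control socket gets initialized. */
static void toy_register_pernet(struct toy_state *s)
{
    s->ctl_sk_ready = true;
}

/* Models inet_add_protocol(): from here on packets reach the handler. */
static void toy_add_protocol(struct toy_state *s)
{
    assert(!s->proto_registered);
    s->proto_registered = true;
}

/* Models a packet arriving that makes the handler use the control socket. */
static enum toy_rx toy_rcv(const struct toy_state *s)
{
    if (!s->proto_registered)
        return TOY_RX_NOT_DELIVERED;
    return s->ctl_sk_ready ? TOY_RX_OK : TOY_RX_NULL_DEREF;
}
```

With toy_add_protocol() first, a packet arriving between the two calls hits TOY_RX_NULL_DEREF. With toy_register_pernet() first, the same packet is simply not delivered yet.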
     
  • With -Wformat-truncation, gcc throws the warning shown below.

    Fix this by increasing the size of devname to accommodate 15 character
    netdev interface name and description.

    Remove length format precision for %s. We can fit entire name.

    Also increment the version.

    drivers/net/ethernet/cisco/enic/enic_main.c: In function ‘enic_open’:
    drivers/net/ethernet/cisco/enic/enic_main.c:1740:15: warning: ‘%u’ directive output may be truncated writing between 1 and 2 bytes into a region of size between 1 and 12 [-Wformat-truncation=]
    "%.11s-rx-%u", netdev->name, i);
    ^~
    drivers/net/ethernet/cisco/enic/enic_main.c:1740:5: note: directive argument in the range [0, 16]
    "%.11s-rx-%u", netdev->name, i);
    ^~~~~~~~~~~~~
    drivers/net/ethernet/cisco/enic/enic_main.c:1738:4: note: ‘snprintf’ output between 6 and 18 bytes into a destination of size 16
    snprintf(enic->msix[intr].devname,
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    sizeof(enic->msix[intr].devname),
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "%.11s-rx-%u", netdev->name, i);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Signed-off-by: Govindarajulu Varadarajan
    Signed-off-by: David S. Miller

    Govindarajulu Varadarajan
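
The warning above is about snprintf() output not fitting the destination. As a rough userspace illustration, the return value of snprintf() can be compared with the buffer size to detect truncation. The helper name format_irq_name and the buffer sizes used with it are assumptions for this sketch, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Formats "<name>-rx-<i>" into buf.
 * Returns true only if the whole string, including the NUL, fit in size bytes.
 */
static bool format_irq_name(char *buf, size_t size, const char *name,
                            unsigned int i)
{
    assert(buf != NULL && size > 0);
    /* snprintf() returns the length it wanted to write, excluding the NUL. */
    int n = snprintf(buf, size, "%s-rx-%u", name, i);
    return n >= 0 && (size_t)n < size;
}
```

A 15-character interface name plus "-rx-" and a two-digit index needs 22 bytes including the terminator, so a 16-byte buffer truncates it while a larger one holds it in full.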
     
  • There is nothing in the IP that prevents us from enabling TSO for IPv6.

    Before patch:
    ftp fe80::2aa:bbff:fecc:1336%eth0
    ftp> get /dev/zero
    882512708 bytes received in 00:14 (56.11 MiB/s)

    After patch:
    ftp fe80::2aa:bbff:fecc:1336%eth0
    ftp> get /dev/zero
    1203326784 bytes received in 00:12 (94.52 MiB/s)

    Signed-off-by: Niklas Cassel
    Signed-off-by: David S. Miller

    Niklas Cassel
     
  • If the ibmvnic driver is not in the VNIC_OPEN state, return from
    ibmvnic_resume callback. If we are not in the VNIC_OPEN state, interrupts
    may not be initialized and directly calling the interrupt handler will
    cause a crash.

    Signed-off-by: John Allen
    Reviewed-by: Nathan Fontenot
    Signed-off-by: David S. Miller

    John Allen
     
  • Provide link partner advertising information.
    Removed testing for gigabit modes, which is useless for a fast ethernet phy.

    Signed-off-by: Thomas Bogendoerfer
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Thomas Bogendoerfer
     
  • John Crispin says:

    ====================
    net-next: mediatek: various performance improvements

    During development we mainly ran testing using iperf doing 1500 byte
    tcp frames. It was pointed out recently, that the driver does not perform
    very well when using 512 byte udp frames. The biggest problem was that
    RPS was not working as no rx queue was being set. Fixing this more than
    doubled the throughput. Additionally the IRQ mask register is now locked
    independently for RX and TX. RX IRQ aggregation is also added. With all
    these patches applied we can almost triple the throughput.

    While at it we also add PHY status change reporting for GMACs connecting
    directly to a PHY.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The get_rps_cpu() function will not do any RPS on the data flow when no
    queue is set up, and will always use the current CPU where the IRQ was
    handled to also handle the backlog. As we only have one physical queue, we
    always set this to 0 unconditionally.

    Signed-off-by: John Crispin
    Signed-off-by: David S. Miller

    John Crispin
     
  • Originally the driver only utilised the new QDMA engine. The current code
    still assumes this is the case when locking the IRQ mask register. Since
    RX now runs on the old style PDMA engine we can add a second lock. This
    patch reduces the IRQ latency as the TX and RX path no longer need to wait
    on each other under heavy load.

    Signed-off-by: John Crispin
    Signed-off-by: David S. Miller

    John Crispin
     
  • The PDMA engine used for RX allows IRQ aggregation. The patch sets up the
    corresponding registers to aggregate 4 IRQs into one. Using aggregation
    reduces the load on the core handling to a quarter thus reducing IRQ
    latency and increasing RX performance by around 10%.

    Signed-off-by: John Crispin
    Signed-off-by: David S. Miller

    John Crispin
     
  • Currently PHY status changes are only printed for DSA ports. This patch
    adds code to also print status changes for non-fixed links.

    Signed-off-by: John Crispin
    Signed-off-by: David S. Miller

    John Crispin
     
  • Matthias Schiffer says:

    ====================
    vxlan: cleanup and IPv6 link-local support

    Running VXLANs over IPv6 link-local addresses allows them to be used as a
    drop-in replacement for VLANs, without having to allocate additional outer
    IP addresses to run the VXLAN over.

    Since v1, I have added a lot more consistency checks to the address
    configuration, making sure address families and scopes match. To simplify
    the implementation, I also did some general refactoring of the
    configuration handling in the new first patch of the series.

    The second patch is more cleanup; it slightly touches OVS code, so that
    list is in CC this time, too.

    As in v1, the last two patches actually make VXLAN over IPv6 link-local
    work, and allow multiple VXLANs with the same VNI and port, as long as
    link-local addresses on different interfaces are used. As suggested, I now
    store in the flags field if the VXLAN uses link-local addresses or not.

    v3 removes log messages as suggested by Roopa Prabhu (as it is very unusual
    for errors in netlink requests to be printed to the kernel log.) The commit
    message of patch 5 has been extended to add a note about IPv4.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • As link-local addresses are only valid for a single interface, we can allow
    the same VNI to be used for multiple independent VXLANs, as long as the
    interfaces used are distinct. This way, VXLANs can always be used as a
    drop-in replacement for VLANs with greater ID space.

    This also extends VNI lookup to respect the ifindex when link-local IPv6
    addresses are used, so using the same VNI on multiple interfaces can
    actually work.

    Signed-off-by: Matthias Schiffer
    Signed-off-by: David S. Miller

    Matthias Schiffer
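
One way to picture the ifindex-aware lookup described above is a table search that matches on the VNI and, for link-local devices only, also on the receiving interface. This is a simplified sketch with hypothetical names (toy_vxlan, toy_vni_find). The real driver uses hash tables and more match criteria.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vxlan {
    uint32_t vni;
    bool link_local; /* device is configured with link-local IPv6 addresses */
    int ifindex;     /* lower interface; only checked when link_local is set */
};

/* Returns the first device matching vni (and the interface, if link-local). */
static const struct toy_vxlan *toy_vni_find(const struct toy_vxlan *tbl,
                                            size_t n, uint32_t vni,
                                            int rx_ifindex)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].vni != vni)
            continue;
        if (tbl[i].link_local && tbl[i].ifindex != rx_ifindex)
            continue;
        return &tbl[i];
    }
    return NULL;
}
```

Two devices with the same VNI on different interfaces then resolve to different entries, depending on where the packet came in.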
     
  • If VXLAN is run over link-local IPv6 addresses, it is necessary to store
    the ifindex in the FDB entries. Otherwise, the used interface is undefined
    and unicast communication will most likely fail.

    Support for link-local IPv4 addresses should be possible as well, but as
    the semantics aren't as well defined as for IPv6, and there doesn't seem to
    be much interest in having the support, it's not implemented for now.

    Signed-off-by: Matthias Schiffer
    Signed-off-by: David S. Miller

    Matthias Schiffer
     
  • * Multicast addresses are never valid as local address
    * Link-local IPv6 unicast addresses may only be used as remote when the
    local address is link-local as well
    * Don't allow link-local IPv6 local/remote addresses without interface

    We also store in the flags field if link-local addresses are used for the
    follow-up patches that actually make VXLAN over link-local IPv6 work.

    Signed-off-by: Matthias Schiffer
    Signed-off-by: David S. Miller

    Matthias Schiffer
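
The three rules listed above can be sketched in userspace with the standard IN6_IS_ADDR_* macros. The function toy_vxlan_addrs_ok and its treatment of ifindex 0 as "no interface" are assumptions made for this illustration. The driver's actual checks may differ, for example in how an unspecified local address is handled.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Parses an IPv6 literal; aborts on malformed input (sketch only). */
static struct in6_addr toy_parse(const char *text)
{
    struct in6_addr addr;
    int rc = inet_pton(AF_INET6, text, &addr);
    assert(rc == 1);
    (void)rc;
    return addr;
}

/* Applies the three rules from the commit message; ifindex 0 means "none". */
static bool toy_vxlan_addrs_ok(const struct in6_addr *local,
                               const struct in6_addr *remote, int ifindex)
{
    bool local_ll = IN6_IS_ADDR_LINKLOCAL(local);
    bool remote_ll = IN6_IS_ADDR_LINKLOCAL(remote);

    if (IN6_IS_ADDR_MULTICAST(local))
        return false; /* multicast is never valid as local address */
    if (remote_ll && !local_ll)
        return false; /* link-local remote needs a link-local local */
    if ((local_ll || remote_ll) && ifindex == 0)
        return false; /* link-local addresses need an interface */
    return true;
}
```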
     
  • Address families of source and destination addresses must match, and
    changelink operations can't change the address family.

    In addition, always use the VXLAN_F_IPV6 to check if a VXLAN device uses
    IPv4 or IPv6.

    Signed-off-by: Matthias Schiffer
    Signed-off-by: David S. Miller

    Matthias Schiffer
     
  • There is no good reason to keep the flags twice in vxlan_dev and
    vxlan_config.

    Signed-off-by: Matthias Schiffer
    Signed-off-by: David S. Miller

    Matthias Schiffer
     
  • The vxlan_dev_configure function was mixing validation and application of
    the vxlan configuration; this could easily lead to bugs with the changelink
    operation, as it was hard to see if the function could return an error
    after parts of the configuration had already been applied.

    This commit splits validation and application out of vxlan_dev_configure as
    separate functions to make it clearer where error returns are allowed and
    where the vxlan_dev or net_device may be configured. Log messages in these
    functions are removed, as it is generally unexpected to find error output
    for netlink requests in the kernel log. Userspace should be able to handle
    errors based on the error codes returned via netlink just fine.

    In addition, some validation and initialization is moved to vxlan_validate
    and vxlan_setup respectively to improve grouping of similar settings.

    Finally, this also fixes two actual bugs:

    * if set, conf->mtu would overwrite dev->mtu in each changelink operation,
    reverting other changes of dev->mtu
    * the "if (!conf->dst_port)" branch would never be run, as conf->dst_port
    was set in vxlan_setup before. This caused VXLAN-GPE to use the same
    default port as other VXLAN sockets instead of the intended IANA-assigned
    4790.

    Signed-off-by: Matthias Schiffer
    Signed-off-by: David S. Miller

    Matthias Schiffer
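
The split described above follows a common pattern: validate the whole request first, and touch the device only once nothing can fail. Below is a minimal sketch of that pattern together with the two fixed behaviours. All toy_* names, the MTU bounds, and the use of 4789 as the non-GPE default are assumptions for illustration. Only the 4790 GPE port is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_conf {
    int mtu;                 /* 0 means "not specified" */
    unsigned short dst_port; /* 0 means "use the default" */
    bool gpe;
};

struct toy_dev {
    int mtu;
    unsigned short dst_port;
};

/* Validation only: never modifies the device, may return an error. */
static int toy_conf_validate(const struct toy_conf *conf)
{
    if (conf->mtu != 0 && (conf->mtu < 68 || conf->mtu > 65535))
        return -EINVAL;
    return 0;
}

/* Application only: must be called with a validated conf, cannot fail. */
static void toy_conf_apply(struct toy_dev *dev, const struct toy_conf *conf,
                           bool changelink)
{
    assert(toy_conf_validate(conf) == 0);
    /* conf->mtu is applied on creation only, so changelink keeps dev->mtu. */
    if (!changelink && conf->mtu)
        dev->mtu = conf->mtu;
    /* The default port is chosen here, where an unset port is still visible. */
    if (conf->dst_port)
        dev->dst_port = conf->dst_port;
    else
        dev->dst_port = conf->gpe ? 4790 : 4789;
}

static int toy_configure(struct toy_dev *dev, const struct toy_conf *conf,
                         bool changelink)
{
    int err = toy_conf_validate(conf);

    if (err)
        return err; /* device left untouched */
    toy_conf_apply(dev, conf, changelink);
    return 0;
}
```

Because every error return sits in toy_conf_validate(), a failed request can never leave the device half-configured.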
     
  • yuan linyu says:

    ====================
    net: more skb_put_[data:zero] related work

    yuan linyu (3):
    net: introduce __skb_put_[zero, data, u8]
    net: replace more place to skb_put_[data:zero]
    net: manual clean code which call skb_put_[data:zero]
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: yuan linyu
    Signed-off-by: David S. Miller

    yuan linyu
     
  • spatch file,
    @@
    expression skb, len, data;
    type t;
    @@
    -memcpy((t *)skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

    @@
    identifier p;
    expression skb, len, data;
    type t;
    @@
    -p = (t *)memset(skb_put(skb, len), data, len);
    +p = skb_put_zero(skb, len);

    @@
    expression skb, len, data;
    type t;
    @@
    -memcpy((t *)__skb_put(skb, len), data, len);
    +__skb_put_data(skb, data, len);

    @@
    identifier p;
    expression skb, len, data;
    type t;
    @@
    -p = (t *)memset(__skb_put(skb, len), data, len);
    +p = __skb_put_zero(skb, len);

    Signed-off-by: yuan linyu
    Signed-off-by: David S. Miller

    yuan linyu
     
  • Following Johannes Berg, the semantic patch file is as below:
    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = __skb_put(skb, len);
    +p = __skb_put_zero(skb, len);
    |
    -p = (t)__skb_put(skb, len);
    +p = __skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    identifier p;
    expression len;
    expression skb;
    type t;
    @@
    (
    -t p = __skb_put(skb, len);
    +t p = __skb_put_zero(skb, len);
    )
    ... when != p
    (
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = __skb_put(skb, sizeof(t));
    +p = __skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)__skb_put(skb, sizeof(t));
    +p = __skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(__skb_put(skb, len), 0, len);
    +__skb_put_zero(skb, len);

    @@
    expression skb, len, data;
    @@
    -memcpy(__skb_put(skb, len), data, len);
    +__skb_put_data(skb, data, len);

    @@
    expression SKB, C, S;
    typedef u8;
    identifier fn = {__skb_put};
    fresh identifier fn2 = fn ## "_u8";
    @@
    - *(u8 *)fn(SKB, S) = C;
    + fn2(SKB, C);

    Signed-off-by: yuan linyu
    Signed-off-by: David S. Miller

    yuan linyu
     
  • Kill the remaining shift macro in favor of calculating at compile time
    its value from the more descriptive mask, which gives us a better
    representation of the register layout.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
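
Deriving a shift from a mask at compile time can be sketched as follows. The kernel provides __bf_shf() in <linux/bitfield.h> for this purpose; whether this commit uses that helper is not stated here. The userspace version below relies on the GCC/Clang builtin __builtin_ffs(), and TOY_FIELD_MASK is a made-up register field, not one from the mv88e6xxx driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit field occupying bits 11:8 of a 16-bit register. */
#define TOY_FIELD_MASK 0x0f00u

/* Position of the lowest set bit of mask; folded to a constant by the compiler. */
#define TOY_MASK_SHIFT(mask) (__builtin_ffs(mask) - 1)

/* Extracts the field value from a register word. */
static uint16_t toy_field_get(uint16_t reg)
{
    return (uint16_t)((reg & TOY_FIELD_MASK) >> TOY_MASK_SHIFT(TOY_FIELD_MASK));
}

/* Returns reg with the field replaced by val; val must fit in the field. */
static uint16_t toy_field_set(uint16_t reg, uint16_t val)
{
    int shift = TOY_MASK_SHIFT(TOY_FIELD_MASK);

    assert((val & ~(TOY_FIELD_MASK >> shift)) == 0);
    return (uint16_t)((reg & ~TOY_FIELD_MASK) |
                      (((unsigned int)val << shift) & TOY_FIELD_MASK));
}
```

With only the mask defined, the field's position and width live in one place, so a separate shift macro cannot drift out of sync with it.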
     
  • Vivien Didelot says:

    ====================
    net: dsa: Global 2 cosmetics

    Similarly to what has been done for the Port and Global 1 registers,
    this patch series prefixes and documents the macros of Global 2.

    It brings no functional changes except for 1/10 which fixes the IRL init
    for 88E6390 family.

    Changes in v2: make *_g2_irl_init_all static inline without
    NET_DSA_MV88E6XXX_GLOBAL2 and compile test with and without the symbol.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Prefix and document the remaining Global 2 registers macros.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • The Marvell 88E6352 family has a Global 2 register dedicated to the
    watchdog setup. But the 88E6390 turned it into an indirect table.

    Prefix and document that.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Prefix and document the Global 2 Switch MAC registers macros.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Prefix and document the Global 2 EEPROM registers macros.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Prefix and document the Global 2 Cross-chip Port VLAN registers macros.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Prefix and document the Global 2 MGMT registers macros.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • Prefix and document the Global 2 Device Mapping macros.

    Signed-off-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Vivien Didelot