02 Apr, 2020

1 commit


01 Apr, 2020

1 commit

  • Fix an oops in dsa_port_phylink_mac_change() caused by a combination
    of a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA
    ports unless needed") and the net-dsa-improve-serdes-integration
    series of patches 65b7a2c8e369 ("Merge branch
    'net-dsa-improve-serdes-integration'").

    Unable to handle kernel NULL pointer dereference at virtual address 00000124
    pgd = c0004000
    [00000124] *pgd=00000000
    Internal error: Oops: 805 [#1] SMP ARM
    Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c
    CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470
    Hardware name: Marvell Armada 380/385 (Device Tree)
    PC is at phylink_mac_change+0x10/0x88
    LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx]

    Signed-off-by: Russell King
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Russell King
     

31 Mar, 2020

3 commits

  • The approach taken to pass the port policer methods on to drivers is
    pragmatic. It is similar to the port mirroring implementation (in that
    the DSA core does all of the filter block interaction and only passes
    simple operations for the driver to implement) and dissimilar to how
    flow-based policers are going to be implemented (where the driver has
    full control over the flow_cls_offload data structure).

    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
     
  • Make room for other actions for the matchall filter by keeping the
    mirred argument parsing self-contained in its own function.

    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
     
  • There is no point in preparing the module name in a buffer. The format
    string can be passed directly to 'request_module()'.

    This axes a few lines of code and cleans up a few things:
    - the maximum length of a driver name is MODULE_NAME_LEN, which is
      ~60 chars, not 128. It would be truncated in 'request_module()'
      anyway.
    - we should pass the total size of the buffer to 'snprintf()', not
      the size minus 1.

    Signed-off-by: Christophe JAILLET
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Christophe JAILLET
     

28 Mar, 2020

2 commits

  • Many switches don't have an explicit knob for configuring the MTU
    (maximum transmission unit per interface). Instead, they do the
    length-based packet admission checks on the ingress interface, for
    reasons that are easy to understand (why would you accept a packet in
    the queuing subsystem if you know you're going to drop it anyway).

    So it is actually the MRU that these switches permit configuring.

    In Linux there only exists the IFLA_MTU netlink attribute and the
    associated dev_set_mtu function. The comments like to play blind and say
    that it's changing the "maximum transfer unit", which is to say that
    there isn't any directionality in the meaning of the MTU word. So that
    is the interpretation that this patch is giving to things: MTU == MRU.

    When 2 interfaces having different MTUs are bridged, the bridge driver
    MTU auto-adjustment logic kicks in: br_mtu_auto_adjust() adjusts the
    MTU of the bridge net device itself (and not that of the slave net
    devices) to the minimum value of all slave interfaces, so that
    forwarded packets do not exceed the MTU regardless of the interface
    they are received and sent on.

    The idea behind this behavior, and why the slave MTUs are not adjusted,
    is that normal termination from Linux over the L2 forwarding domain
    should happen over the bridge net device, which _is_ properly limited by
    the minimum MTU. And termination over individual slave devices is
    possible even if those are bridged. But that is not "forwarding", so
    there's no reason to do normalization there, since only a single
    interface sees that packet.

    The problem with those switches that can only control the MRU is with
    the offloaded data path, where a packet received on an interface with
    MRU 9000 would still be forwarded to an interface with MRU 1500. And the
    br_mtu_auto_adjust() function does not really help, since the MTU
    configured on the bridge net device is ignored.

    In order to enforce the de-facto MTU == MRU rule for these switches, we
    need to do MTU normalization: for no packet larger than the MTU
    configured on a port to be sent on it, we need to limit the MRU on all
    ports that this packet could possibly come from. In other words, since
    we are configuring the MRU via the MTU, all ports within a bridge
    forwarding domain should have the same MTU.

    And that is exactly what this patch is trying to do.

    From an implementation perspective, we try to follow the intent of the
    user, otherwise there is a risk that we might livelock them (they try to
    change the MTU on an already-bridged interface, but we just keep
    changing it back in an attempt to keep the MTU normalized). So the MTU
    that the bridge is normalized to is either:

    - The most recently changed one:

    ip link set dev swp0 master br0
    ip link set dev swp1 master br0
    ip link set dev swp0 mtu 1400

    This sequence will make swp1 inherit MTU 1400 from swp0.

    - The one of the most recently added interface to the bridge:

    ip link set dev swp0 master br0
    ip link set dev swp1 mtu 1400
    ip link set dev swp1 master br0

    The above sequence will make swp0 inherit MTU 1400 as well.

    Suggested-by: Florian Fainelli
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
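The normalization rule described in the MTU == MRU commit above can be modeled in a few lines. This is a toy Python sketch, not the kernel implementation: the `Switch` class and its `set_mtu()`/`join_bridge()` methods are hypothetical stand-ins for `ip link set ... mtu` and `ip link set ... master`, and every port is assumed to start at MTU 1500.

```python
"""Toy model of DSA bridge MTU normalization (hypothetical, not kernel code)."""
from typing import Dict, Iterable, Optional


class Switch:
    def __init__(self, ports: Iterable[str], default_mtu: int = 1500) -> None:
        self.mtu: Dict[str, int] = {p: default_mtu for p in ports}
        self.bridge: Dict[str, Optional[str]] = {p: None for p in self.mtu}

    def _normalize(self, origin: str) -> None:
        # Copy the origin port's MTU to every port in the same bridge, so
        # that all ports of a forwarding domain share a single MTU (== MRU).
        br = self.bridge[origin]
        if br is None:
            return  # standalone ports are left alone
        for port, port_br in self.bridge.items():
            if port_br == br:
                self.mtu[port] = self.mtu[origin]

    def set_mtu(self, port: str, mtu: int) -> None:
        # Like "ip link set dev PORT mtu MTU": the most recent change wins.
        self.mtu[port] = mtu
        self._normalize(port)

    def join_bridge(self, port: str, br: str) -> None:
        # Like "ip link set dev PORT master BR": the new member's MTU wins.
        self.bridge[port] = br
        self._normalize(port)


# Second sequence from the commit message: swp0 inherits MTU 1400.
sw = Switch(["swp0", "swp1", "swp2"])
sw.join_bridge("swp0", "br0")
sw.set_mtu("swp1", 1400)
sw.join_bridge("swp1", "br0")
print(sw.mtu)  # {'swp0': 1400, 'swp1': 1400, 'swp2': 1500}
```

Because the model always propagates from the port the user just touched, it never reverts the user's latest request, which is how it avoids the livelock concern.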
     
  • It is useful to be able to configure the MTU of switch ports, so that
    they accept frames of various sizes:

    - Increase the MTU for better throughput from the default of 1500 if it
    is known that there is no 10/100 Mbps device in the network.
    - Decrease the MTU to limit the latency of high-priority frames under
    congestion, or work around various network segments that add extra
    headers to packets which can't be fragmented.

    For DSA slave ports, this is mostly a pass-through callback, called
    through the regular ndo ops and at probe time (to ensure consistency
    across all supported switches).

    The CPU port is configured with an MTU equal to the largest configured MTU
    of the slave ports. The assumption is that the user might want to
    sustain a bidirectional conversation with a partner over any switch
    port.

    The DSA master is configured the same as the CPU port, plus the tagger
    overhead. Since the MTU is by definition L2 payload (sans Ethernet
    header), it is up to each individual driver to figure out if it needs to
    do anything special for its frame tags on the CPU port (it shouldn't
    except in special cases). So the MTU does not contain the tagger
    overhead on the CPU port.
    However the MTU of the DSA master, minus the tagger overhead, is used as
    a proxy for the MTU of the CPU port, which does not have a net device.
    This is to avoid uselessly calling the .change_mtu function on the CPU
    port when nothing should change.

    So it is safe to assume that the DSA master and the CPU port MTUs are
    apart by exactly the tagger's overhead in bytes.

    Some changes were made around dsa_master_set_mtu(), a function which has
    now been removed, for 2 reasons:
    - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to
    do the same thing in DSA
    - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method
    That is to say, there's no need for this function in DSA, we can safely
    call dev_set_mtu() directly, take the rtnl lock when necessary, and just
    propagate whatever errors get reported (since the user probably wants to
    be informed).

    Some inspiration (mainly in the MTU DSA notifier) was taken from a
    vaguely similar patch from Murali and Florian, who are credited as
    co-developers down below.

    Co-developed-by: Murali Krishna Policharla
    Signed-off-by: Murali Krishna Policharla
    Co-developed-by: Florian Fainelli
    Signed-off-by: Florian Fainelli
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
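The MTU relationships described in the commit that configures the MTU of switch ports (CPU port = largest slave MTU, master = CPU port plus tagger overhead) can be sketched as plain arithmetic. This Python sketch is illustrative only: `TAG_OVERHEAD = 4` is a made-up value, since the real overhead depends on the tagging protocol, and the function names are hypothetical.

```python
"""Toy model of the CPU port / DSA master MTU relationship."""
from typing import Dict

TAG_OVERHEAD = 4  # hypothetical tagger overhead, in bytes


def cpu_port_mtu(slave_mtus: Dict[str, int]) -> int:
    # The CPU port may carry traffic for any user port, so it gets the
    # largest MTU configured on any of them.
    return max(slave_mtus.values())


def master_mtu(slave_mtus: Dict[str, int], overhead: int = TAG_OVERHEAD) -> int:
    # The master carries tagged frames: CPU port MTU plus tagger overhead.
    return cpu_port_mtu(slave_mtus) + overhead


def cpu_port_needs_update(slave_mtus: Dict[str, int], current_master_mtu: int,
                          overhead: int = TAG_OVERHEAD) -> bool:
    # The CPU port has no net device, so its current MTU is inferred from
    # the master's MTU minus the overhead; skip .change_mtu if unchanged.
    return cpu_port_mtu(slave_mtus) != current_master_mtu - overhead


mtus = {"swp0": 1500, "swp1": 9000, "swp2": 1500}
print(cpu_port_mtu(mtus), master_mtu(mtus))  # 9000 9004
print(cpu_port_needs_update(mtus, 9004))     # False
```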
     

26 Mar, 2020

1 commit


25 Mar, 2020

1 commit

  • Not only did this wheel not need reinventing, but there is also an
    issue with it: it doesn't remove the VLAN header in a way that
    preserves the L2 payload checksum when that is being provided by the
    DSA master hw. It should recalculate the checksum both for the push,
    before removing the header, and for the pull afterwards. But the
    current implementation is quite dizzying, with pulls followed
    immediately by pushes, the memmove done before the push, etc. This
    makes a DSA master with RX checksum offload print stack traces with
    the infamous 'hw csum failure' message.

    So remove the dsa_8021q_remove_header function and replace it with
    something that actually works with inet checksumming.

    Fixes: d461933638ae ("net: dsa: tag_8021q: Create helper function for removing VLAN header")
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
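The checksum bookkeeping that the tag_8021q fix calls for can be illustrated with a toy model of a CHECKSUM_COMPLETE-style running sum. This Python sketch is not the kernel code: it uses plain 16-bit one's complement arithmetic, and `push()`/`pull()` are hypothetical helpers mirroring the purpose of the kernel's skb_postpush_rcsum() and skb_postpull_rcsum() (adding or subtracting the partial sum of bytes that enter or leave the summed area). It assumes the 4-byte VLAN header starts at an even offset.

```python
"""Toy model of keeping a running one's complement sum valid across push/pull."""


def csum_add(a: int, b: int) -> int:
    # 16-bit one's complement addition with end-around carry.
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    # Subtraction is addition of the one's complement.
    return csum_add(a, ~b & 0xFFFF)


def csum_partial(data: bytes) -> int:
    # Sum big-endian 16-bit words; an odd trailing byte is zero-padded.
    if len(data) % 2:
        data = data + b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, (data[i] << 8) | data[i + 1])
    return total


def push(data: bytes, csum: int, hdr: bytes):
    # Bytes entering the summed area must be added to the running sum.
    return hdr + data, csum_add(csum, csum_partial(hdr))


def pull(data: bytes, csum: int, n: int):
    # Bytes leaving the summed area must be subtracted from it (n even).
    return data[n:], csum_sub(csum, csum_partial(data[:n]))


def same(a: int, b: int) -> bool:
    # 0x0000 and 0xFFFF both encode zero in one's complement.
    return a % 0xFFFF == b % 0xFFFF


payload = bytes(range(1, 41))
vlan = bytes([0x81, 0x00, 0x00, 0x05])  # TPID 0x8100, VID 5
frame, c = push(payload, csum_partial(payload), vlan)
print(same(c, csum_partial(frame)))                      # True
inner, c = pull(frame, c, len(vlan))
print(inner == payload, same(c, csum_partial(payload)))  # True True
```

Skipping the adjustment in `pull()` would leave the sum still covering the removed header, which is the kind of mismatch that surfaces as 'hw csum failure'.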
     

24 Mar, 2020

2 commits

  • Provide a flow_dissect callback which returns the network offset and
    where to find the skb protocol. Given the structure of the tags, a
    common function works for both supported tagging formats.

    Signed-off-by: Florian Fainelli
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • When both the switch and the bridge are learning about new addresses,
    switch ports attached to the bridge would see duplicate ARP frames
    because both entities would attempt to send them.

    Fixes: 5037d532b83d ("net: dsa: add Broadcom tag RX/TX handler")
    Reported-by: Maxime Bizon
    Signed-off-by: Florian Fainelli
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller

    Florian Fainelli
     

18 Mar, 2020

1 commit

  • flow_action_hw_stats_types_check() helper takes one of the
    FLOW_ACTION_HW_STATS_*_BIT values as input. If we align
    the arguments to the opening bracket of the helper there
    is no way to call this helper and stay under 80 characters.

    Remove the "types" part from the new flow_action helpers
    and enum values.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

16 Mar, 2020

1 commit


13 Mar, 2020

1 commit


12 Mar, 2020

1 commit

  • By default, DSA drivers should configure CPU and DSA ports to their
    maximum speed. In many configurations this is sufficient to make the
    link work.

    In some cases it is necessary to configure the link to run slower,
    e.g. because of limitations of the SoC it is connected to. Or
    back-to-back PHYs are used, and the PHY needs to be driven in order to
    establish the link. In these cases, phylink is used.

    Only instantiate phylink if it is required. If there is no PHY, or no
    fixed link properties, phylink can upset a link which works in the
    default configuration.

    Fixes: 0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports")
    Signed-off-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Andrew Lunn
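The "only instantiate phylink if it is required" rule from the CPU/DSA port commit boils down to a small decision. This Python sketch is a hypothetical model: `CpuDsaPort` merely summarizes whether the port's firmware description has a PHY or fixed-link properties, and is not a kernel structure.

```python
"""Toy model of the 'only instantiate phylink if needed' rule."""
from dataclasses import dataclass


@dataclass
class CpuDsaPort:
    # Hypothetical summary of a CPU/DSA port's firmware description.
    has_phy: bool = False
    has_fixed_link: bool = False


def needs_phylink(port: CpuDsaPort) -> bool:
    # phylink is only needed when there is a PHY to drive or explicit
    # fixed-link properties asking for a non-default link configuration.
    return port.has_phy or port.has_fixed_link


def link_setup(port: CpuDsaPort) -> str:
    # Otherwise keep the driver's default: the port's maximum speed.
    return "phylink" if needs_phylink(port) else "driver default (max speed)"


print(link_setup(CpuDsaPort()))                     # driver default (max speed)
print(link_setup(CpuDsaPort(has_fixed_link=True)))  # phylink
```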
     

09 Mar, 2020

1 commit


05 Mar, 2020

1 commit

  • Ocelot has the concept of a CPU port. The CPU port is represented in the
    forwarding and the queueing system, but it is not a physical device. The
    CPU port can either be accessed via register-based injection/extraction
    (which is the case of Ocelot), via Frame-DMA (similar to the first one),
    or "connected" to a physical Ethernet port (called NPI in the datasheet)
    which is the case of the Felix DSA switch.

    In Ocelot the CPU port is at index 11.
    In Felix the CPU port is at index 6.

    The CPU bit is treated specially in forwarding, as it is never cleared
    from the forwarding port mask (once added to it). Other than that, it is
    treated the same as a normal front port.

    Both Felix and Ocelot should use the CPU port in the same way. This
    means that Felix should not use the NPI port directly when forwarding to
    the CPU, but instead use the CPU port.

    This patch is fixing this such that Felix will use port 6 as its CPU
    port, and just use the NPI port to carry the traffic.

    Therefore, eliminate the "ocelot->cpu" variable which was holding the
    index of the NPI port for Felix, and the index of the CPU port module
    for Ocelot, so the variable was actually configuring different things
    for different drivers and causing at least part of the confusion.

    Also remove the "ocelot->num_cpu_ports" variable, which is the result of
    another confusion. The 2 CPU ports mentioned in the datasheet exist
    because there are two frame extraction channels (register based or DMA
    based). This is of no relevance to the driver at the moment, and
    invisible to the analyzer module.

    Signed-off-by: Vladimir Oltean
    Suggested-by: Allan W. Nielsen
    Signed-off-by: David S. Miller

    Vladimir Oltean
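The "sticky" CPU bit in the Ocelot/Felix forwarding port mask can be pictured with a toy bitmask. This Python sketch only models the behavior stated in the commit (the CPU bit is never cleared once added); `ForwardMask` is a hypothetical class, not driver code, and the port indices are the ones given in the text.

```python
"""Toy model of a forwarding port mask with a sticky CPU port bit."""

OCELOT_CPU_PORT = 11  # CPU port index in Ocelot (from the text above)
FELIX_CPU_PORT = 6    # CPU port index in Felix (from the text above)


class ForwardMask:
    def __init__(self, cpu_port: int) -> None:
        self.cpu_bit = 1 << cpu_port
        self.value = 0

    def add(self, port: int) -> None:
        self.value |= 1 << port

    def remove(self, port: int) -> None:
        # Clear the port's bit, except for the CPU bit, which stays set
        # once it has been added.
        self.value &= ~((1 << port) & ~self.cpu_bit)


m = ForwardMask(FELIX_CPU_PORT)
m.add(FELIX_CPU_PORT)
m.add(2)
m.remove(FELIX_CPU_PORT)  # no effect: the CPU bit is sticky
m.remove(2)
print(bin(m.value))  # 0b1000000
```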
     

04 Mar, 2020

2 commits

  • Due to the immense variety of classification keys and actions available
    for tc-flower, as well as due to potentially very different DSA switch
    capabilities, it doesn't make a lot of sense for the DSA mid layer to
    even attempt to interpret these. So just pass them on to the underlying
    switch driver.

    DSA implements just the standard boilerplate for binding and unbinding
    flow blocks to ports, since nobody wants to deal with that.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Vladimir Oltean
     
  • Place phylink_start()/phylink_stop() inside dsa_port_enable() and
    dsa_port_disable(), which ensures that we call phylink_stop() before
    tearing down phylink - which is a documented requirement. Failure
    to do so can cause use-after-free bugs.

    Fixes: 0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports")
    Signed-off-by: Russell King
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Russell King
     

28 Feb, 2020

2 commits

  • Propagate the resolved link configuration down via DSA's
    phylink_mac_link_up() operation to allow split PCS/MAC to work.

    Tested-by: Vladimir Oltean
    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     
  • Propagate the resolved link parameters via the mac_link_up() call for
    MACs that do not automatically track their PCS state. We propagate the
    link parameters via function arguments so that inappropriate members
    of struct phylink_link_state can't be accessed, and creating a new
    structure just for this adds needless complexity to the API.

    Tested-by: Andre Przywara
    Tested-by: Alexandre Belloni
    Tested-by: Vladimir Oltean
    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     

14 Feb, 2020

2 commits


27 Jan, 2020

1 commit

  • DSA sets up a switch tree little by little. Every switch of the N
    members of the tree calls dsa_register_switch, and (N - 1) will just
    touch the dst->ports list with their ports and quickly exit. Only the
    last switch that calls dsa_register_switch will find all DSA links
    complete in dsa_tree_setup_routing_table, and not return zero as a
    result but instead go ahead and set up the entire DSA switch tree
    (practically on behalf of the other switches too).

    The trouble is that the (N - 1) switches don't clean up after themselves
    after they get an error such as EPROBE_DEFER. Their footprint left in
    dst->ports by dsa_switch_touch_ports is still there. And switch N, the
    one responsible for actually setting up the tree, is going to work with
    those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and
    ds->dev might get freed by the device driver.

    Consider a 2-switch tree and the following calling order:
    - Switch 1 calls dsa_register_switch
      - Calls dsa_switch_touch_ports, populates dst->ports
      - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits.
    - Switch 2 calls dsa_register_switch
      - Calls dsa_switch_touch_ports, populates dst->ports
      - Probe doesn't get deferred, so it goes ahead.
      - Calls dsa_tree_setup_routing_table, which returns "complete == true"
        due to Switch 1 having called dsa_switch_touch_ports before.
      - Because the DSA links are complete, it now calls
        dsa_tree_setup_switches.
      - dsa_tree_setup_switches iterates through dst->ports, initializing
        the Switch 1 ds structure (invalid) and the Switch 2 ds structure
        (valid).
      - Undefined behavior (use after free, sometimes NULL pointers, etc).

    Real example below (debugging prints added by me, as well as guards
    against NULL pointers):

    [ 5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00)
    [ 6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000)
    [ 6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800)
    [ 7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800)

    The solution is to recognize that the functions that call
    dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side
    effects, and therefore one should clean up their side effects on error
    path. The cleanup of dst->ports was taken from dsa_switch_remove and
    moved into a dedicated dsa_switch_release_ports function, which should
    really be per-switch (free only the members of dst->ports that are also
    members of ds, instead of all switch ports).

    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
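The dst->ports cleanup in the tree registration fix can be modeled as a shared list that each switch appends to, with a per-switch release on the error path. This Python sketch is hypothetical: `Tree`, `register_switch()` and `ProbeDeferred` are stand-ins for the kernel objects and -EPROBE_DEFER, and the port count of 11 simply matches the debug log above.

```python
"""Toy model of per-switch cleanup of the shared dst->ports list."""
from dataclasses import dataclass, field
from typing import List, Set, Tuple


class ProbeDeferred(Exception):
    """Stand-in for -EPROBE_DEFER."""


@dataclass
class Tree:
    ports: List[Tuple[int, int]] = field(default_factory=list)  # (switch, port)

    def touch_ports(self, switch: int, nports: int) -> None:
        # Side effect on state shared by the whole tree.
        self.ports.extend((switch, p) for p in range(nports))

    def release_ports(self, switch: int) -> None:
        # Per-switch cleanup: drop only the entries this switch added.
        self.ports = [dp for dp in self.ports if dp[0] != switch]


def register_switch(tree: Tree, switch: int, nports: int,
                    defer: bool, cleanup: bool) -> None:
    tree.touch_ports(switch, nports)
    if defer:
        if cleanup:
            tree.release_ports(switch)  # undo the side effect on error
        raise ProbeDeferred()


def switches_set_up(cleanup: bool) -> Set[int]:
    # Switch 1 is deferred, then switch 2 completes the tree and walks it.
    tree = Tree()
    try:
        register_switch(tree, 1, 11, defer=True, cleanup=cleanup)
    except ProbeDeferred:
        pass
    register_switch(tree, 2, 11, defer=False, cleanup=cleanup)
    return {sw for sw, _ in tree.ports}


print(switches_set_up(cleanup=False))  # {1, 2}: switch 1 entries are stale
print(switches_set_up(cleanup=True))   # {2}
```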
     

20 Jan, 2020

1 commit


16 Jan, 2020

2 commits

  • DSA subsystem takes care of netdev statistics since commit 4ed70ce9f01c
    ("net: dsa: Refactor transmit path to eliminate duplication"), so
    any accounting inside tagger callbacks is redundant and can lead to
    messing up the stats.
    This bug has been present in the Qualcomm tagger since day 0.

    Fixes: cafdc45c949b ("net-next: dsa: add Qualcomm tag RX/TX handler")
    Reviewed-by: Andrew Lunn
    Signed-off-by: Alexander Lobakin
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Alexander Lobakin
     
  • The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in
    875138f81d71a ("dsa: Move tagger name into its ops structure") while
    moving tagger names to their structures.

    Fixes: 875138f81d71a ("dsa: Move tagger name into its ops structure")
    Reviewed-by: Andrew Lunn
    Signed-off-by: Alexander Lobakin
    Reviewed-by: Florian Fainelli
    Acked-by: Hauke Mehrtens
    Signed-off-by: David S. Miller

    Alexander Lobakin
     

09 Jan, 2020

1 commit

  • It is possible to stack multiple DSA switches in a way that they are not
    part of the tree (disjoint) but the DSA master of a switch is a DSA
    slave of another. When that happens switch drivers may have to know this
    is the case so as to determine whether their tagging protocol has a
    remote chance of working.

    This is useful for specific switch drivers such as b53 where devices
    have been known to be stacked in the wild without the Broadcom tag
    protocol supporting that feature. This allows b53 to continue supporting
    those devices by forcing the disabling of Broadcom tags on the outermost
    switches if necessary.

    The get_tag_protocol() function is therefore updated to gain an
    additional enum dsa_tag_protocol argument which denotes the current
    tagging protocol used by the DSA master we are attached to, else
    DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree.

    Signed-off-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Florian Fainelli
     

06 Jan, 2020

3 commits

  • The DSA drivers that implement .phylink_mac_link_state should normally
    register an interrupt for the PCS, from which they should call
    phylink_mac_change(). However, not all switches implement this, and those
    that don't should set this flag in dsa_switch in the .setup callback, so
    that PHYLINK will poll for a few ms until the in-band AN link timer
    expires and the PCS state settles.

    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Vladimir Oltean
     
  • This is a cosmetic patch that makes the dp, tx_vid, queue_mapping and
    pcp local variable definitions a bit closer in length, so they don't
    look like an eyesore as much.

    The 'ds' variable is not used otherwise, except for ds->dp.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Vladimir Oltean
     
  • There are 3 things that are wrong with the DSA deferred xmit mechanism:

    1. Its introduction has made the DSA hotpath ever so slightly less
    efficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs
    to be initialized to false for every transmitted frame, in order to
    figure out whether the driver requested deferral or not (a very rare
    occasion, rare even for the only driver that does use this mechanism:
    sja1105). That was necessary to avoid kfree_skb from freeing the skb.

    2. Because L2 PTP is a link-local protocol like STP, it requires
    management routes and deferred xmit with this switch. But as opposed
    to STP, the deferred work mechanism needs to schedule the packet
    rather quickly for the TX timestamp to be collected in time and sent
    to user space. But there is no provision for controlling the
    scheduling priority of this deferred xmit workqueue. Too bad this is
    a rather specific requirement for a feature that nobody else uses
    (more below).

    3. Perhaps most importantly, it makes the DSA core adhere a bit too
    much to the NXP company-wide policy "Innovate Where It Doesn't
    Matter". The sja1105 is probably the only DSA switch that requires
    some frames sent from the CPU to be routed to the slave port via an
    out-of-band configuration (register write) rather than in-band (DSA
    tag). And there are indeed very good reasons to not want to do that:
    if that out-of-band register is at the other end of a slow bus such
    as SPI, then you limit that Ethernet flow's throughput to effectively
    the throughput of the SPI bus. So hardware vendors should definitely
    not be encouraged to design this way. We do _not_ want more
    widespread use of this mechanism.

    Luckily we have a solution for each of the 3 issues:

    For 1, we can just remove that variable in the skb->cb and counteract
    the effect of kfree_skb with skb_get, much to the same effect. The
    advantage, of course, being that anybody who doesn't use deferred xmit
    doesn't need to do any extra operation in the hotpath.

    For 2, we can create a kernel thread for each port's deferred xmit work.
    If the user switch ports are named swp0, swp1, swp2, the kernel threads
    will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15
    character length limit on kernel thread names). With this, the user can
    change the scheduling priority with chrt $(pidof swp2_xmit).

    For 3, we can actually move the entire implementation to the sja1105
    driver.

    So this patch deletes the generic implementation from the DSA core and
    adds a new one, more adequate to the requirements of PTP TX
    timestamping, in sja1105_main.c.

    Suggested-by: Florian Fainelli
    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Vladimir Oltean
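Two details of the deferred xmit rework can be sketched in isolation: the per-port thread name limit and the skb_get()/kfree_skb() reference counting. This Python sketch is a toy model; `Skb`, `skb_get()` and `kfree_skb()` here only mimic the refcounting idea of the kernel functions of the same name, and it assumes, as the text implies, that the core frees the skb after the driver defers it. TASK_COMM_LEN is 16 bytes including the trailing NUL, which is where the 15-character limit comes from.

```python
"""Toy models for two details of the sja1105 deferred xmit rework."""

TASK_COMM_LEN = 16  # kernel task name buffer, including the trailing NUL


def xmit_thread_name(port_name: str) -> str:
    # "<port>_xmit", truncated to the 15 visible characters that fit.
    return f"{port_name}_xmit"[:TASK_COMM_LEN - 1]


class Skb:
    def __init__(self) -> None:
        self.users = 1  # reference count
        self.freed = False


def skb_get(skb: Skb) -> Skb:
    skb.users += 1  # take an extra reference
    return skb


def kfree_skb(skb: Skb) -> None:
    skb.users -= 1
    if skb.users == 0:
        skb.freed = True


deferred = []
skb = Skb()
deferred.append(skb_get(skb))  # driver keeps the skb for its xmit thread
kfree_skb(skb)                 # core drops its own reference as usual
print(skb.freed, xmit_thread_name("swp2"))  # False swp2_xmit
```

Only frames that are actually deferred pay for the extra reference, which is the hotpath saving described in point 1.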
     

29 Dec, 2019

1 commit

  • It is possible to kill PTP on a DSA switch completely and absolutely,
    until a reboot, with a simple command:

    tcpdump -i eth2 -j adapter_unsynced

    where eth2 is the switch's DSA master.

    Why? Well, in short, the PTP API in place today is a bit rudimentary and
    relies on applications to retrieve the TX timestamps by polling the
    error queue and looking at the cmsg structure. But there is no timestamp
    identification of any sort (except whether it's HW or SW): you don't
    know how many more timestamps are there to come, which one is this one,
    from whom it is, etc. In other words, the SO_TIMESTAMPING API is
    fundamentally limited in that you can get a single HW timestamp from the
    stack.

    And the "-j adapter_unsynced" flag of tcpdump enables hardware
    timestamping.

    So let's imagine what happens when the DSA master decides it wants to
    deliver TX timestamps to the skb's socket too:
    - The timestamp that the user space sees is taken by the DSA master.
    Whereas the RX timestamp will eventually be overwritten by the DSA
    switch. So the RX and TX timestamps will be in different time bases
    (aka garbage).
    - The user space applications have no way to deal with the second (real)
    TX timestamp finally delivered by the DSA switch, or even to know to
    wait for it.

    Take ptp4l from the linuxptp project, for example. This is its behavior
    after running tcpdump, before the patch:

    ptp4l[172]: [6469.594] Unexpected data on socket err queue:
    ptp4l[172]: [6469.693] rms 8 max 16 freq -21257 +/- 11 delay 748 +/- 0
    ptp4l[172]: [6469.711] Unexpected data on socket err queue:
    ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd
    ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: [6469.721] Unexpected data on socket err queue:
    ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
    ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd
    ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: [6469.838] Unexpected data on socket err queue:
    ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
    ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd
    ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: [6469.848] Unexpected data on socket err queue:
    ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02
    ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f
    ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05
    ptp4l[172]: 0040 de 06 00 01
    ptp4l[172]: [6469.855] Unexpected data on socket err queue:
    ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
    ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd
    ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: [6469.974] Unexpected data on socket err queue:
    ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
    ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
    ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd
    ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00

    The ptp4l program itself is heavily patched to show this (more details
    here [0]). Otherwise, by default it just hangs.

    On the other hand, with the DSA patch to disallow HW timestamping
    applied:

    tcpdump -i eth2 -j adapter_unsynced
    tcpdump: SIOCSHWTSTAMP failed: Device or resource busy

    So it is a fact of life that PTP timestamping on the DSA master is
    incompatible with timestamping on the switch MAC, at least with the
    current API. And if the switch supports PTP, taking the timestamps from
    the switch MAC is highly preferable anyway, due to the fact that those
    don't contain the queuing latencies of the switch. So just disallow PTP
    on the DSA master if there is any PTP-capable switch attached.

    [0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/

    Fixes: 0336369d3a4d ("net: dsa: forward hardware timestamping ioctls to switch driver")
    Signed-off-by: Vladimir Oltean
    Acked-by: Richard Cochran
    Signed-off-by: David S. Miller

    Vladimir Oltean
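The policy in the PTP commit (refuse hardware timestamping on the DSA master when any attached switch is PTP-capable) can be sketched as a simple check. This Python model is hypothetical: `AttachedSwitch` and `master_set_hwtstamp()` are illustrative names, and `ptp_capable` stands for "the switch driver implements hardware timestamping".

```python
"""Toy model of refusing HW timestamping on a DSA master."""
import errno
from dataclasses import dataclass
from typing import List


@dataclass
class AttachedSwitch:
    name: str
    ptp_capable: bool  # hypothetical: the driver implements hwtstamp ops


def master_set_hwtstamp(switches: List[AttachedSwitch]) -> int:
    # Timestamps from the master and from the switch MACs would be in
    # different time bases, so refuse if any attached switch can do PTP.
    if any(sw.ptp_capable for sw in switches):
        return -errno.EBUSY  # user space sees "Device or resource busy"
    return 0  # otherwise the master driver would handle the request


print(master_set_hwtstamp([AttachedSwitch("sw0", True)]) == -errno.EBUSY)  # True
```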
     

23 Dec, 2019

1 commit


21 Dec, 2019

2 commits


18 Dec, 2019

2 commits

  • dsa_link_touch() is neither exported nor used outside of the file it
    is defined in, so make it static to avoid the following warning:

    net/dsa/dsa2.c:127:17: warning: symbol 'dsa_link_touch' was not declared. Should it be static?

    Signed-off-by: Ben Dooks (Codethink)
    Signed-off-by: David S. Miller

    Ben Dooks (Codethink)
     
  • Commit 77373d49de22 ("net: dsa: Move the phylink driver calls into
    port.c") moved and exported a bunch of symbols, but they are not used
    outside of net/dsa/port.c at the moment, so no reason to export them.

    Reported-by: Russell King
    Signed-off-by: Florian Fainelli
    Reviewed-by: Vivien Didelot
    Acked-by: Russell King
    Acked-by: Vladimir Oltean
    Signed-off-by: David S. Miller

    Florian Fainelli
     

24 Nov, 2019

1 commit

  • Rename the mac_link_state() method to mac_pcs_get_state() to make it
    clear that it should be returning the MAC's PCS current state, which
    is used for inband negotiation rather than just reading back what the
    MAC has been configured for. Update the documentation to explicitly
    mention that this is for inband.

    We drop the return value as well; most of phylink doesn't check the
    return value and it is not clear what it should do on error - instead
    arrange for state->link to be false.

    Signed-off-by: Russell King
    Signed-off-by: Jakub Kicinski

    Russell King
     

22 Nov, 2019

1 commit

  • This patch reuses ocelot functions as much as possible to enable the
    PTP clock and to support hardware timestamping on Felix.
    On the TX path, timestamping applies to packets which request a
    timestamp. The injection header is configured accordingly, and a clone
    of each skb requiring a timestamp is added to a list. The TX timestamp
    is finally handled in a threaded interrupt handler when the PTP
    timestamp FIFO is ready.
    On the RX path, timestamping is always enabled. The RX timestamp can
    be retrieved from the extraction header.

    Signed-off-by: Yangbo Lu
    Signed-off-by: David S. Miller

    Yangbo Lu
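The TX timestamping flow in the Felix commit (queue a clone for each packet that requests a timestamp, complete it when the FIFO reports one) can be pictured with a small pending-list model. This Python sketch is hypothetical throughout: the class and method names are illustrative, and matching FIFO entries to clones by (port, ID) is an assumption made for the model, not something stated in the text.

```python
"""Toy model of matching TX timestamps to pending skb clones."""
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Clone:
    port: int
    ts_id: int
    timestamp: Optional[int] = None


class TxTimestamper:
    def __init__(self) -> None:
        self.pending: Dict[Tuple[int, int], Clone] = {}
        self.next_id = 0

    def xmit(self, port: int, wants_ts: bool) -> Optional[Clone]:
        # Only packets that request a timestamp get a clone on the list.
        if not wants_ts:
            return None
        clone = Clone(port, self.next_id)
        self.next_id += 1
        self.pending[(port, clone.ts_id)] = clone
        return clone

    def fifo_ready(self, port: int, ts_id: int, timestamp: int) -> Optional[Clone]:
        # Called per FIFO entry (e.g. from the threaded IRQ handler):
        # complete the matching clone and remove it from the list.
        clone = self.pending.pop((port, ts_id), None)
        if clone is not None:
            clone.timestamp = timestamp
        return clone


ts = TxTimestamper()
clone = ts.xmit(port=1, wants_ts=True)
done = ts.fifo_ready(1, clone.ts_id, 123456789)
print(done.timestamp, len(ts.pending))  # 123456789 0
```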
     

17 Nov, 2019

1 commit