21 Oct, 2017

5 commits

  • The length of GVI (GetVersionInfo) response packet should be 40 instead
    of 36. This issue was found from /sys/kernel/debug/ncsi/eth0/stats.

    # ethtool --ncsi eth0 swstats
    :
    RESPONSE OK TIMEOUT ERROR
    =======================================
    GVI 0 0 2

    With this applied, no error reported on GVI response packets:

    # ethtool --ncsi eth0 swstats
    :
    RESPONSE OK TIMEOUT ERROR
    =======================================
    GVI 2 0 0

    Signed-off-by: Gavin Shan
    Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The NCSI channel has been configured to provide service if its link
    monitor timer is enabled, regardless of its state (inactive or active).
    So the timeout event on the link monitor indicates the out-of-service
    on that channel, for which a failover is needed.

    This sets NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor
    timeout, regardless the channel's original state (inactive or active).
    Also, the link is put into "down" state to give the failing channel
    lowest priority when selecting for the active channel. The state of
    failing channel should be set to active in order for deinitialization
    and failover to be done.

    Signed-off-by: Gavin Shan
    Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • When there are no NCSI channels probed, HWA (Hardware Arbitration)
    mode is enabled. It's not correct because HWA depends on the fact:
    NCSI channels exist and all of them support HWA mode. This disables
    HWA when no channels are probed.

    Signed-off-by: Gavin Shan
    Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • ncsi_channel_monitor() misses stopping the channel monitor in several
    places that it should, causing a WARN_ON_ONCE() to trigger when the
    monitor is re-started later, eg:

    [ 459.040000] WARNING: CPU: 0 PID: 1093 at net/ncsi/ncsi-manage.c:269 ncsi_start_channel_monitor+0x7c/0x90
    [ 459.040000] CPU: 0 PID: 1093 Comm: kworker/0:3 Not tainted 4.10.17-gaca2fdd #140
    [ 459.040000] Hardware name: ASpeed SoC
    [ 459.040000] Workqueue: events ncsi_dev_work
    [ 459.040000] [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
    [ 459.040000] [] (show_stack) from [] (dump_stack+0x20/0x28)
    [ 459.040000] [] (dump_stack) from [] (__warn+0xe0/0x108)
    [ 459.040000] [] (__warn) from [] (warn_slowpath_null+0x30/0x38)
    [ 459.040000] [] (warn_slowpath_null) from [] (ncsi_start_channel_monitor+0x7c/0x90)
    [ 459.040000] [] (ncsi_start_channel_monitor) from [] (ncsi_configure_channel+0xdc/0x5fc)
    [ 459.040000] [] (ncsi_configure_channel) from [] (ncsi_dev_work+0xac/0x474)
    [ 459.040000] [] (ncsi_dev_work) from [] (process_one_work+0x1e0/0x450)
    [ 459.040000] [] (process_one_work) from [] (worker_thread+0x5c/0x570)
    [ 459.040000] [] (worker_thread) from [] (kthread+0x124/0x164)
    [ 459.040000] [] (kthread) from [] (ret_from_fork+0x14/0x2c)

    This also updates the monitor instead of just returning if
    ncsi_xmit_cmd() fails to send the get-link-status command so that the
    monitor properly times out.

    Fixes: e6f44ed6d04d3 "net/ncsi: Package and channel management"

    Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Samuel Mendoza-Jonas
     
  • Correct the value of the HNCDSC AEN packet.
    Fixes: 7a82ecf4cfb85 "net/ncsi: NCSI AEN packet handler"

    Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Samuel Mendoza-Jonas
     

12 Oct, 2017

1 commit

  • Currently we drop any new VLAN ids if there are more than the current
    (or last used) channel can support. Most importantly this is a problem
    if no channel has been selected yet, resulting in a segfault.

    Secondly this does not necessarily reflect the capabilities of any other
    channels. Instead only drop a new VLAN id if we are already tracking the
    maximum allowed by the NCSI specification. Per-channel limits are
    already handled by ncsi_add_filter(), but add a message to set_one_vid()
    to make it obvious that the channel can not support any more VLAN ids.

    Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Samuel Mendoza-Jonas
     

06 Sep, 2017

1 commit

  • We get a new link error in allmodconfig kernels after ftgmac100
    started using the ncsi helpers:

    ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
    ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!

    Related to that, we get another error when CONFIG_NET_NCSI is disabled:

    drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
    drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?

    This fixes both problems at once, using a 'static inline' stub helper
    for the disabled case, and exporting the functions when they are present.

    Fixes: 51564585d8c6 ("ftgmac100: Support NCSI VLAN filtering when available")
    Fixes: 21acf63013ed ("net/ncsi: Configure VLAN tag filter")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

29 Aug, 2017

2 commits

  • Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
    stack process new VLAN tags and configure the channel VLAN filter
    appropriately.
    Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
    for each one, meaning the ncsi_dev_state_config_svf state must be
    repeated. An internal list of VLAN tags is maintained, and compared
    against the current channel's ncsi_channel_filter in order to keep track
    within the state. VLAN filters are removed in a similar manner, with the
    introduction of the ncsi_dev_state_config_clear_vids state. The maximum
    number of VLAN tag filters is determined by the "Get Capabilities"
    response from the channel.

    Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Samuel Mendoza-Jonas
     
  • Signed-off-by: Samuel Mendoza-Jonas
    Signed-off-by: David S. Miller

    Samuel Mendoza-Jonas
     

16 Jun, 2017

2 commits

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

20 Oct, 2016

4 commits

  • This improves AEN handler for Host Network Controller Driver Status
    Change (HNCDSC):

    * The channel's lock should be hold when accessing its state.
    * Do failover when host driver isn't ready.
    * Configure channel when host driver becomes ready.

    Signed-off-by: Gavin Shan
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The issue was found on BCM5718 which has two NCSI channels in one
    package: C0 and C1. C0 is in link-up state while C1 is in link-down
    state. C0 is chosen as active channel until unplugging and plugging
    C0's cable: On unplugging C0's cable, LSC (Link State Change) AEN
    packet received on C0 to report link-down event. After that, C1 is
    chosen as active channel. LSC AEN for link-up event is lost on C0
    when plugging C0's cable back. We lose the network even C0 is usable.

    This resolves the issue by recording the (hot) channel that was ever
    chosen as active one. The hot channel is chosen to be active one
    if none of available channels in link-up state. With this, C0 is still
    the active one after unplugging C0's cable. LSC AEN packet received
    on C0 when plugging its cable back.

    Signed-off-by: Gavin Shan
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The issue was found on BCM5718 which has two NCSI channels in one
    package: C0 and C1. Both of them are connected to different LANs,
    means they are in link-up state and C0 is chosen as the active one
    until resetting BCM5718 happens as below.

    Resetting BCM5718 results in LSC (Link State Change) AEN packet
    received on C0, meaning LSC AEN is missed on C1. When LSC AEN packet
    received on C0 to report link-down, it fails over to C1 because C1
    is in link-up state as software can see. However, C1 is in link-down
    state in hardware. It means the link state is out of synchronization
    between hardware and software, resulting in inappropriate channel (C1)
    selected as active one.

    This resolves the issue by sending separate GLS (Get Link Status)
    commands to all channels in the package before trying to do failover.
    The last link states of all channels in the package are retrieved.
    With it, C0 (not C1) is selected as active one as expected.

    Signed-off-by: Gavin Shan
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • There are several if/else statements in the state machine implemented
    by switch/case in ncsi_suspend_channel() to avoid duplicated code. It
    makes the code a bit hard to be understood.

    This drops if/else statements in ncsi_suspend_channel() to improve the
    code readability as Joel Stanley suggested. Also, it becomes easy to
    add more states in the state machine without affecting current code.
    No logical changes introduced by this.

    Suggested-by: Joel Stanley
    Signed-off-by: Gavin Shan
    Signed-off-by: David S. Miller

    Gavin Shan
     

04 Oct, 2016

7 commits

  • This introduces ncsi_stop_dev(), as counterpart to ncsi_start_dev(),
    to stop the NCSI device so that it can be reenabled in future. This
    API should be called when the network device driver is going to
    shutdown the device. There are 3 things done in the function: Stop
    the channel monitoring; Reset channels to inactive state; Report
    NCSI link down.

    Signed-off-by: Gavin Shan
    Reviewed-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The original NCSI channel monitoring was implemented based on a
    backoff algorithm: the GLS response should be received in the
    specified interval. Otherwise, the channel is regarded as dead
    and failover should be taken if current channel is an active one.
    There are several problems in the implementation: (A) On BCM5718,
    we found when the IID (Instance ID) in the GLS command packet
    changes from 255 to 1, the response corresponding to IID#1 never
    comes in. It means we cannot make the unfair judgement that the
    channel is dead when one response is missed. (B) The code's
    readability should be improved. (C) We should do failover when
    current channel is active one and the channel monitoring should
    be marked as disabled before doing failover.

    This reworks the channel monitoring to address all above issues.
    The fields for channel monitoring is put into separate struct
    and the state of channel monitoring is predefined. The channel
    is regarded alive if the network controller responses to one of
    two GLS commands or both of them in 5 seconds.

    Signed-off-by: Gavin Shan
    Reviewed-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • There is only one NCSI request property for now: the response for
    the sent command need drive the workqueue or not. So we had one
    field (@driven) for the purpose. We lost the flexibility to extend
    NCSI request properties.

    This replaces @driven with @flags and @req_flags in NCSI request
    and NCSI command argument struct. Each bit of the newly introduced
    field can be used for one property. No functional changes introduced.

    Signed-off-by: Gavin Shan
    Reviewed-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The NCSI request index (struct ncsi_request::id) is put into instance
    ID (IID) field while sending NCSI command packet. It was designed the
    available IDs are given in round-robin fashion. @ndp->request_id was
    introduced to represent the next available ID, but it has been used
    as number of successively allocated IDs. It breaks the round-robin
    design. Besides, we shouldn't put 0 to NCSI command packet's IID
    field, meaning ID#0 should be reserved according section 6.3.1.1
    in NCSI spec (v1.1.0).

    This fixes above two issues. With it applied, the available IDs will
    be assigned in round-robin fashion and ID#0 won't be assigned.

    Signed-off-by: Gavin Shan
    Reviewed-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • We needn't send CIS (Clear Initial State) command to the NCSI
    reserved channel (0x1f) in the enumeration. We shouldn't receive
    a valid response from CIS on NCSI channel 0x1f.

    Signed-off-by: Gavin Shan
    Reviewed-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • This defines NCSI_RESERVED_CHANNEL as the reserved NCSI channel
    ID (0x1f). No logical changes introduced.

    Signed-off-by: Gavin Shan
    Reviewed-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • xchg() is used to set NCSI channel's state in order for consistent
    access to the state. xchg()'s return value should be used. Otherwise,
    one build warning will be raised (with -Wunused-value) as below message
    indicates. It is reported by ia64-linux-gcc (GCC) 4.9.0.

    net/ncsi/ncsi-manage.c: In function 'ncsi_channel_monitor':
    arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is \
    not used [-Wunused-value]
    ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
    ^
    net/ncsi/ncsi-manage.c:202:3: note: in expansion of macro 'xchg'
    xchg(&nc->state, NCSI_CHANNEL_INACTIVE);

    This removes the atomic access to NCSI channel's state avoid the above
    build warning. We have to hold the channel's lock when its state is readed
    or updated. No functional changes introduced.

    Signed-off-by: Gavin Shan
    Reviewed-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     

26 Jul, 2016

1 commit

  • gcc-4.9 and higher warn about the newly added NSCI code:

    net/ncsi/ncsi-manage.c: In function 'ncsi_process_next_channel':
    net/ncsi/ncsi-manage.c:1003:2: error: 'old_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]

    The warning is a false positive and therefore harmless, but it would be good to
    avoid it anyway. I have determined that the barrier in the spin_unlock_irqsave()
    is what confuses gcc to the point that it cannot track whether the variable
    was unused or not.

    This rearranges the code in a way that makes it obvious to gcc that old_state
    is always initialized at the time of use, functionally this should not
    change anything.

    Signed-off-by: Arnd Bergmann
    Acked-by: Gavin Shan
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

20 Jul, 2016

5 commits

  • This introduces NCSI AEN packet handlers that result in (A) the
    currently active channel is reconfigured; (B) Currently active
    channel is deconfigured and disabled, another channel is chosen
    as active one and configured. Case (B) won't happen if hardware
    arbitration has been enabled, the channel that was in active
    state is suspended simply.

    Signed-off-by: Gavin Shan
    Acked-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • This manages NCSI packages and channels:

    * The available packages and channels are enumerated in the first
    time of calling ncsi_start_dev(). The channels' capabilities are
    probed in the meanwhile. The NCSI network topology won't change
    until the NCSI device is destroyed.
    * There in a queue in every NCSI device. The element in the queue,
    channel, is waiting for configuration (bringup) or suspending
    (teardown). The channel's state (inactive/active) indicates the
    futher action (configuration or suspending) will be applied on the
    channel. Another channel's state (invisible) means the requested
    action is being applied.
    * The hardware arbitration will be enabled if all available packages
    and channels support it. All available channels try to provide
    service when hardware arbitration is enabled. Otherwise, one channel
    is selected as the active one at once.
    * When channel is in active state, meaning it's providing service, a
    timer started to retrieve the channe's link status. If the channel's
    link status fails to be updated in the determined period, the channel
    is going to be reconfigured. It's the error handling implementation
    as defined in NCSI spec.

    Signed-off-by: Gavin Shan
    Acked-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The NCSI response packets are sent to MC (Management Controller)
    from the remote end. They are responses of NCSI command packets
    for multiple purposes: completion status of NCSI command packets,
    return NCSI channel's capability or configuration etc.

    This defines struct to represent NCSI response packets and introduces
    function ncsi_rcv_rsp() which will be used to receive NCSI response
    packets and parse them.

    Signed-off-by: Gavin Shan
    Acked-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The NCSI command packets are sent from MC (Management Controller)
    to remote end. They are used for multiple purposes: probe existing
    NCSI package/channel, retrieve NCSI channel's capability, configure
    NCSI channel etc.

    This defines struct to represent NCSI command packets and introduces
    function ncsi_xmit_cmd(), which will be used to transmit NCSI command
    packet according to the request. The request is represented by struct
    ncsi_cmd_arg.

    Signed-off-by: Gavin Shan
    Acked-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • NCSI spec (DSP0222) defines several objects: package, channel, mode,
    filter, version and statistics etc. This introduces the data structs
    to represent those objects and implement functions to manage them.
    Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
    stack.

    * The user (e.g. netdev driver) dereference NCSI device by
    "struct ncsi_dev", which is embedded to "struct ncsi_dev_priv".
    The later one is used by NCSI stack internally.
    * Every NCSI device can have multiple packages simultaneously, up
    to 8 packages. It's represented by "struct ncsi_package" and
    identified by 3-bits ID.
    * Every NCSI package can have multiple channels, up to 32. It's
    represented by "struct ncsi_channel" and identified by 5-bits ID.
    * Every NCSI channel has version, statistics, various modes and
    filters. They are represented by "struct ncsi_channel_version",
    "struct ncsi_channel_stats", "struct ncsi_channel_mode" and
    "struct ncsi_channel_filter" separately.
    * Apart from AEN (Asynchronous Event Notification), the NCSI stack
    works in terms of command and response. This introduces "struct
    ncsi_req" to represent a complete NCSI transaction made of NCSI
    request and response.

    link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
    Signed-off-by: Gavin Shan
    Acked-by: Joel Stanley
    Signed-off-by: David S. Miller

    Gavin Shan