01 Apr, 2020

37 commits

  • [ Upstream commit 6cd6cbf593bfa3ae6fc3ed34ac21da4d35045425 ]

    When an application uses the TCP_QUEUE_SEQ socket option to
    change tp->rcv_nxt, we must also update tp->copied_seq.

    Otherwise, stuff relying on tcp_inq() being precise can
    eventually be confused.

    For example, tcp_zerocopy_receive() might crash because
    it does not expect tcp_recv_skb() to return NULL.

    We could add tests in various places to fix the issue,
    or simply make sure tcp_inq() won't return a random value,
    and leave the fast path as it is.

    Note that this fixes ioctl(fd, SIOCINQ, &val) at the same
    time.

    Fixes: ee9952831cfd ("tcp: Initial repair mode")
    Fixes: 05255b823a61 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
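
A toy userspace model of the TCP_QUEUE_SEQ fix described above, written in Python rather than kernel C. It assumes that the queued-byte count is, in the simple case, the wrapping 32-bit difference between the next expected sequence number and the copied sequence number. The function and dictionary names are illustrative only and do not exist in the kernel.

```python
MASK32 = 0xFFFFFFFF  # TCP sequence numbers are 32 bits wide and wrap around


def tcp_inq_model(sock):
    """Unread bytes as a wrapping 32-bit difference (simplified model)."""
    return (sock["rcv_nxt"] - sock["copied_seq"]) & MASK32


def set_queue_seq_before_fix(sock, seq):
    # Only the next expected sequence number moves; copied_seq goes stale.
    sock["rcv_nxt"] = seq & MASK32


def set_queue_seq_after_fix(sock, seq):
    # Both counters move together, so the receive queue reads as empty.
    sock["rcv_nxt"] = seq & MASK32
    sock["copied_seq"] = seq & MASK32


before = {"rcv_nxt": 1000, "copied_seq": 1000}
set_queue_seq_before_fix(before, 5_000_000)
stale_inq = tcp_inq_model(before)  # claims bytes that were never queued

after = {"rcv_nxt": 1000, "copied_seq": 1000}
set_queue_seq_after_fix(after, 5_000_000)
fixed_inq = tcp_inq_model(after)
```

In this model the unfixed path reports millions of readable bytes on an empty queue, which is the kind of value a caller relying on a precise count could trip over.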
     
  • [ Upstream commit b738a185beaab8728943acdb3e67371b8a88185e ]

    skb->rbnode shares storage with three skb fields: next, prev, dev.

    When a packet is sent, TCP keeps the original skb (master)
    in a rtx queue, which was converted to rbtree a while back.

    __tcp_transmit_skb() is responsible for cloning the master skb
    and adding the TCP header to the clone before sending it
    to the network layer.

    skb_clone() already clears skb->next and skb->prev, but copies
    the master oskb->dev into the clone.

    We need to clear skb->dev, otherwise lower layers could interpret
    the value as a pointer to a netdev.

    This old bug surfaced recently when commit 28f8bfd1ac94
    ("netfilter: Support iif matches in POSTROUTING") was merged.

    Before this netfilter commit, the skb->dev value was ignored and
    changed before reaching dev_queue_xmit().

    Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
    Fixes: 28f8bfd1ac94 ("netfilter: Support iif matches in POSTROUTING")
    Signed-off-by: Eric Dumazet
    Reported-by: Martin Zaharinov
    Cc: Florian Westphal
    Cc: Pablo Neira Ayuso
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
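
The field sharing described above can be illustrated with a small userspace sketch using Python's ctypes. The class names are hypothetical, and the layout is a simplified assumption: three pointer-sized list fields overlaid by a three-word rb-tree node. It is not the real sk_buff definition.

```python
import ctypes


class ListLinks(ctypes.Structure):
    # View used outside the retransmit queue: next, prev, dev.
    _fields_ = [("next", ctypes.c_size_t),
                ("prev", ctypes.c_size_t),
                ("dev", ctypes.c_size_t)]


class RbNode(ctypes.Structure):
    # View used while the skb sits in the rb-tree based rtx queue.
    _fields_ = [("parent_color", ctypes.c_size_t),
                ("rb_right", ctypes.c_size_t),
                ("rb_left", ctypes.c_size_t)]


class SkbHead(ctypes.Union):
    # Both views occupy the same memory, like the union in the skb.
    _fields_ = [("links", ListLinks), ("rbnode", RbNode)]


def clone(master, clear_dev):
    """Model of a clone: next/prev are cleared, dev is copied unless cleared."""
    copy = SkbHead()
    copy.links.next = 0
    copy.links.prev = 0
    copy.links.dev = 0 if clear_dev else master.links.dev
    return copy


master = SkbHead()
master.rbnode.rb_left = 0xDEADBEEF  # a tree pointer, not a netdev

leaky = clone(master, clear_dev=False)
clean = clone(master, clear_dev=True)
```

Here `leaky.links.dev` holds the rb-tree pointer, which a lower layer could misread as a device pointer, while `clean.links.dev` is zero.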
     
  • [ Upstream commit 07f8e4d0fddbf2f87e4cefb551278abc38db8cdd ]

    In rare cases retransmit logic will make a full skb copy, which will not
    trigger the zeroing added in recent change
    b738a185beaa ("tcp: ensure skb->dev is NULL before leaving TCP stack").

    Cc: Eric Dumazet
    Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
    Fixes: 28f8bfd1ac94 ("netfilter: Support iif matches in POSTROUTING")
    Signed-off-by: Florian Westphal
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit 2091a3d42b4f339eaeed11228e0cbe9d4f92f558 ]

    As described in the comment above netdev_run_todo(), we cannot call
    free_netdev() before rtnl_unlock(). Fix this by reordering the code.

    This patch is a 1:1 copy of upstream slip.c commit f596c87005f7
    ("slip: not call free_netdev before rtnl_unlock in slip_open").

    Reported-by: yangerkun
    Signed-off-by: Oliver Hartkopp
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Oliver Hartkopp
     
  • [ Upstream commit f13bc68131b0c0d67a77fb43444e109828a983bf ]

    The original change fixed an issue on RTL8168b by mimicking the vendor
    driver behavior to disable MSI on chip versions before RTL8168d.
    However, this caused an issue on a system with RTL8168c, see [0].
    Therefore leave MSI disabled on RTL8168b, but re-enable it on RTL8168c.

    [0] https://bugzilla.redhat.com/show_bug.cgi?id=1792839

    Fixes: 003bd5b4a7b4 ("r8169: don't use MSI before RTL8168d")
    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Heiner Kallweit
     
  • [ Upstream commit 0dcdf9f64028ec3b75db6b691560f8286f3898bf ]

    The nci_conn_max_data_pkt_payload_size() function sometimes returns
    -EPROTO so "max_size" needs to be signed for the error handling to
    work. We can make "payload_size" an int as well.

    Fixes: a06347c04c13 ("NFC: Add Intel Fields Peak NFC solution driver")
    Signed-off-by: Dan Carpenter
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
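
Why the signedness matters in the NFC fix above can be shown with a short Python sketch. It assumes the value was previously held in a 32-bit unsigned variable (the exact original type is not stated in the commit message), and the helper names are made up for illustration.

```python
import ctypes
import errno


def stored_unsigned(ret):
    # What the error code looks like once kept in an unsigned 32-bit variable.
    return ctypes.c_uint32(ret).value


def stored_signed(ret):
    # What it looks like in a plain signed int.
    return ctypes.c_int(ret).value


ret = -errno.EPROTO  # the error value mentioned in the commit message

unsigned_check_fires = stored_unsigned(ret) < 0  # the "< 0" test never fires
signed_check_fires = stored_signed(ret) < 0      # the "< 0" test fires
```

With the unsigned variable the negative error turns into a very large positive size, so the error path is silently skipped.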
     
  • [ Upstream commit 9de9aa487daff7a5c73434c24269b44ed6a428e6 ]

    Make sure we clean up devicetree related configuration
    also when clock init fails.

    Fixes: fecd4d7eef8b ("net: stmmac: dwmac-rk: Add integrated PHY support")
    Signed-off-by: Emil Renner Berthing
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Emil Renner Berthing
     
  • [ Upstream commit 0d1c3530e1bd38382edef72591b78e877e0edcd3 ]

    In commit 599be01ee567 ("net_sched: fix an OOB access in cls_tcindex")
    I moved cp->hash calculation before the first
    tcindex_alloc_perfect_hash(), but cp->alloc_hash is left untouched.
    This difference could lead to another out of bound access.

    cp->alloc_hash should always be the size allocated, we should
    update it after this tcindex_alloc_perfect_hash().

    Reported-and-tested-by: syzbot+dcc34d54d68ef7d2d53d@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+c72da7b9ed57cde6fca2@syzkaller.appspotmail.com
    Fixes: 599be01ee567 ("net_sched: fix an OOB access in cls_tcindex")
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit b1be2e8cd290f620777bfdb8aa00890cd2fa02b5 ]

    syzbot reported a use-after-free in tcindex_dump(). This is due to
    the lack of RTNL in the deferred rcu work. We queue this work with
    RTNL in tcindex_change(), later, tcindex_dump() is called:

    fh = tp->ops->get(tp, t->tcm_handle);
    ...
    err = tp->ops->change(..., &fh, ...);
    tfilter_notify(..., fh, ...);

    but there is nothing to serialize the pending
    tcindex_partial_destroy_work() with tcindex_dump().

    Fix this by simply holding RTNL in tcindex_partial_destroy_work(),
    so that it won't be called until RTNL is released after
    tc_new_tfilter() is completed.

    Reported-and-tested-by: syzbot+653090db2562495901dc@syzkaller.appspotmail.com
    Fixes: 3d210534cc93 ("net_sched: fix a race condition in tcindex_destroy()")
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit ef299cc3fa1a9e1288665a9fdc8bff55629fd359 ]

    route4_change() allocates a new filter and copies values from
    the old one. After the new filter is inserted into the hash
    table, the old filter should be removed and freed, as the final
    step of the update.

    However, the current code mistakenly removes the new one. This
    looks clearly wrong to me, and it causes a double "free" and
    a use-after-free too, as reported by syzbot.

    Reported-and-tested-by: syzbot+f9b32aaacd60305d9687@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+2f8c233f131943d6056d@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+9c2df9fd5e9445b74e01@syzkaller.appspotmail.com
    Fixes: 1109c00547fc ("net: sched: RCU cls_route")
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Cc: John Fastabend
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit dd2af10402684cb5840a127caec9e7cdcff6d167 ]

    Currently, on replace, the previous action instance's params
    are swapped with newly allocated params. The old params are
    only freed (via kfree_rcu), without releasing the allocated
    ct zone template related to them.

    Call tcf_ct_params_free (via call_rcu) for the old params,
    so that the template is released as well.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: Paul Blakey
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paul Blakey
     
  • [ Upstream commit 12a5ba5a1994568d4ceaff9e78c6b0329d953386 ]

    ASKEY WWHC050 is a mcie LTE modem.
    The oem configuration states:

    T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
    D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
    P: Vendor=1690 ProdID=7588 Rev=ff.ff
    S: Manufacturer=Android
    S: Product=Android
    S: SerialNumber=813f0eef6e6e
    C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
    I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
    E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
    E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
    E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
    E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
    E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
    E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
    E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
    E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us

    Tested on openwrt distribution.

    Signed-off-by: Cezary Jackiewicz
    Signed-off-by: Pawel Dembicki
    Acked-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Pawel Dembicki
     
  • [ Upstream commit 872307abbd0d9afd72171929806c2fa33dc34179 ]

    Check clk_prepare_enable() return value.

    Fixes: 2c7230446bc9 ("net: phy: Add pm support to Broadcom iProc mdio mux driver")
    Signed-off-by: Rayagonda Kokatanur
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Rayagonda Kokatanur
     
  • [ Upstream commit c312c7818b86b663d32ec5d4b512abf06b23899a ]

    The DT binding for this PHY describes an *optional* clock property.
    Due to a bug in the error handling logic, we are actually ignoring this
    clock *all* of the time so far.

    Fix this by using devm_clk_get_optional() to handle this clock properly.

    Fixes: b78ac6ecd1b6b ("net: phy: mdio-bcm-unimac: Allow configuring MDIO clock divider")
    Signed-off-by: Andre Przywara
    Reviewed-by: Andrew Lunn
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Andre Przywara
     
  • [ Upstream commit 749f6f6843115b424680f1aada3c0dd613ad807c ]

    When the DP83867 PHY is strapped to enable the Fast Link Drop (FLD) feature
    STRAP_STS2.STRAP_FLD (reg 0x006F bit 10), the Energy Lost Threshold for
    FLD Energy Lost Mode FLD_THR_CFG.ENERGY_LOST_FLD_THR (reg 0x002e bits 2:0)
    defaults to 0x2. This may cause the PHY link to be unstable. The
    new DP83867 DM recommends always restoring ENERGY_LOST_FLD_THR to 0x1.

    Hence, restore default value of FLD_THR_CFG.ENERGY_LOST_FLD_THR to 0x1 when
    FLD is enabled by bootstrapping as recommended by DM.

    Signed-off-by: Grygorii Strashko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Grygorii Strashko
     
  • [ Upstream commit 61fad6816fc10fb8793a925d5c1256d1c3db0cd2 ]

    PACKET_RX_RING can cause multiple writers to access the same slot if a
    fast writer wraps the ring while a slow writer is still copying. This
    is particularly likely with few, large slots (e.g., GSO packets).

    Synchronize kernel thread ownership of rx ring slots with a bitmap.

    Writers acquire a slot race-free by testing tp_status TP_STATUS_KERNEL
    while holding the sk receive queue lock. They release this lock before
    copying and set tp_status to TP_STATUS_USER to release to userspace
    when done. During copying, another writer may take the lock, also see
    TP_STATUS_KERNEL, and start writing to the same slot.

    Introduce a new rx_owner_map bitmap with a bit per slot. To acquire a
    slot, test and set with the lock held. To release race-free, update
    tp_status and owner bit as a transaction, so take the lock again.

    This is one of a variety of discussed options (see Link below):

    * instead of a shadow ring, embed the data in the slot itself, such as
    in tp_padding. But any test for this field may match a value left by
    userspace, causing deadlock.

    * avoid the lock on release. This leaves a small race if releasing the
    shadow slot before setting TP_STATUS_USER. The below reproducer showed
    that this race is not academic. If releasing the slot after tp_status,
    the race is more subtle. See the first link for details.

    * add a new tp_status TP_KERNEL_OWNED to avoid the transactional store
    of two fields. But, legacy applications may interpret all non-zero
    tp_status as owned by the user. As libpcap does. So this is possible
    only opt-in by newer processes. It can be added as an optional mode.

    * embed the struct at the tail of pg_vec to avoid extra allocation.
    The implementation proved no less complex than a separate field.

    The additional locking cost on release adds contention, no different
    than scaling on multicore or multiqueue h/w. In practice, neither the
    reproducer below nor small-packet tcpdump showed a noticeable change in
    cycles spent in the spinlock in perf report. Where contention is
    problematic, packet sockets support mitigation through PACKET_FANOUT.
    And we can consider adding opt-in state TP_KERNEL_OWNED.

    Easy to reproduce by running multiple netperf or similar TCP_STREAM
    flows concurrently with `tcpdump -B 129 -n greater 60000`.

    Based on an earlier patchset by Jon Rosen. See links below.

    I believe this issue goes back to the introduction of tpacket_rcv,
    which predates git history.

    Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg237222.html
    Suggested-by: Jon Rosen
    Signed-off-by: Willem de Bruijn
    Signed-off-by: Jon Rosen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
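
The slot ownership scheme described above can be sketched as a small Python model. The class and method names are hypothetical, a list of booleans stands in for the rx_owner_map bitmap, and a threading.Lock stands in for the receive queue lock. It only illustrates the acquire/release rules, not the actual packet socket code.

```python
import threading

TP_STATUS_KERNEL = 0  # slot is available to the kernel
TP_STATUS_USER = 1    # slot has been handed to userspace


class RxRing:
    def __init__(self, slots):
        self.lock = threading.Lock()
        self.status = [TP_STATUS_KERNEL] * slots
        self.owner_map = [False] * slots  # one ownership bit per slot

    def acquire(self, slot):
        # Test and set under the lock: a slot that is still being
        # copied into has its owner bit set and is refused.
        with self.lock:
            if self.status[slot] != TP_STATUS_KERNEL or self.owner_map[slot]:
                return False
            self.owner_map[slot] = True
            return True

    def release(self, slot):
        # Status and owner bit change together, as one transaction.
        with self.lock:
            self.status[slot] = TP_STATUS_USER
            self.owner_map[slot] = False


ring = RxRing(slots=2)
first_writer = ring.acquire(0)    # gets the slot
second_writer = ring.acquire(0)   # refused: the first writer is still copying
ring.release(0)
after_release = ring.acquire(0)   # refused: the slot now belongs to userspace
```

Without the owner bit, the second writer would only see TP_STATUS_KERNEL and start writing into the same slot, which is the race the commit describes.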
     
  • [ Upstream commit 065fd83e1be2e1ba0d446a257fd86a3cc7bddb51 ]

    For the case where the last mvneta_poll did not process all
    RX packets, we need to xor the pp->cause_rx_tx or port->cause_rx_tx
    before calculating the rx_queue.

    Fixes: 2dcf75e2793c ("net: mvneta: Associate RX queues with each CPU")
    Signed-off-by: Jisheng Zhang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jisheng Zhang
     
  • [ Upstream commit 428c491332bca498c8eb2127669af51506c346c7 ]

    Currently ENA only provides the PCI remove() handler, used during rmmod
    for example. This is not called on shutdown/kexec path; we are potentially
    creating a failure scenario on kexec:

    (a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
    instead pci_device_shutdown() clears the master bit of the PCI device,
    stopping all DMA transactions;

    (b) Kexec reboot happens and the device gets enabled again, likely having
    its FW with that DMA transaction buffered; then it may trigger the (now
    invalid) memory operation in the new kernel, corrupting kernel memory area.

    This patch aims to prevent this, by implementing a shutdown() handler
    quite similar to the remove() one - the difference being the handling
    of the netdev, which is unregistered on remove(), but following the
    convention observed in other drivers, it's only detached on shutdown().

    This prevents an odd issue in AWS Nitro instances, in which after the 2nd
    kexec the next one will fail with an initrd corruption, caused by a wild
    DMA write to invalid kernel memory. The lspci output for the adapter
    present in my instance is:

    00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network
    Adapter (ENA) [1d0f:ec20]

    Suggested-by: Gavin Shan
    Signed-off-by: Guilherme G. Piccoli
    Acked-by: Sameeh Jubran
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guilherme G. Piccoli
     
  • [ Upstream commit e80f40cbe4dd51371818e967d40da8fe305db5e4 ]

    Not only did this wheel not need reinventing, but there is also
    an issue with it: It doesn't remove the VLAN header in a way that
    preserves the L2 payload checksum when that is being provided by the DSA
    master hw. It should recalculate checksum both for the push, before
    removing the header, and for the pull afterwards. But the current
    implementation is quite dizzying, with pulls followed immediately
    afterwards by pushes, the memmove is done before the push, etc. This
    makes a DSA master with RX checksumming offload print stack traces
    with the infamous 'hw csum failure' message.

    So remove the dsa_8021q_remove_header function and replace it with
    something that actually works with inet checksumming.

    Fixes: d461933638ae ("net: dsa: tag_8021q: Create helper function for removing VLAN header")
    Signed-off-by: Vladimir Oltean
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vladimir Oltean
     
  • [ Upstream commit 22259471b51925353bd7b16f864c79fdd76e425e ]

    Andrew reported:

    After a number of network port link up/down changes, sometimes the switch
    port gets stuck in a state where it thinks it is still transmitting packets
    but the cpu port is not actually transmitting anymore. In this state you
    will see a message on the console
    "mtk_soc_eth 1e100000.ethernet eth0: transmit timed out" and the Tx counter
    in ifconfig will be incrementing on virtual port, but not incrementing on
    cpu port.

    The issue is that MAC TX/RX status has no impact on the link status or
    queue manager of the switch. So the queue manager just queues up packets
    of a disabled port and sends out pause frames when the queue is full.

    Change the LINK bit to reflect the link status.

    Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
    Reported-by: Andrew Smith
    Signed-off-by: René van Dorst
    Reviewed-by: Vivien Didelot
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    René van Dorst
     
  • [ Upstream commit 0e62f543bed03a64495bd2651d4fe1aa4bcb7fe5 ]

    When both the switch and the bridge are learning about new addresses,
    switch ports attached to the bridge would see duplicate ARP frames
    because both entities would attempt to send them.

    Fixes: 5037d532b83d ("net: dsa: add Broadcom tag RX/TX handler")
    Reported-by: Maxime Bizon
    Signed-off-by: Florian Fainelli
    Reviewed-by: Vivien Didelot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Fainelli
     
  • [ Upstream commit 961d0e5b32946703125964f9f5b6321d60f4d706 ]

    Currently the software CBS does not consider the packet sending time
    when depleting the credits. This caused the throughput to be
    Idleslope[kbps] * (Port transmit rate[kbps] / |Sendslope[kbps]|), whereas
    Idleslope * (Port transmit rate / (Idleslope + |Sendslope|)) = Idleslope
    is expected. To fix this, the patch takes the time
    when the packet sending completes into account by moving the anchor time
    variable "last" ahead to the send completion time upon transmission and
    adding a wait when the next dequeue request comes before the send
    completion time of the previous packet.

    changelog:
    V2->V3:
    - remove unnecessary whitespace cleanup
    - add the checks if port_rate is 0 before division

    V1->V2:
    - combine variable "send_completed" into "last"
    - add the comment for estimate of the packet sending

    Fixes: 585d763af09c ("net/sched: Introduce Credit Based Shaper (CBS) qdisc")
    Signed-off-by: Zh-yuan Ye
    Reviewed-by: Vinicius Costa Gomes
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Zh-yuan Ye
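
A quick numeric check of the two throughput formulas quoted in the CBS commit above. The example rates are made up, and the sketch assumes the usual credit-based shaper relation sendslope = idleslope - port transmit rate, which is what makes the second formula reduce to the idleslope.

```python
def throughput_before_fix(idleslope, port_rate):
    # Idleslope * (Port transmit rate / |Sendslope|)
    sendslope = idleslope - port_rate
    return idleslope * port_rate / abs(sendslope)


def throughput_expected(idleslope, port_rate):
    # Idleslope * (Port transmit rate / (Idleslope + |Sendslope|))
    sendslope = idleslope - port_rate
    return idleslope * port_rate / (idleslope + abs(sendslope))


# Hypothetical example: 20 Mbit/s reserved on a 100 Mbit/s port, in kbps.
idleslope_kbps = 20_000
port_rate_kbps = 100_000

observed = throughput_before_fix(idleslope_kbps, port_rate_kbps)
expected = throughput_expected(idleslope_kbps, port_rate_kbps)
```

For these numbers the first formula gives 25000 kbps against an expected 20000 kbps, so the shaper lets through a quarter more traffic than configured.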
     
  • [ Upstream commit 13d0f7b814d9b4c67e60d8c2820c86ea181e7d99 ]

    The bpfilter UMH code was recently changed to log its informative messages to
    /dev/kmsg; however, this interface doesn't support SEEK_CUR yet, which is used
    by dprintf(). As a result, dprintf() returns -EINVAL and doesn't log anything.

    Although supporting SEEK_CUR in the /dev/kmsg interface has been discussed
    in the past, the discussion wasn't concluded. Since the only user of
    that from a userspace perspective inside the kernel is the bpfilter UMH
    (userspace) module, it's better to correct it here instead of waiting for a
    conclusion on the interface.

    Fixes: 36c4357c63f3 ("net: bpfilter: print umh messages to /dev/kmsg")
    Signed-off-by: Bruno Meneguele
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Bruno Meneguele
     
  • [ Upstream commit f6bf1bafdc2152bb22aff3a4e947f2441a1d49e2 ]

    list_for_each_entry_from_reverse() iterates backwards over the list from
    the current position, but in the error path we should start from the
    previous position.

    Fix this by using list_for_each_entry_continue_reverse() instead.

    This suppresses the following error from coccinelle:

    drivers/net/ethernet/mellanox/mlxsw//spectrum_mr.c:655:34-38: ERROR:
    invalid reference to the index variable of the iterator on line 636

    Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ido Schimmel
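
The difference between restarting the rollback at the current entry and at the previous one, as described above, can be sketched in Python. The function names and the simulated failure are illustrative only. The point is that the entry whose add failed was never installed, so the undo loop must begin with the one before it.

```python
def add_all(items, add, remove):
    """Add every item; on failure, undo only the items already added."""
    for index, item in enumerate(items):
        try:
            add(item)
        except RuntimeError:
            # Walk backwards starting from the previous position,
            # skipping the item that failed.
            for done in reversed(items[:index]):
                remove(done)
            return False
    return True


installed = []
removed = []


def add(item):
    if item == "c":
        raise RuntimeError("simulated failure")
    installed.append(item)


def remove(item):
    installed.remove(item)
    removed.append(item)


ok = add_all(["a", "b", "c", "d"], add, remove)
```

Starting the undo loop at the failing item instead would call remove() on "c", which was never added.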
     
  • [ Upstream commit 6002059d7882c3512e6ac52fa82424272ddfcd5c ]

    During initialization the driver issues a software reset command and
    then waits for the system status to change back to "ready" state.

    However, before issuing the reset command the driver does not check that
    the system is actually in "ready" state. On Spectrum-{1,2} systems this
    was always the case as the hardware initialization time is very short.
    On Spectrum-3 systems this is no longer the case. This results in the
    software reset command timing-out and the driver failing to load:

    [ 6.347591] mlxsw_spectrum3 0000:06:00.0: Cmd exec timed-out (opcode=40(ACCESS_REG),opcode_mod=0,in_mod=0)
    [ 6.358382] mlxsw_spectrum3 0000:06:00.0: Reg cmd access failed (reg_id=9023(mrsr),type=write)
    [ 6.368028] mlxsw_spectrum3 0000:06:00.0: cannot register bus device
    [ 6.375274] mlxsw_spectrum3: probe of 0000:06:00.0 failed with error -110

    Fix this by waiting for the system to become ready both before issuing
    the reset command and afterwards. In case of failure, print the last
    system status to aid in debugging.

    Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
    Signed-off-by: Ido Schimmel
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ido Schimmel
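
The ordering fix described above (wait for "ready", reset, wait for "ready" again) can be sketched with a fake device in Python. Everything here is hypothetical: the FakeDevice class, the status strings and the poll counts are stand-ins and do not reflect the real driver or register interface.

```python
class FakeDevice:
    """Hypothetical device that needs a few status polls before it is ready."""

    def __init__(self, busy_polls):
        self.busy_polls = busy_polls
        self.resets = 0

    def read_status(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return "booting"
        return "ready"

    def issue_reset(self):
        if self.busy_polls > 0:
            raise TimeoutError("reset issued while device was not ready")
        self.resets += 1
        self.busy_polls = 2  # the device is busy again right after a reset


def wait_ready(dev, attempts):
    last = None
    for _ in range(attempts):
        last = dev.read_status()
        if last == "ready":
            return
    # Report the last status seen to aid debugging.
    raise TimeoutError(f"device not ready, last status: {last}")


def reset_device(dev, attempts=10):
    wait_ready(dev, attempts)  # new step: wait before issuing the reset
    dev.issue_reset()
    wait_ready(dev, attempts)  # existing step: wait for the reset to finish


dev = FakeDevice(busy_polls=3)
reset_device(dev)
```

Calling `issue_reset()` directly on a freshly created `FakeDevice(busy_polls=3)` raises the timeout, which mirrors the slow-initializing hardware case in the commit message.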
     
  • [ Upstream commit b06d072ccc4b1acd0147b17914b7ad1caa1818bb ]

    Only attach macsec to ethernet devices.

    Syzbot was able to trigger a KMSAN warning in macsec_handle_frame
    by attaching to a phonet device.

    Macvlan has a similar check in macvlan_port_create.

    v1->v2
    - fix commit message typo

    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit dddeb30bfc43926620f954266fd12c65a7206f07 ]

    The following call chain,

    inet_dump_fib()
    fib_table_dump
    fn_trie_dump_leaf()
    hlist_for_each_entry_rcu()

    runs without rcu_read_lock() and triggers a warning:

    WARNING: suspicious RCU usage
    -----------------------------
    net/ipv4/fib_trie.c:2216 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ip/1923:
    #0: ffffffff8ce76e40 (rtnl_mutex){+.+.}, at: netlink_dump+0xd6/0x840

    Call Trace:
    dump_stack+0xa1/0xea
    lockdep_rcu_suspicious+0x103/0x10d
    fn_trie_dump_leaf+0x581/0x590
    fib_table_dump+0x15f/0x220
    inet_dump_fib+0x4ad/0x5d0
    netlink_dump+0x350/0x840
    __netlink_dump_start+0x315/0x3e0
    rtnetlink_rcv_msg+0x4d1/0x720
    netlink_rcv_skb+0xf0/0x220
    rtnetlink_rcv+0x15/0x20
    netlink_unicast+0x306/0x460
    netlink_sendmsg+0x44b/0x770
    __sys_sendto+0x259/0x270
    __x64_sys_sendto+0x80/0xa0
    do_syscall_64+0x69/0xf4
    entry_SYSCALL_64_after_hwframe+0x49/0xb3

    Fixes: 18a8021a7be3 ("net/ipv4: Plumb support for filtering route dumps")
    Signed-off-by: Qian Cai
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Qian Cai
     
  • [ Upstream commit 3a303cfdd28d5f930a307c82e8a9d996394d5ebd ]

    port->hsr is used in hsr_handle_frame(), which is the
    rx_handler callback.
    The hsr master and slaves are initialized in hsr_add_port().
    This function initializes several pointers, including port->hsr, after
    registering the rx_handler.
    So the rx_handler routine could use an uninitialized pointer.
    To fix this, the pointers should be initialized before
    registering the rx_handler.

    Test commands:
    ip netns del left
    ip netns del right
    modprobe -rv veth
    modprobe -rv hsr
    killall ping
    modprobe hsr
    ip netns add left
    ip netns add right
    ip link add veth0 type veth peer name veth1
    ip link add veth2 type veth peer name veth3
    ip link add veth4 type veth peer name veth5
    ip link set veth1 netns left
    ip link set veth3 netns right
    ip link set veth4 netns left
    ip link set veth5 netns right
    ip link set veth0 up
    ip link set veth2 up
    ip link set veth0 address fc:00:00:00:00:01
    ip link set veth2 address fc:00:00:00:00:02
    ip netns exec left ip link set veth1 up
    ip netns exec left ip link set veth4 up
    ip netns exec right ip link set veth3 up
    ip netns exec right ip link set veth5 up
    ip link add hsr0 type hsr slave1 veth0 slave2 veth2
    ip a a 192.168.100.1/24 dev hsr0
    ip link set hsr0 up
    ip netns exec left ip link add hsr1 type hsr slave1 veth1 slave2 veth4
    ip netns exec left ip a a 192.168.100.2/24 dev hsr1
    ip netns exec left ip link set hsr1 up
    ip netns exec left ip n a 192.168.100.1 dev hsr1 lladdr \
    fc:00:00:00:00:01 nud permanent
    ip netns exec left ip n r 192.168.100.1 dev hsr1 lladdr \
    fc:00:00:00:00:01 nud permanent
    for i in {1..100}
    do
    ip netns exec left ping 192.168.100.1 &
    done
    ip netns exec left hping3 192.168.100.1 -2 --flood &
    ip netns exec right ip link add hsr2 type hsr slave1 veth3 slave2 veth5
    ip netns exec right ip a a 192.168.100.3/24 dev hsr2
    ip netns exec right ip link set hsr2 up
    ip netns exec right ip n a 192.168.100.1 dev hsr2 lladdr \
    fc:00:00:00:00:02 nud permanent
    ip netns exec right ip n r 192.168.100.1 dev hsr2 lladdr \
    fc:00:00:00:00:02 nud permanent
    for i in {1..100}
    do
    ip netns exec right ping 192.168.100.1 &
    done
    ip netns exec right hping3 192.168.100.1 -2 --flood &
    while :
    do
    ip link add hsr0 type hsr slave1 veth0 slave2 veth2
    ip a a 192.168.100.1/24 dev hsr0
    ip link set hsr0 up
    ip link del hsr0
    done

    Splat looks like:
    [ 120.954938][ C0] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1]I
    [ 120.957761][ C0] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
    [ 120.959064][ C0] CPU: 0 PID: 1511 Comm: hping3 Not tainted 5.6.0-rc5+ #460
    [ 120.960054][ C0] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    [ 120.962261][ C0] RIP: 0010:hsr_addr_is_self+0x65/0x2a0 [hsr]
    [ 120.963149][ C0] Code: 44 24 18 70 73 2f c0 48 c1 eb 03 48 8d 04 13 c7 00 f1 f1 f1 f1 c7 40 04 00 f2 f2 f2 4
    [ 120.966277][ C0] RSP: 0018:ffff8880d9c09af0 EFLAGS: 00010206
    [ 120.967293][ C0] RAX: 0000000000000006 RBX: 1ffff1101b38135f RCX: 0000000000000000
    [ 120.968516][ C0] RDX: dffffc0000000000 RSI: ffff8880d17cb208 RDI: 0000000000000000
    [ 120.969718][ C0] RBP: 0000000000000030 R08: ffffed101b3c0e3c R09: 0000000000000001
    [ 120.972203][ C0] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 0000000000000000
    [ 120.973379][ C0] R13: ffff8880aaf80100 R14: ffff8880aaf800f2 R15: ffff8880aaf80040
    [ 120.974410][ C0] FS: 00007f58e693f740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
    [ 120.979794][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 120.980773][ C0] CR2: 00007ffcb8b38f29 CR3: 00000000afe8e001 CR4: 00000000000606f0
    [ 120.981945][ C0] Call Trace:
    [ 120.982411][ C0]
    [ 120.982848][ C0] ? hsr_add_node+0x8c0/0x8c0 [hsr]
    [ 120.983522][ C0] ? rcu_read_lock_held+0x90/0xa0
    [ 120.984159][ C0] ? rcu_read_lock_sched_held+0xc0/0xc0
    [ 120.984944][ C0] hsr_handle_frame+0x1db/0x4e0 [hsr]
    [ 120.985597][ C0] ? hsr_nl_nodedown+0x2b0/0x2b0 [hsr]
    [ 120.986289][ C0] __netif_receive_skb_core+0x6bf/0x3170
    [ 120.992513][ C0] ? check_chain_key+0x236/0x5d0
    [ 120.993223][ C0] ? do_xdp_generic+0x1460/0x1460
    [ 120.993875][ C0] ? register_lock_class+0x14d0/0x14d0
    [ 120.994609][ C0] ? __netif_receive_skb_one_core+0x8d/0x160
    [ 120.995377][ C0] __netif_receive_skb_one_core+0x8d/0x160
    [ 120.996204][ C0] ? __netif_receive_skb_core+0x3170/0x3170
    [ ... ]

    Reported-by: syzbot+fcf5dd39282ceb27108d@syzkaller.appspotmail.com
    Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit 0fda7600c2e174fe27e9cf02e78e345226e441fa ]

    The debug check must be done after unregister_netdevice_many() call --
    the list_del() for this is done inside .ndo_stop.

    Fixes: 2843a25348f8 ("geneve: speedup geneve tunnels dismantle")
    Reported-and-tested-by:
    Cc: Haishuang Yan
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit f1f20a8666c55cb534b8f3fc1130eebf01a06155 ]

    Driver reclaims descriptors in much smaller batches, even if hardware
    indicates more to reclaim, during backpressure. So, fix the check to
    restart the Txq during backpressure, by looking at how many
    descriptors hardware had indicated to reclaim, and not on how many
    descriptors that driver had actually reclaimed. Once the Txq is
    restarted, driver will reclaim even more descriptors when Tx path
    is entered again.

    Fixes: d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer")
    Signed-off-by: Rahul Lakkireddy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Rahul Lakkireddy
     
  • [ Upstream commit 7affd80802afb6ca92dba47d768632fbde365241 ]

    commit 7c3bebc3d868 ("cxgb4: request the TX CIDX updates to status page")
    reverted back to getting Tx CIDX updates via DMA, instead of interrupts,
    introduced by commit d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE
    doorbell queue timer")

    However, it missed reverting several code changes under which Tx CIDX
    updates are not explicitly requested during backpressure when using
    interrupt mode. These missed changes cause slow recovery during
    backpressure, because the corresponding interrupt no longer arrives,
    and hence a drop in Tx throughput.

    So, revert these missed code changes as well, which allows Tx CIDX
    updates to be requested explicitly when backpressure happens. The
    corresponding interrupt with the Tx CIDX update message is then
    generated, which speeds up recovery and restores throughput.

    Fixes: 7c3bebc3d868 ("cxgb4: request the TX CIDX updates to status page")
    Fixes: d429005fdf2c ("cxgb4/cxgb4vf: Add support for SGE doorbell queue timer")
    Signed-off-by: Rahul Lakkireddy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Rahul Lakkireddy
     
  • commit 024aa8732acb7d2503eae43c3fe3504d0a8646d0 upstream.

    Note that the EC GPE processing need not be synchronized in
    acpi_s2idle_wake() after invoking acpi_ec_dispatch_gpe(), because
    that function checks the GPE status and dispatches its handler if
    need be and the SCI action handler is not going to run anyway at
    that point.

    Moreover, it is better to drain all of the pending ACPI events
    before restoring the working-state configuration of GPEs in
    acpi_s2idle_restore(), because those events are likely to be related
    to system wakeup, in which case they will not be relevant going
    forward.

    Rework the code to take these observations into account.

    Tested-by: Kenneth R. Crudup
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • [ Upstream commit d2f8bfa4bff5028bc40ed56b4497c32e05b0178f ]

    It has turned out that the sdhci-tegra controller requires the R1B response
    for commands that have this response associated with them. So converting
    from an R1B to an R1 response, for a CMD6 for example, leads to problems
    with the HW busy detection support.

    Fix this by informing the mmc core about the requirement, via setting the
    host cap, MMC_CAP_NEED_RSP_BUSY.

    Reported-by: Bitan Biswas
    Reported-by: Peter Geis
    Suggested-by: Sowjanya Komatineni
    Cc:
    Tested-by: Sowjanya Komatineni
    Tested-by: Peter Geis
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 055e04830d4544c57f2a5192a26c9e25915c29c0 ]

    It has turned out that the sdhci-omap controller requires the R1B response
    for commands that have this response associated with them. So converting
    from an R1B to an R1 response, for a CMD6 for example, leads to problems
    with the HW busy detection support.

    Fix this by informing the mmc core about the requirement, via setting the
    host cap, MMC_CAP_NEED_RSP_BUSY.

    Reported-by: Naresh Kamboju
    Reported-by: Anders Roxell
    Reported-by: Faiz Abbas
    Cc:
    Tested-by: Anders Roxell
    Tested-by: Faiz Abbas
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 18d200460cd73636d4f20674085c39e32b4e0097 ]

    The busy timeout for the CMD5 that puts the eMMC into sleep state is
    specific to the card. Potentially the timeout may exceed the
    host->max_busy_timeout. If that is the case, mmc_sleep() converts from
    using an R1B response to an R1 response, so as to prevent the host from
    doing HW busy detection.

    However, it has turned out that some hosts require an R1B response no
    matter what, so let's respect that by checking MMC_CAP_NEED_RSP_BUSY. Note
    that, if the R1B gets enforced, the host becomes fully responsible for
    managing the needed busy timeout, in one way or the other.

    Suggested-by: Sowjanya Komatineni
    Cc:
    Link: https://lore.kernel.org/r/20200311092036.16084-1-ulf.hansson@linaro.org
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 43cc64e5221cc6741252b64bc4531dd1eefb733d ]

    The busy timeout that is computed for each erase/trim/discard operation
    can become quite long and may thus exceed the host->max_busy_timeout. If
    that is the case, mmc_do_erase() converts from using an R1B response
    to an R1 response, so as to prevent the host from doing HW busy detection.

    However, it has turned out that some hosts require an R1B response no
    matter what, so let's respect that by checking MMC_CAP_NEED_RSP_BUSY. Note
    that, if the R1B gets enforced, the host becomes fully responsible for
    managing the needed busy timeout, in one way or the other.

    Suggested-by: Sowjanya Komatineni
    Cc:
    Tested-by: Anders Roxell
    Tested-by: Sowjanya Komatineni
    Tested-by: Faiz Abbas
    Tested-by: Peter Geis
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
     
  • [ Upstream commit 1292e3efb149ee21d8d33d725eeed4e6b1ade963 ]

    It has turned out that some host controllers require R1B for CMD6 and
    other commands that have R1B associated with them. Therefore invent a new
    host cap, MMC_CAP_NEED_RSP_BUSY, to let them specify this.

    In __mmc_switch(), let's check the flag and use it to prevent R1B responses
    from being converted into R1. Note that this also means that the host is
    on its own when it comes to managing the busy timeout.

    Suggested-by: Sowjanya Komatineni
    Cc:
    Tested-by: Anders Roxell
    Tested-by: Sowjanya Komatineni
    Tested-by: Faiz Abbas
    Tested-by: Peter Geis
    Signed-off-by: Ulf Hansson
    Signed-off-by: Sasha Levin

    Ulf Hansson
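
The response-type decision described in this commit can be sketched in user-space C as follows. Everything here is a hypothetical stand-in: the enum, the struct, the cap bit, and pick_response() are illustrative names, not the kernel's definitions, and treating a zero max timeout as "no limit" is an assumption of this sketch. It only models the rule from the message: a host that sets the cap keeps R1B even when the busy timeout exceeds its maximum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel's real definitions differ. */
enum resp_sketch { RESP_R1, RESP_R1B };

#define CAP_NEED_RSP_BUSY_SKETCH (1u << 0)  /* models MMC_CAP_NEED_RSP_BUSY */

struct host_sketch {
    unsigned int caps;
    unsigned int max_busy_timeout_ms;  /* 0 is treated as "no limit" here */
};

/* Choose the response type for a command that normally uses R1B. */
static enum resp_sketch pick_response(const struct host_sketch *h,
                                      unsigned int timeout_ms)
{
    bool exceeds;

    assert(h);
    exceeds = h->max_busy_timeout_ms &&
              timeout_ms > h->max_busy_timeout_ms;

    /* Hosts that set the cap always get R1B and own the busy timeout. */
    if (h->caps & CAP_NEED_RSP_BUSY_SKETCH)
        return RESP_R1B;

    /* Otherwise fall back to R1 so the host skips HW busy detection. */
    return exceeds ? RESP_R1 : RESP_R1B;
}
```

In this model, a host without the cap and a 1000 ms limit gets R1 for a 5000 ms timeout, while a host with the cap gets R1B for the same request.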
     

25 Mar, 2020

3 commits

  • Greg Kroah-Hartman
     
  • commit ae62cf5eb2792d9a818c2d93728ed92119357017 upstream.

    Newer GCC warns about possible truncations of two generated path names as
    we're concatenating the configurable sysfs and debugfs path prefixes
    with a filename and placing the results in buffers of the same size as
    the maximum length of the prefixes.

    snprintf(d->name, MAX_STR_LEN, "gb_loopback%u", dev_id);

    snprintf(d->sysfs_entry, MAX_SYSFS_PATH, "%s%s/",
    t->sysfs_prefix, d->name);

    snprintf(d->debugfs_entry, MAX_SYSFS_PATH, "%sraw_latency_%s",
    t->debugfs_prefix, d->name);

    Fix this by separating the maximum path length from the maximum prefix
    length and reducing the latter enough to fit the generated strings.

    Note that we also need to reduce the device-name buffer size, as GCC
    isn't smart enough to figure out that we only ever use MAX_STR_LEN
    bytes of it.

    Fixes: 6b0658f68786 ("greybus: tools: Add tools directory to greybus repo and add loopback")
    Signed-off-by: Johan Hovold
    Link: https://lore.kernel.org/r/20200312110151.22028-4-johan@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
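
A minimal sketch of the sizing idea in this commit: keep the maximum prefix length separate from, and smaller than, the maximum path length, so that prefix plus device name always fits. The constant names, their values, and build_entry() are hypothetical and do not mirror the loopback tool's actual definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes; the tool's actual constants may differ. */
#define MAX_NAME_LEN   24
#define MAX_PATH_LEN   256
/* Prefix budget: the path budget minus room for "<name>/". */
#define MAX_PREFIX_LEN (MAX_PATH_LEN - MAX_NAME_LEN - 1)

/*
 * Build "<prefix><name>/" into out.
 * Returns the string length on success, -1 if an input is too long
 * or the result would not fit in out.
 */
static int build_entry(char *out, size_t out_len,
                       const char *prefix, const char *name)
{
    int n;

    if (strlen(prefix) >= MAX_PREFIX_LEN || strlen(name) >= MAX_NAME_LEN)
        return -1;

    n = snprintf(out, out_len, "%s%s/", prefix, name);
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

Because the prefix limit is derived from the path limit, any prefix and name that pass the length checks produce a result that fits in a MAX_PATH_LEN buffer, terminating NUL included.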
     
  • commit f16023834863932f95dfad13fac3fc47f77d2f29 upstream.

    Newer GCC warns about a possible truncation of a generated sysfs path
    name as we're concatenating a directory path with a file name and
    placing the result in a buffer that is half the size of the maximum
    length of the directory path (which is user controlled).

    loopback_test.c: In function 'open_poll_files':
    loopback_test.c:651:31: warning: '%s' directive output may be truncated writing up to 511 bytes into a region of size 255 [-Wformat-truncation=]
    651 | snprintf(buf, sizeof(buf), "%s%s", dev->sysfs_entry, "iteration_count");
    | ^~
    loopback_test.c:651:3: note: 'snprintf' output between 16 and 527 bytes into a destination of size 255
    651 | snprintf(buf, sizeof(buf), "%s%s", dev->sysfs_entry, "iteration_count");
    | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Fix this by making sure the buffer is large enough for the concatenated
    strings.

    Fixes: 6b0658f68786 ("greybus: tools: Add tools directory to greybus repo and add loopback")
    Fixes: 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify")
    Signed-off-by: Johan Hovold
    Link: https://lore.kernel.org/r/20200312110151.22028-3-johan@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
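
The commit fixes the warning by enlarging the buffer. As a complementary illustration, not something the commit itself does, the return value of snprintf() can be used to detect truncation at run time. The helper name join_path() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true only if dir + file fit in out, including the NUL. */
static bool join_path(char *out, size_t out_len,
                      const char *dir, const char *file)
{
    /* snprintf() returns the length the full string would have had. */
    int n = snprintf(out, out_len, "%s%s", dir, file);

    return n >= 0 && (size_t)n < out_len;
}
```

A caller can treat a false return as an error instead of silently opening a truncated path.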