08 Mar, 2020

1 commit

  • Merge Linux stable release v5.4.24 into imx_5.4.y

    * tag 'v5.4.24': (3306 commits)
    Linux 5.4.24
    blktrace: Protect q->blk_trace with RCU
    kvm: nVMX: VMWRITE checks unsupported field before read-only field
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    arch/arm/boot/dts/imx6sll-evk.dts
    arch/arm/boot/dts/imx7ulp.dtsi
    arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
    drivers/clk/imx/clk-composite-8m.c
    drivers/gpio/gpio-mxc.c
    drivers/irqchip/Kconfig
    drivers/mmc/host/sdhci-of-esdhc.c
    drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
    drivers/net/can/flexcan.c
    drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
    drivers/net/ethernet/mscc/ocelot.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    drivers/net/phy/realtek.c
    drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
    drivers/perf/fsl_imx8_ddr_perf.c
    drivers/tee/optee/shm_pool.c
    drivers/usb/cdns3/gadget.c
    kernel/sched/cpufreq.c
    net/core/xdp.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c
    sound/soc/sof/core.c
    sound/soc/sof/imx/Kconfig
    sound/soc/sof/loader.c

    Jason Liu
     

05 Mar, 2020

22 commits

  • commit cf3e204a1ca5442190018a317d9ec181b4639bd6 upstream.

    info->key.tp_src and tp_dst are __be16, when using nla_put_be16()
    to dump them, htons() is not needed, so remove it in this patch.
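
The effect of the extra htons() can be illustrated with a small Python model. This is only a sketch and not kernel code: it assumes a little-endian host, the port value is an arbitrary example, and the names swab16 and stored_be16 are hypothetical.

```python
import struct

def swab16(value: int) -> int:
    """Swap the two bytes of a 16-bit value (what htons() does on a little-endian host)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

port = 4789                             # arbitrary example port
wire_bytes = struct.pack("!H", port)    # the network-order encoding we want on the wire

# Model a little-endian host holding a __be16: the native integer it sees
# when the two network-order bytes are read as a 16-bit value.
stored_be16 = struct.unpack("<H", wire_bytes)[0]

# Correct dump: copy the stored value out unchanged (as nla_put_be16() does).
dumped_ok = struct.pack("<H", stored_be16)

# Buggy dump: byte-swap once more before copying, as the removed htons() did.
dumped_bad = struct.pack("<H", swab16(stored_be16))

print(dumped_ok == wire_bytes)
print(dumped_bad == wire_bytes)
```

The first comparison holds and the second does not, which is why a value that is already __be16 must be passed through untouched.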

    Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
    Signed-off-by: Xin Long
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • commit 369537c97024dca99303a8d4d6ab38b4f54d3909 upstream.

    Only SMCR requires a CLC Peer ID; SMCD does not. The field should be
    zero for SMCD.

    Fixes: c758dfddc1b5 ("net/smc: add SMC-D support in CLC messages")
    Signed-off-by: Ursula Braun
    Signed-off-by: Karsten Graul
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ursula Braun
     
  • commit 3a20773beeeeadec41477a5ba872175b778ff752 upstream.

    Since nl_groups is a u32 we can't bind more groups via ->bind
    (netlink_bind) call, but netlink has supported more groups via
    setsockopt() for a long time and thus nlk->ngroups could be over 32.
    Recently I added support for per-vlan notifications and increased the
    groups to 33 for NETLINK_ROUTE which exposed an old bug in the
    netlink_bind() code causing out-of-bounds access on archs where unsigned
    long is 32 bits via test_bit() on a local variable. Fix this by capping the
    maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively
    capping them at 32 which is the minimum of allocated groups and the
    maximum groups which can be bound via netlink_bind().
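
The capping described above can be sketched roughly as follows. This is a Python model under stated assumptions, not the kernel implementation: bound_groups and BITS_PER_U32 are hypothetical names, and the real code tests bits in an unsigned long rather than a Python integer.

```python
from typing import List

BITS_PER_U32 = 32  # nl_groups in struct sockaddr_nl is a 32-bit mask

def bound_groups(nl_groups: int, ngroups: int) -> List[int]:
    """Return the 1-based group numbers requested in a 32-bit bind mask.

    Only min(ngroups, 32) bit positions are examined, so the scan never
    goes past the end of the 32-bit mask even when the protocol has
    registered more than 32 groups.
    """
    limit = min(ngroups, BITS_PER_U32)
    mask = nl_groups & 0xFFFFFFFF
    return [bit + 1 for bit in range(limit) if mask & (1 << bit)]

# A protocol with 33 registered groups still only scans 32 bits of the mask.
print(bound_groups(0b101, ngroups=33))
```

Groups above 32 remain reachable through setsockopt(), as the message notes; only the bind path is limited.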

    CC: Christophe Leroy
    CC: Richard Guy Briggs
    Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.")
    Reported-by: Erhard F.
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     
  • commit 0daa63ed4c6c4302790ce67b7a90c0997ceb7514 upstream.

    The below-mentioned commit changed the code to unlock *inside*
    the function, but previously the unlock was *outside*. It failed
    to remove the outer unlock, however, leading to double unlock.

    Fix this.
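
A minimal Python sketch of the double-unlock pattern, assuming a callee that releases the lock on every path. The names sta_lock, helper_unlocks_inside, caller_buggy and caller_fixed are hypothetical stand-ins, not the mac80211 functions.

```python
import threading

sta_lock = threading.Lock()  # stands in for the kernel mutex

def helper_unlocks_inside(ok: bool) -> bool:
    """Hypothetical callee that always releases the lock itself before returning."""
    sta_lock.release()
    return ok

def caller_buggy() -> str:
    sta_lock.acquire()
    if not helper_unlocks_inside(False):
        try:
            sta_lock.release()  # leftover outer unlock: a second release
        except RuntimeError:
            return "double unlock"
    return "ok"

def caller_fixed() -> str:
    sta_lock.acquire()
    if not helper_unlocks_inside(False):
        return "error handled, lock released once"
    return "ok"

print(caller_buggy())
print(caller_fixed())
```

Python raises RuntimeError on the second release; a kernel mutex gives no such safety net, which is why the stray outer unlock had to be removed.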

    Fixes: 33483a6b88e4 ("mac80211: fix missing unlock on error in ieee80211_mark_sta_auth()")
    Signed-off-by: Andrei Otcheretianski
    Link: https://lore.kernel.org/r/20200221104719.cce4741cf6eb.I671567b185c8a4c2409377e483fd149ce590f56d@changeid
    [rewrite commit message to better explain what happened]
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Andrei Otcheretianski
     
  • commit 9951ebfcdf2b97dbb28a5d930458424341e61aa2 upstream.

    If nl80211_parse_he_obss_pd() fails, we leak the previously
    allocated ACL memory. Free it in this case.

    Fixes: 796e90f42b7e ("cfg80211: add support for parsing OBBS_PD attributes")
    Signed-off-by: Johannes Berg
    Link: https://lore.kernel.org/r/20200221104142.835aba4cdd14.I1923b55ba9989c57e13978f91f40bfdc45e60cbd@changeid
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     
  • commit c4a3922d2d20c710f827d3a115ee338e8d0467df upstream.

    It is unnecessary to hold hashlimit_mutex for htable_destroy()
    as it is already removed from the global hashtable and its
    refcount is already zero.

    Also, switch hinfo->use to refcount_t so that we don't have
    to hold the mutex until it reaches zero in htable_put().

    Reported-and-tested-by: syzbot+adf6c6c2be1c3a718121@syzkaller.appspotmail.com
    Acked-by: Florian Westphal
    Signed-off-by: Cong Wang
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • commit 8af1c6fbd9239877998c7f5a591cb2c88d41fb66 upstream.

    When the forceadd option is enabled, the hash:* types should find and replace
    the first entry in the bucket with the new one if there are no reusable
    (deleted or timed out) entries. However, the position index was never set
    to zero and remained at the invalid value -1 if there were no reusable entries.
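
The slot selection described above can be modelled like this. It is a simplified sketch, assuming a bucket is a list in which None marks a reusable slot; pick_slot is a hypothetical helper and not the ipset code.

```python
def pick_slot(bucket, forceadd):
    """Return the index to write a new entry into, or -1 if the bucket is full.

    bucket: list of entries, where None marks a reusable (deleted or
    timed out) slot.
    """
    j = -1
    for i, entry in enumerate(bucket):
        if entry is None:
            j = i
            break
    if j == -1 and forceadd:
        # No reusable slot: with forceadd, overwrite the first entry.
        # Leaving j at -1 here is the bug described above.
        j = 0
    return j

print(pick_slot(["a", None], forceadd=False))
print(pick_slot(["a", "b"], forceadd=True))
```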

    Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com
    Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Greg Kroah-Hartman

    Jozsef Kadlecsik
     
  • commit 67f562e3e147750a02b2a91d21a163fc44a1d13e upstream.

    SMC does not work together with FASTOPEN. If sendmsg() is called with
    flag MSG_FASTOPEN in SMC_INIT state, the SMC-socket switches to
    fallback mode. To handle the previous ioctl FIOASYNC call correctly
    in this case, it is necessary to transfer the socket wait queue
    fasync_list to the internal TCP socket.

    Reported-by: syzbot+4b1fe8105f8044a26162@syzkaller.appspotmail.com
    Fixes: ee9dfbef02d18 ("net/smc: handle sockopts forcing fallback")
    Signed-off-by: Ursula Braun
    Signed-off-by: Karsten Graul
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ursula Braun
     
  • commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.

    In the case of huge hash:* types of sets, due to the single spinlock of
    a set the processing of the whole set under spinlock protection could take
    too long.

    There were four places where the whole hash table of the set was processed
    from bucket to bucket while holding the spinlock:

    - During resizing a set, the original set was locked to exclude kernel side
    add/del element operations (userspace add/del is excluded by the
    nfnetlink mutex). The original set is actually just read during the
    resize, so the spinlocking is replaced with rcu locking of regions.
    However, this means there can be parallel kernel side add/del of entries.
    In order not to lose those operations a backlog is added and replayed
    after the successful resize.
    - Garbage collection of timed out entries was also protected by the spinlock.
    In order not to lock too long, region locking is introduced and a single
    region is processed in one gc go. Also, the simple timer based gc running
    is replaced with a workqueue based solution. The internal book-keeping
    (number of elements, size of extensions) is moved to region level due to
    the region locking.
    - Adding elements: when the max number of the elements is reached, the gc
    was called to evict the timed out entries. The new approach is that the gc
    is called just for the matching region, assuming that if the region
    (proportionally) seems to be full, then the whole set is too. We could scan
    the other regions to check every entry under rcu locking, but for huge
    sets it'd mean a slowdown at adding elements.
    - Listing the set header data: when the set was defined with timeout
    support, the garbage collector was called to clean up timed out entries
    to get the correct element numbers and set size values. Now the set is
    scanned to check non-timed out entries, without actually calling the gc
    for the whole set.

    Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
    SOFTIRQ-unsafe lock order issues while working on the patch.

    Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
    Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
    Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
    Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
    Signed-off-by: Jozsef Kadlecsik
    Signed-off-by: Greg Kroah-Hartman

    Jozsef Kadlecsik
     
  • [ Upstream commit 33181ea7f5a62a17fbe55f0f73428ecb5e686be8 ]

    Before this patch, STAs would set a new width of 160/80+80 MHz based on AP capability only.
    This is wrong because the STA may not support > 80 MHz bandwidth.
    The fix is to verify that the STA has 160/80+80 MHz capability before increasing its width to > 80 MHz.

    The "support_80_80" and "support_160" setting is based on:
    "Table 9-272 — Setting of the Supported Channel Width Set subfield and Extended NSS BW
    Support subfield at a STA transmitting the VHT Capabilities Information field"
    From "Draft P802.11REVmd_D3.0.pdf"

    Signed-off-by: Aviad Brikman
    Signed-off-by: Shay Bar
    Link: https://lore.kernel.org/r/20200210130728.23674-1-shay.bar@celeno.com
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Shay Bar
     
  • [ Upstream commit ea75080110a4c1fa011b0a73cb8f42227143ee3e ]

    The nl80211_policy is missing for NL80211_ATTR_STATUS_CODE attribute.
    As a result, for strictly validated commands, it's assumed to not be
    supported.

    Signed-off-by: Sergey Matyukevich
    Link: https://lore.kernel.org/r/20200213131608.10541-2-sergey.matyukevich.os@quantenna.com
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Sergey Matyukevich
     
  • [ Upstream commit bfb7bac3a8f47100ebe7961bd14e924c96e21ca7 ]

    When preparing ethtool drvinfo, check if wiphy driver is defined
    before dereferencing it. Driver may not exist, e.g. if wiphy is
    attached to a virtual platform device.

    Signed-off-by: Sergey Matyukevich
    Link: https://lore.kernel.org/r/20200203105644.28875-1-sergey.matyukevich.os@quantenna.com
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Sergey Matyukevich
     
  • [ Upstream commit a04564c99bb4a92f805a58e56b2d22cc4978f152 ]

    We only use the parsing CRC for checking if a beacon changed,
    and elements with an ID > 63 cannot be represented in the
    filter. Thus, like we did before with WMM and Cisco vendor
    elements, just statically add these forgotten items to the
    CRC:
    - WLAN_EID_VHT_OPERATION
    - WLAN_EID_OPMODE_NOTIF

    I guess that in most cases when VHT/HE operation change, the HT
    operation also changed, and so the change was picked up, but we
    did notice that pure operating mode notification changes were
    ignored.
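
The filter limitation can be illustrated with a short Python sketch. It assumes a 64-bit filter mask indexed by element ID and a set of IDs that are always folded into the CRC; beacon_crc and ALWAYS_CRC are hypothetical names, and zlib.crc32 stands in for the kernel's CRC routine, so the numeric values will not match the kernel's.

```python
import zlib

WLAN_EID_HT_OPERATION = 61
WLAN_EID_VHT_OPERATION = 192
WLAN_EID_OPMODE_NOTIF = 199

# Element IDs above 63 cannot be selected by a 64-bit mask, so they are
# listed statically instead.
ALWAYS_CRC = frozenset({WLAN_EID_VHT_OPERATION, WLAN_EID_OPMODE_NOTIF})

def beacon_crc(elements, filter_mask, always_crc=ALWAYS_CRC):
    """CRC over the elements selected by the mask or by the static set."""
    crc = 0
    for eid, data in elements:
        in_filter = eid < 64 and bool((filter_mask >> eid) & 1)
        if in_filter or eid in always_crc:
            crc = zlib.crc32(bytes([eid, len(data)]) + data, crc)
    return crc

ht_only = 1 << WLAN_EID_HT_OPERATION
old = [(WLAN_EID_HT_OPERATION, b"\x01"), (WLAN_EID_OPMODE_NOTIF, b"\x00")]
new = [(WLAN_EID_HT_OPERATION, b"\x01"), (WLAN_EID_OPMODE_NOTIF, b"\x01")]

# Without the static set, a pure operating mode change goes unnoticed.
print(beacon_crc(old, ht_only, frozenset()) == beacon_crc(new, ht_only, frozenset()))
# With it, the CRC changes.
print(beacon_crc(old, ht_only) == beacon_crc(new, ht_only))
```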

    Signed-off-by: Johannes Berg
    Signed-off-by: Luca Coelho
    Link: https://lore.kernel.org/r/20200131111300.891737-22-luca@coelho.fi
    [restrict to VHT for the mac80211 branch]
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Johannes Berg
     
  • [ Upstream commit afecdb376bd81d7e16578f0cfe82a1aec7ae18f3 ]

    When splitting an RTA_MULTIPATH request into multiple routes and adding the
    second and later components, we must not simply remove NLM_F_REPLACE but
    instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink
    message was malformed.

    For example,
    ip route add 2001:db8::1/128 dev dummy0
    ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \
    nexthop via fe80::30:2 dev dummy0
    results in the following warnings:
    [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE
    [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route

    This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to
    what it would get if the multipath route had been added in multiple netlink
    operations:
    ip route add 2001:db8::1/128 dev dummy0
    ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0
    ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0
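
The flag handling can be sketched in Python as follows. The flag values are the ones defined in the netlink uapi header; the two helper functions are hypothetical and only contrast the old and new behaviour for the second and later nexthops.

```python
NLM_F_REPLACE = 0x100
NLM_F_CREATE = 0x400

def later_nexthop_flags_old(flags: int) -> int:
    """Old behaviour: NLM_F_REPLACE is simply dropped."""
    return flags & ~NLM_F_REPLACE

def later_nexthop_flags_new(flags: int) -> int:
    """New behaviour: NLM_F_REPLACE is swapped for NLM_F_CREATE."""
    return (flags & ~NLM_F_REPLACE) | NLM_F_CREATE

change_request = NLM_F_REPLACE  # what "ip route change" sends

# Old: neither CREATE nor REPLACE is left, which triggers the warnings above.
print(hex(later_nexthop_flags_old(change_request)))
# New: the later nexthops look like a normal "create/append" request.
print(hex(later_nexthop_flags_new(change_request)))
```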

    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Benjamin Poirier
    Reviewed-by: Michal Kubecek
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Benjamin Poirier
     
  • [ Upstream commit e404b8c7cfb31654c9024d497cec58a501501692 ]

    After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no
    longer possible to replace an ECMP-able route by a non ECMP-able route.
    For example,
    ip route add 2001:db8::1/128 via fe80::1 dev dummy0
    ip route replace 2001:db8::1/128 dev dummy0
    does not work as expected.

    Tweak the replacement logic so that point 3 in the log of the above commit
    becomes:
    3. If the new route is not ECMP-able, and no matching non-ECMP-able route
    exists, replace matching ECMP-able route (if any) or add the new route.

    We can now summarize the entire replace semantics to:
    When doing a replace, prefer replacing a matching route of the same
    "ECMP-able-ness" as the replace argument. If there is no such candidate,
    fall back to the first route found.
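
The summarized replace semantics can be expressed as a small selection function. This is a sketch under simplifying assumptions: routes are modelled as (name, ecmp_able) tuples in table order, and pick_replace_target is a hypothetical name rather than a kernel function.

```python
def pick_replace_target(existing, new_is_ecmp_able):
    """Choose which existing route a replace request should overwrite.

    existing: list of (name, ecmp_able) tuples matching the prefix,
    in table order. Returns None when there is nothing to replace.
    """
    # Prefer a route with the same "ECMP-able-ness" as the new one.
    for route in existing:
        if route[1] == new_is_ecmp_able:
            return route
    # Otherwise fall back to the first route found, if any.
    return existing[0] if existing else None

routes = [("via fe80::1", True)]
# Replacing with a non-ECMP-able route now targets the ECMP-able one.
print(pick_replace_target(routes, new_is_ecmp_able=False))
```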

    Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
    Signed-off-by: Benjamin Poirier
    Reviewed-by: Michal Kubecek
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Benjamin Poirier
     
  • [ Upstream commit 7151affeef8d527f50b4b68a871fd28bd660023f ]

    netdev_next_lower_dev_rcu() will be used to implement a function
    that walks all lower interfaces.
    Functions that walk lower interfaces already exist
    (netdev_walk_all_lower_dev_rcu() and netdev_walk_all_lower_dev()),
    but there are cases that the existing
    netdev_walk_all_lower_dev_{rcu}() functions cannot cover,
    so some modules need to implement their own function
    for walking all lower interfaces.

    In the next patch, netdev_next_lower_dev_rcu() will be used.
    In addition, this patch removes two unused prototypes in netdevice.h.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Taehee Yoo
     
  • [ Upstream commit 245709ec8be89af46ea7ef0444c9c80913999d99 ]

    When T2 timer is to be stopped, the asoc should also be deleted,
    otherwise, there will be no chance to call sctp_association_free
    and the asoc could last in memory forever.

    However, in sctp_sf_shutdown_sent_abort(), after adding the cmd
    SCTP_CMD_TIMER_STOP for T2 timer, it may return error due to the
    format error from __sctp_sf_do_9_1_abort() and miss adding
    SCTP_CMD_ASSOC_FAILED where the asoc will be deleted.

    This patch is to fix it by moving the format error check out of
    __sctp_sf_do_9_1_abort(), and do it before adding the cmd
    SCTP_CMD_TIMER_STOP for T2 timer.

    Thanks to Hangbin for reporting this issue through fuzz testing.

    v1->v2:
    - improve the comment in the code as Marcelo's suggestion.

    Fixes: 96ca468b86b0 ("sctp: check invalid value of length parameter in error cause")
    Reported-by: Hangbin Liu
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 303d0403b8c25e994e4a6e45389e173cf8706fb5 ]

    As of the below commit, udp sockets bound to a specific address can
    coexist with one bound to the any addr for the same port.

    The commit also phased out the use of socket hashing based only on
    port (hslot), in favor of always hashing on {addr, port} (hslot2).

    The change broke the following behavior with disconnect (AF_UNSPEC):

    server binds to 0.0.0.0:1337
    server connects to 127.0.0.1:80
    server disconnects
    client connects to 127.0.0.1:1337
    client sends "hello"
    server reads "hello" // times out, packet did not find sk

    On connect the server acquires a specific source addr suitable for
    routing to its destination. On disconnect it reverts to the any addr.

    The connect call triggers a rehash to a different hslot2. On
    disconnect, add the same to return to the original hslot2.

    Skip this step if the socket is going to be unhashed completely.

    Fixes: 4cdeeee9252a ("net: udp: prefer listeners bound to an address")
    Reported-by: Pavel Roskin
    Signed-off-by: Willem de Bruijn
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323 ]

    This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe

    Ahmad reported out-of-order issues bisected to commit ba27b4cdaaa6
    ("net: dev: introduce support for sch BYPASS for lockless qdisc").
    I can't find any working solution other than a plain revert.

    This will introduce some minor performance regressions for
    pfifo_fast qdisc. I plan to address them in net-next with more
    indirect call wrapper boilerplate for qdiscs.

    Reported-by: Ahmad Fatoum
    Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc")
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     
  • [ Upstream commit 06f5201c6392f998a49ca9c9173e2930c8eb51d8 ]

    The current code doesn't check whether the TCP sequence number starts at or
    after the 1st record's start sequence number. It only checks whether the seq
    number is before the 1st record's end sequence number. This problem is always
    possible in the retransmit case. If a record which belongs to a requested seq
    number has already been deleted, tls_get_record() will start looking into the
    list and, as per the check, will test whether the seq number is before the end
    seq of the 1st record, which will always be true, so it always returns the 1st
    record when it should in fact return NULL.
    As part of the fix, start looking at each record only if the sequence number
    lies within the list; otherwise return NULL.
    One more check is added: the driver looks for the start marker record to
    handle TCP packets which are before the TLS offload start sequence number,
    hence return the 1st record if the record is the TLS start marker and the seq
    number is before the 1st record's starting sequence number.
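
The lookup rule described above can be modelled roughly as follows. This is an illustrative Python sketch, not the kernel function: Record, before, between and get_record are hypothetical stand-ins, records are kept in a plain list, and sequence arithmetic is done modulo 2^32 to mimic TCP sequence comparison.

```python
from typing import List, NamedTuple, Optional

MASK = 0xFFFFFFFF

class Record(NamedTuple):
    start: int
    end: int
    start_marker: bool = False

def before(a: int, b: int) -> bool:
    """True if sequence number a is before b, allowing for 32-bit wraparound."""
    return ((a - b) & MASK) >= 0x80000000

def between(seq: int, low: int, high: int) -> bool:
    """True if low <= seq <= high in 32-bit sequence space."""
    return ((high - low) & MASK) >= ((seq - low) & MASK)

def get_record(records: List[Record], seq: int) -> Optional[Record]:
    if not records:
        return None
    first, last = records[0], records[-1]
    # Unless the first record is the start marker, refuse sequence numbers
    # that fall outside the range covered by the list.
    if not first.start_marker and not between(seq, first.start, last.end):
        return None
    for rec in records:
        if before(seq, rec.end):
            return rec
    return None

records = [Record(1000, 2000), Record(2000, 3000)]
print(get_record(records, 1500))
print(get_record(records, 500))   # already deleted: no record, not the first one
```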

    Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
    Signed-off-by: Rohit Maheshwari
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Rohit Maheshwari
     
  • [ Upstream commit 8a9093c79863b58cc2f9874d7ae788f0d622a596 ]

    tc flower rules that are based on src or dst port blocking are sometimes
    ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
    ports from the skb for tc flower to match against. However, the port
    dissection is not done when the FLOW_DIS_IS_FRAGMENT bit is set in
    key_control->flags. All callers of __skb_flow_dissect() zero out the
    key_control field except for fl_classify() as used by the flower
    classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to
    __skb_flow_dissect(), since key_control is allocated on the stack
    and may not be initialized.

    Since key_basic and key_control are present for all flow keys, let's
    make sure they are initialized.

    Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments")
    Co-developed-by: Eric Dumazet
    Signed-off-by: Eric Dumazet
    Acked-by: Cong Wang
    Signed-off-by: Jason Baron
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jason Baron
     
  • [ Upstream commit 540e585a79e9d643ede077b73bcc7aa2d7b4d919 ]

    In 709772e6e06564ed94ba740de70185ac3d792773, RT_TABLE_COMPAT was added to
    allow legacy software to deal with routing table numbers >= 256, but the
    same change to FIB rule queries was overlooked.

    Signed-off-by: Jethro Beekman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jethro Beekman
     

29 Feb, 2020

2 commits

  • commit 963485d436ccc2810177a7b08af22336ec2af67b upstream.

    rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
    put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
    number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
    local->client_conns_lock. RCU callbacks, however, are normally called from
    softirq context, which can cause lockdep to notice the locking
    inconsistency.

    To get lockdep to detect this, it's necessary to have the connection
    cleaned up on the put at the end of the last of its calls, though normally
    the clean up is deferred. This can be induced, however, by starting a call
    on an AF_RXRPC socket and then closing the socket without reading the
    reply.

    Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
    workqueue if in softirq-mode and defer the destruction to process context.

    Note that another way to fix this could be to add a bunch of bh-disable
    annotations to the spinlocks concerned - and there might be more than just
    those two - but that means spending more time with BHs disabled.

    Note also that some of these places were covered by bh-disable spinlocks
    belonging to the rxrpc_transport object, but these got removed without the
    _bh annotation being retained on the next lock in.

    Fixes: 999b69f89241 ("rxrpc: Kill the client connection bundle concept")
    Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
    Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    cc: Hillf Danton
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 8d0015a7ab76b8b1e89a3e5f5710a6e5103f2dd5 upstream.

    The user-specified hashtable size is unbounded, which could
    easily lead to an OOM or a hung task as we hold the global
    mutex while allocating and initializing the new hashtable.

    Add a max value to cap both cfg->size and cfg->max, as
    suggested by Florian.
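
Capping both values amounts to a clamp, sketched here in Python. The limit below is a placeholder chosen for illustration and is not necessarily the value the patch uses; HASHLIMIT_MAX_SIZE_EXAMPLE and cap_config are hypothetical names.

```python
HASHLIMIT_MAX_SIZE_EXAMPLE = 1 << 20  # placeholder limit, for illustration only

def cap_config(size: int, max_entries: int, limit: int = HASHLIMIT_MAX_SIZE_EXAMPLE):
    """Clamp the user-supplied table size and entry limit to an upper bound."""
    return min(size, limit), min(max_entries, limit)

# An absurdly large request is reduced before any allocation happens.
print(cap_config(size=2**31, max_entries=2**31))
```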

    Reported-and-tested-by: syzbot+adf6c6c2be1c3a718121@syzkaller.appspotmail.com
    Signed-off-by: Cong Wang
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

26 Feb, 2020

1 commit

  • The DSA drivers that implement .phylink_mac_link_state should normally
    register an interrupt for the PCS, from which they should call
    phylink_mac_change(). However not all switches implement this, and those
    that don't should set this flag in dsa_switch in the .setup callback, so
    that PHYLINK will poll for a few ms until the in-band AN link timer
    expires and the PCS state settles.

    Signed-off-by: Vladimir Oltean

    Conflicts:
    include/net/dsa.h

    trivially with upstream commit 05f294a85235 ("net: dsa: allocate ports
    on touch") which was merged in v5.4-rc3.

    (cherry picked from commit 222d888331f409755fc25b1933e5dee1a976b9c1)

    Vladimir Oltean
     

24 Feb, 2020

9 commits

  • [ Upstream commit 1d82163714c16ebe09c7a8c9cd3cef7abcc16208 ]

    When we unhash the cache entry, we need to handle any pending upcalls
    by calling cache_fresh_unlocked().

    Signed-off-by: Trond Myklebust
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Sasha Levin

    Trond Myklebust
     
  • [ Upstream commit 0a29275b6300f39f78a87f2038bbfe5bdbaeca47 ]

    A negative value should be returned if map->map_type is invalid.
    Although that is impossible now, if we run into such a situation
    in the future, the xdp_buff could be leaked.

    Daniel Borkmann suggested:

    -EBADRQC should be returned to stay consistent with generic XDP
    for the tracepoint output and not to be confused with -EOPNOTSUPP
    from other locations like dev_map_enqueue() when ndo_xdp_xmit is
    missing and such.

    Suggested-by: Daniel Borkmann
    Signed-off-by: Li RongQing
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/1578618277-18085-1-git-send-email-lirongqing@baidu.com
    Signed-off-by: Sasha Levin

    Li RongQing
     
  • [ Upstream commit 0705f95c332081036d85f26691e9d3cd7d901c31 ]

    ERSPAN_VERSION is an attribute parsed in kernel side, nla_policy
    type should be added for it, like other attributes.

    Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
    Signed-off-by: Xin Long
    Reviewed-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Sasha Levin

    Xin Long
     
  • [ Upstream commit 0b2dc83906cf1e694e48003eae5df8fa63f76fd9 ]

    We need to have a synchronize_rcu before free'ing the sockhash because any
    outstanding psock references will have a pointer to the map and when they
    use it, this could trigger a use after free.

    This is a sister fix for sockhash, following commit 2bb90e5cc90e ("bpf:
    sockmap, synchronize_rcu before free'ing map") which addressed sockmap,
    which comes from a manual audit.

    Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Jakub Sitnicki
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200206111652.694507-3-jakub@cloudflare.com
    Signed-off-by: Sasha Levin

    Jakub Sitnicki
     
  • [ Upstream commit e2debf0852c4d66ba1a8bde12869b196094c70a7 ]

    unlike other classifiers that can be offloaded (i.e. users can set flags
    like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of
    netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry
    to fl_policy.

    Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     
  • [ Upstream commit 1afa3cc90f8fb745c777884d79eaa1001d6927a6 ]

    unlike other classifiers that can be offloaded (i.e. users can set flags
    like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size
    of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper
    entry to mall_policy.

    Fixes: b87f7936a932 ("net/sched: Add match-all classifier hw offloading.")
    Signed-off-by: Davide Caratti
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     
  • [ Upstream commit 04fb91243a853dbde216d829c79d9632e52aa8d9 ]

    Passing tag size to skb_cow_head will make sure
    there is enough headroom for the tag data.
    This change does not introduce any overhead in case there
    is already available headroom for tag.

    Signed-off-by: Per Forlin
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Per Forlin
     
  • [ Upstream commit 457fed775c97ac2c0cd1672aaf2ff2c8a6235e87 ]

    As nlmsg_put() does not clear the memory that is reserved,
    it is the caller's responsibility to make sure all of this
    memory will be written, in order to not reveal prior content.
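
The leak mechanism can be illustrated with a reused buffer in Python. This is a simplified model, assuming a hypothetical 8-byte reply in which only the first byte is written by the sender; fill_reply is not a kernel function.

```python
import struct

REPLY_SIZE = 8  # hypothetical reply layout: 1-byte family, 7 bytes never written

def fill_reply(buf: bytearray, offset: int, family: int, clear_first: bool) -> bytes:
    """Reserve REPLY_SIZE bytes in buf and write only the family field."""
    if clear_first:
        # The fix: zero the reserved area before filling in fields.
        buf[offset:offset + REPLY_SIZE] = bytes(REPLY_SIZE)
    struct.pack_into("B", buf, offset, family)
    return bytes(buf[offset:offset + REPLY_SIZE])

stale = b"SECRET!!"  # stands in for whatever the memory held before

leaky = fill_reply(bytearray(stale), 0, family=43, clear_first=False)
clean = fill_reply(bytearray(stale), 0, family=43, clear_first=True)

print(leaky)   # still carries most of the old content
print(clean)   # unwritten bytes are zero
```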

    While we are at it, we can provide the socket cookie even
    if clsock is not set.

    syzbot reported :

    BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
    BUG: KMSAN: uninit-value in __swab32p include/uapi/linux/swab.h:179 [inline]
    BUG: KMSAN: uninit-value in __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
    BUG: KMSAN: uninit-value in get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
    BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
    BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
    BUG: KMSAN: uninit-value in bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252
    CPU: 1 PID: 5262 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x220 lib/dump_stack.c:118
    kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
    __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
    __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
    __fswab32 include/uapi/linux/swab.h:59 [inline]
    __swab32p include/uapi/linux/swab.h:179 [inline]
    __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline]
    get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline]
    ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline]
    ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline]
    bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
    kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
    kmsan_kmalloc_large+0x73/0xc0 mm/kmsan/kmsan_hooks.c:128
    kmalloc_large_node_hook mm/slub.c:1406 [inline]
    kmalloc_large_node+0x282/0x2c0 mm/slub.c:3841
    __kmalloc_node_track_caller+0x44b/0x1200 mm/slub.c:4368
    __kmalloc_reserve net/core/skbuff.c:141 [inline]
    __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209
    alloc_skb include/linux/skbuff.h:1049 [inline]
    netlink_dump+0x44b/0x1ab0 net/netlink/af_netlink.c:2224
    __netlink_dump_start+0xbb2/0xcf0 net/netlink/af_netlink.c:2352
    netlink_dump_start include/linux/netlink.h:233 [inline]
    smc_diag_handler_dump+0x2ba/0x300 net/smc/smc_diag.c:242
    sock_diag_rcv_msg+0x211/0x610 net/core/sock_diag.c:256
    netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477
    sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:639 [inline]
    sock_sendmsg net/socket.c:659 [inline]
    kernel_sendmsg+0x433/0x440 net/socket.c:679
    sock_no_sendpage+0x235/0x300 net/core/sock.c:2740
    kernel_sendpage net/socket.c:3776 [inline]
    sock_sendpage+0x1e1/0x2c0 net/socket.c:937
    pipe_to_sendpage+0x38c/0x4c0 fs/splice.c:458
    splice_from_pipe_feed fs/splice.c:512 [inline]
    __splice_from_pipe+0x539/0xed0 fs/splice.c:636
    splice_from_pipe fs/splice.c:671 [inline]
    generic_splice_sendpage+0x1d5/0x2d0 fs/splice.c:844
    do_splice_from fs/splice.c:863 [inline]
    do_splice fs/splice.c:1170 [inline]
    __do_sys_splice fs/splice.c:1447 [inline]
    __se_sys_splice+0x2380/0x3350 fs/splice.c:1427
    __x64_sys_splice+0x6e/0x90 fs/splice.c:1427
    do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
    Signed-off-by: Eric Dumazet
    Cc: Ursula Braun
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit ad1e03b2b3d4430baaa109b77bc308dc73050de3 ]

    The current generic XDP handler skips execution of XDP programs entirely if
    an SKB is marked as cloned. This leads to some surprising behaviour, as
    packets can end up being cloned in various ways, which will make an XDP
    program not see all the traffic on an interface.

    This was discovered by a simple test case where an XDP program that always
    returns XDP_DROP is installed on a veth device. When combining this with
    the Scapy packet sniffer (which uses an AF_PACKET socket) on the sending
    side, SKBs reliably end up in the cloned state, causing them to be passed
    through to the receiving interface instead of being dropped. A minimal
    reproducer script for this is included below.

    This patch fixes the issue by simply triggering the existing linearisation
    code for cloned SKBs instead of skipping the XDP program execution. This
    behaviour is in line with the behaviour of the native XDP implementation
    for the veth driver, which will reallocate and copy the SKB data if the SKB
    is marked as shared.

    Reproducer Python script (requires BCC and Scapy):

    from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP
    from bcc import BPF
    import time, sys, subprocess, shlex

    SKB_MODE = (1 << 1)
    DRV_MODE = (1 << 2)
    PYTHON=sys.executable

    def client():
        time.sleep(2)
        # Sniffing on the sender causes skb_cloned() to be set
        s = AsyncSniffer()
        s.start()

        for p in range(10):
            sendp(Ether(dst="aa:aa:aa:aa:aa:aa", src="cc:cc:cc:cc:cc:cc")/IP()/UDP()/Raw("Test"),
                  verbose=False)
            time.sleep(0.1)

        s.stop()
        return 0

    def server(mode):
        prog = BPF(text="int dummy_drop(struct xdp_md *ctx) {return XDP_DROP;}")
        func = prog.load_func("dummy_drop", BPF.XDP)
        prog.attach_xdp("a_to_b", func, mode)

        time.sleep(1)

        s = sniff(iface="a_to_b", count=10, timeout=15)
        if len(s):
            print(f"Got {len(s)} packets - should have gotten 0")
            return 1
        else:
            print("Got no packets - as expected")
            return 0

    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} ")
        sys.exit(1)

    if sys.argv[1] == "client":
        sys.exit(client())
    elif sys.argv[1] == "server":
        mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE
        sys.exit(server(mode))
    else:
        try:
            mode = sys.argv[1]
            if mode not in ('skb', 'drv'):
                print(f"Usage: {sys.argv[0]} ")
                sys.exit(1)
            print(f"Running in {mode} mode")

            for cmd in [
                    'ip netns add netns_a',
                    'ip netns add netns_b',
                    'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b',
                    # Disable ipv6 to make sure there's no address autoconf traffic
                    'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1',
                    'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1',
                    'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa',
                    'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc',
                    'ip -n netns_a link set dev a_to_b up',
                    'ip -n netns_b link set dev b_to_a up']:
                subprocess.check_call(shlex.split(cmd))

            server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}"))
            client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client"))

            client.wait()
            server.wait()
            sys.exit(server.returncode)

        finally:
            subprocess.run(shlex.split("ip netns delete netns_a"))
            subprocess.run(shlex.split("ip netns delete netns_b"))

    Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
    Reported-by: Stepan Horacek
    Suggested-by: Paolo Abeni
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Toke Høiland-Jørgensen
     

20 Feb, 2020

2 commits

  • commit 2bf973ff9b9aeceb8acda629ae65341820d4b35b upstream.

    Previously I intended to ignore quiet mode in probe response, however
    I ended up ignoring it instead for action frames. As a matter of fact,
    this path isn't invoked for probe responses to start with. Just revert
    this patch.

    Signed-off-by: Sara Sharon
    Fixes: 7976b1e9e3bf ("mac80211: ignore quiet mode in probe")
    Signed-off-by: Luca Coelho
    Link: https://lore.kernel.org/r/20200131111300.891737-15-luca@coelho.fi
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Sara Sharon
     
  • commit ca1c671302825182629d3c1a60363cee6f5455bb upstream.

    The @nents value that was passed to ib_dma_map_sg() has to be passed
    to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to
    concatenate sg entries, it will return a different nents value than
    it was passed.

    The bug was exposed by recent changes to the AMD IOMMU driver, which
    enabled sg entry concatenation.

    Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to
    new memory registration API") and reviewing other kernel ULPs, it's
    not clear that the frwr_map() logic was ever correct for this case.
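
    The map/unmap rule described above can be illustrated with a small
    user-space model. This is only a sketch: toy_dma_map_sg(),
    toy_dma_unmap_sg() and the (addr, length) tuples are hypothetical
    stand-ins, not the kernel's ib_dma_map_sg() API, and the merge rule
    (join entries that are adjacent in address space) is an assumption made
    for illustration.

```python
from typing import Dict, List, Tuple

SgEntry = Tuple[int, int]  # (address, length); a toy scatterlist entry


def toy_dma_map_sg(mappings: Dict[int, int], sg: List[SgEntry], nents: int) -> int:
    """Map the first `nents` entries and return the mapped entry count.

    Adjacent entries are concatenated, so the return value may be
    smaller than `nents`. The original `nents` is remembered so that
    the unmap call can be checked against it.
    """
    merged: List[SgEntry] = []
    for addr, length in sg[:nents]:
        if merged and merged[-1][0] + merged[-1][1] == addr:
            prev_addr, prev_len = merged[-1]
            merged[-1] = (prev_addr, prev_len + length)
        else:
            merged.append((addr, length))
    mappings[id(sg)] = nents
    return len(merged)


def toy_dma_unmap_sg(mappings: Dict[int, int], sg: List[SgEntry], nents: int) -> None:
    """Unmap `sg`; `nents` must be the value that was passed to the map call."""
    expected = mappings[id(sg)]
    if nents != expected:
        raise ValueError(f"unmap called with nents={nents}, mapped with {expected}")
    del mappings[id(sg)]


if __name__ == "__main__":
    mappings: Dict[int, int] = {}
    sg = [(0x1000, 0x1000), (0x2000, 0x1000), (0x8000, 0x1000)]
    mapped = toy_dma_map_sg(mappings, sg, len(sg))
    print(f"passed nents={len(sg)}, map returned {mapped}")
    # Correct: pass the original nents, not the returned count.
    toy_dma_unmap_sg(mappings, sg, len(sg))
```

    In this model the first two entries are contiguous and get merged, so
    the map call returns fewer entries than it was given; passing that
    returned count to the unmap call is the mistake the commit describes.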

    Reported-by: Andre Tomt
    Suggested-by: Robin Murphy
    Signed-off-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Reviewed-by: Jason Gunthorpe
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

    Chuck Lever
     

15 Feb, 2020

3 commits

  • commit 85b8ac01a421791d66c3a458a7f83cfd173fe3fa upstream.

    It's currently possible to insert sockets in unexpected states into
    a sockmap, due to a TOCTTOU when updating the map from a syscall.
    sock_map_update_elem checks that sk->sk_state == TCP_ESTABLISHED,
    locks the socket and then calls sock_map_update_common. At this
    point, the socket may have transitioned into another state, and
    the earlier assumptions don't hold anymore. Crucially, it's
    conceivable (though very unlikely) that a socket has become unhashed.
    This breaks the sockmap's assumption that it will get a callback
    via sk->sk_prot->unhash.

    Fix this by checking the (fixed) sk_type and sk_protocol without the
    lock, followed by a locked check of sk_state.

    Unfortunately it's not possible to push the check down into
    sock_(map|hash)_update_common, since BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
    runs before the socket has transitioned from TCP_SYN_RECV into
    TCP_ESTABLISHED.
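
    The check ordering described above (immutable fields without the lock,
    mutable state under the lock) can be sketched in user space as follows.
    ToySock, toy_update_elem() and the returned error code are hypothetical
    and chosen for illustration; this is not the kernel's
    sock_map_update_elem() implementation.

```python
import errno
import socket
import threading

TCP_ESTABLISHED = 1  # same numeric values as the kernel's TCP state enum
TCP_CLOSE = 7


class ToySock:
    """Toy socket: type and protocol never change, state can change."""

    def __init__(self, sk_type: int, sk_protocol: int, sk_state: int) -> None:
        self.sk_type = sk_type
        self.sk_protocol = sk_protocol
        self.sk_state = sk_state
        self.lock = threading.Lock()


def toy_update_elem(sock_map: dict, key: int, sk: ToySock) -> int:
    """Insert `sk` into `sock_map`; return 0 or a negative errno (assumed)."""
    # Fixed fields can safely be checked without holding the lock.
    if sk.sk_type != socket.SOCK_STREAM or sk.sk_protocol != socket.IPPROTO_TCP:
        return -errno.EOPNOTSUPP

    with sk.lock:
        # sk_state is checked only while the lock is held, so it cannot
        # change between the check and the insertion.
        if sk.sk_state != TCP_ESTABLISHED:
            return -errno.EOPNOTSUPP
        sock_map[key] = sk
        return 0


if __name__ == "__main__":
    m: dict = {}
    ok = ToySock(socket.SOCK_STREAM, socket.IPPROTO_TCP, TCP_ESTABLISHED)
    closed = ToySock(socket.SOCK_STREAM, socket.IPPROTO_TCP, TCP_CLOSE)
    print(toy_update_elem(m, 0, ok), toy_update_elem(m, 1, closed))
```

    The racy variant would test sk_state before taking the lock; anything
    that changes the state in that window would then go unnoticed.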

    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Lorenz Bauer
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Sitnicki
    Link: https://lore.kernel.org/bpf/20200207103713.28175-1-lmb@cloudflare.com
    Signed-off-by: Greg Kroah-Hartman

    Lorenz Bauer
     
  • commit 88d6f130e5632bbf419a2e184ec7adcbe241260b upstream.

    It was reported that the max_t, ilog2, and roundup_pow_of_two macros have
    exponential effects on the number of states in the sparse checker.

    This patch breaks them up by calculating the "nbuckets" first so that the
    "bucket_log" only needs to take ilog2().

    In addition, Linus mentioned:

    Patch looks good, but I'd like to point out that it's not just sparse.

    You can see it with a simple

    make net/core/bpf_sk_storage.i
    grep 'smap->bucket_log = ' net/core/bpf_sk_storage.i | wc

    and see the end result:

    1 365071 2686974

    That's one line (the assignment line) that is 2,686,974 characters in
    length.

    Now, sparse does happen to react particularly badly to that (I didn't
    look to why, but I suspect it's just that evaluating all the types
    that don't actually ever end up getting used ends up being much more
    expensive than it should be), but I bet it's not good for gcc either.
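
    The "calculate nbuckets first" split described at the top of this
    message can be sketched as plain arithmetic. The helpers below are
    Python re-implementations written for illustration, and the num_cpus
    input and min_buckets=2 floor are assumptions rather than details taken
    from the patch.

```python
def roundup_pow_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def ilog2(n: int) -> int:
    """Floor of log2(n) (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return n.bit_length() - 1


def bucket_log_for(num_cpus: int, min_buckets: int = 2) -> int:
    """Compute the bucket count in separate steps, then take ilog2 once."""
    nbuckets = roundup_pow_of_two(num_cpus)  # step 1: round up
    nbuckets = max(min_buckets, nbuckets)    # step 2: apply the floor
    return ilog2(nbuckets)                   # step 3: a single, simple ilog2


if __name__ == "__main__":
    for cpus in (1, 6, 8, 9):
        print(cpus, bucket_log_for(cpus))
```

    Because each step stores its result in a variable, the final ilog2()
    sees a plain value instead of a nested macro expression, which is what
    keeps the preprocessed line short in the C original.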

    Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
    Reported-by: Randy Dunlap
    Reported-by: Luc Van Oostenryck
    Suggested-by: Linus Torvalds
    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Luc Van Oostenryck
    Link: https://lore.kernel.org/bpf/20200207081810.3918919-1-kafai@fb.com
    Signed-off-by: Greg Kroah-Hartman

    Martin KaFai Lau
     
  • commit 0b2dc83906cf1e694e48003eae5df8fa63f76fd9 upstream.

    We need to have a synchronize_rcu before free'ing the sockhash because any
    outstanding psock references will have a pointer to the map and when they
    use it, this could trigger a use after free.

    This is the sockhash counterpart of commit 2bb90e5cc90e ("bpf:
    sockmap, synchronize_rcu before free'ing map"), which addressed sockmap.
    It comes from a manual audit.

    Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Jakub Sitnicki
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200206111652.694507-3-jakub@cloudflare.com
    Signed-off-by: Greg Kroah-Hartman

    Jakub Sitnicki