17 Dec, 2017

36 commits

  • commit 4d2dc2cc766c3b51929658cacbc6e34fc8e242fb upstream.

    Currently, we're capping the values too low in the F_GETLK64 case. The
    fields in that structure are 64-bit values, so we shouldn't need to do
    any sort of fixup there.

    However, make sure we check that assumption at build time in the
    future, by ensuring that the sizes we're copying will fit.

    With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
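
    The build-time check described here can be mimicked in plain C11 with
    _Static_assert (the kernel uses BUILD_BUG_ON); the struct layouts below
    are simplified stand-ins, not the real UAPI definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's flock64/compat_flock64 layouts;
 * the field types are illustrative, not the real UAPI definitions. */
struct flock64_s        { int64_t l_start; int64_t l_len; };
struct compat_flock64_s { int64_t l_start; int64_t l_len; };

/* Build-time guarantee that copying the 64-bit offset cannot truncate,
 * in the spirit of the kernel's BUILD_BUG_ON size checks. */
_Static_assert(sizeof(((struct compat_flock64_s *)0)->l_start) >=
               sizeof(((struct flock64_s *)0)->l_start),
               "compat l_start would truncate");

/* With matching sizes, the copy needs no clamping to a compat maximum. */
static int64_t copy_l_start(const struct flock64_s *src)
{
    return src->l_start;
}
```

    Since the sizes match, no runtime clamp (and hence no COMPAT_LOFF_T_MAX)
    is needed.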

    Fixes: 94073ad77fff2 ("fs/locks: don't mess with the address limit in compat_fcntl64")
    Reported-by: Vitaly Lipatov
    Signed-off-by: Jeff Layton
    Reviewed-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    Jeff Layton
     
  • commit 30bf90ccdec1da9c8198b161ecbff39ce4e5a9ba upstream.

    Found using DEBUG_ATOMIC_SLEEP while submitting an AIO read operation:

    [ 100.853642] BUG: sleeping function called from invalid context at mm/slab.h:421
    [ 100.861148] in_atomic(): 1, irqs_disabled(): 1, pid: 1880, name: python
    [ 100.867954] 2 locks held by python/1880:
    [ 100.867961] #0: (&epfile->mutex){....}, at: [] ffs_mutex_lock+0x27/0x30 [usb_f_fs]
    [ 100.868020] #1: (&(&ffs->eps_lock)->rlock){....}, at: [] ffs_epfile_io.isra.17+0x24b/0x590 [usb_f_fs]
    [ 100.868076] CPU: 1 PID: 1880 Comm: python Not tainted 4.14.0-edison+ #118
    [ 100.868085] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
    [ 100.868093] Call Trace:
    [ 100.868122] dump_stack+0x47/0x62
    [ 100.868156] ___might_sleep+0xfd/0x110
    [ 100.868182] __might_sleep+0x68/0x70
    [ 100.868217] kmem_cache_alloc_trace+0x4b/0x200
    [ 100.868248] ? dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
    [ 100.868302] dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
    [ 100.868343] usb_ep_alloc_request+0x16/0xc0 [udc_core]
    [ 100.868386] ffs_epfile_io.isra.17+0x444/0x590 [usb_f_fs]
    [ 100.868424] ? _raw_spin_unlock_irqrestore+0x27/0x40
    [ 100.868457] ? kiocb_set_cancel_fn+0x57/0x60
    [ 100.868477] ? ffs_ep0_poll+0xc0/0xc0 [usb_f_fs]
    [ 100.868512] ffs_epfile_read_iter+0xfe/0x157 [usb_f_fs]
    [ 100.868551] ? security_file_permission+0x9c/0xd0
    [ 100.868587] ? rw_verify_area+0xac/0x120
    [ 100.868633] aio_read+0x9d/0x100
    [ 100.868692] ? __fget+0xa2/0xd0
    [ 100.868727] ? __might_sleep+0x68/0x70
    [ 100.868763] SyS_io_submit+0x471/0x680
    [ 100.868878] do_int80_syscall_32+0x4e/0xd0
    [ 100.868921] entry_INT80_32+0x2a/0x2a
    [ 100.868932] EIP: 0xb7fbb676
    [ 100.868941] EFLAGS: 00000292 CPU: 1
    [ 100.868951] EAX: ffffffda EBX: b7aa2000 ECX: 00000002 EDX: b7af8368
    [ 100.868961] ESI: b7fbb660 EDI: b7aab000 EBP: bfb6c658 ESP: bfb6c638
    [ 100.868973] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

    Signed-off-by: Vincent Pelletier
    Signed-off-by: Felipe Balbi
    Signed-off-by: Greg Kroah-Hartman

    Vincent Pelletier
     
  • commit 4f7f5551a760eb0124267be65763008169db7087 upstream.

    The system may crash after unloading the ipmi_si.ko module, because a
    timer may remain and fire after the module has cleaned up its resources.

    cleanup_one_si() contains the following processing.

    /*
     * Make sure that interrupts, the timer and the thread are
     * stopped and will not run again.
     */
    if (to_clean->irq_cleanup)
            to_clean->irq_cleanup(to_clean);
    wait_for_timer_and_thread(to_clean);

    /*
     * Timeouts are stopped, now make sure the interrupts are off
     * in the BMC. Note that timers and CPU interrupts are off,
     * so no need for locks.
     */
    while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
            poll(to_clean);
            schedule_timeout_uninterruptible(1);
    }

    si_state changes as follows in the while loop calling poll(to_clean).

    SI_GETTING_MESSAGES
    => SI_CHECKING_ENABLES
    => SI_SETTING_ENABLES
    => SI_GETTING_EVENTS
    => SI_NORMAL

    As written in the code comments above,
    timers are expected to stop before the polling loop and not to run again.
    But the timer is set again in the following process
    when si_state becomes SI_SETTING_ENABLES.

    => poll
       => smi_event_handler
          => handle_transaction_done  // smi_info->si_state == SI_SETTING_ENABLES
             => start_getting_events
                => start_new_msg
                   => smi_mod_timer
                      => mod_timer

    As a result, before the timer set in start_new_msg() expires,
    the polling loop may see si_state becoming SI_NORMAL
    and the module clean-up finishes.

    For example, a hard LOCKUP and panic occurred as follows:
    smi_timeout was called after smi_event_handler and kcs_event,
    and hung at port_inb() while trying to access an I/O port after
    its release.

    [exception RIP: port_inb+19]
    RIP: ffffffffc0473053 RSP: ffff88069fdc3d80 RFLAGS: 00000006
    RAX: ffff8806800f8e00 RBX: ffff880682bd9400 RCX: 0000000000000000
    RDX: 0000000000000ca3 RSI: 0000000000000ca3 RDI: ffff8806800f8e40
    RBP: ffff88069fdc3d80 R8: ffffffff81d86dfc R9: ffffffff81e36426
    R10: 00000000000509f0 R11: 0000000000100000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000246 R15: ffff8806800f8e00
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
    --- ---

    To fix the problem, I defined a flag, timer_can_start,
    as a member of struct smi_info.
    The flag is enabled immediately after initializing the timer
    and disabled immediately before waiting for timer deletion.
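
    A userspace sketch of that flag logic (the struct and helpers model the
    patch; the boolean stands in for a live kernel timer):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the fix: a flag gates every timer (re)arm. */
struct smi_info_model {
    bool timer_can_start;
    bool timer_pending;   /* stands in for a pending kernel timer */
};

static void smi_mod_timer_model(struct smi_info_model *smi)
{
    if (!smi->timer_can_start)
        return;           /* late rearm attempts are ignored */
    smi->timer_pending = true;
}

static void start_timer_model(struct smi_info_model *smi)
{
    smi->timer_can_start = true;   /* enabled right after timer init */
    smi_mod_timer_model(smi);
}

static void stop_timer_and_thread_model(struct smi_info_model *smi)
{
    smi->timer_can_start = false;  /* disabled before waiting for deletion */
    smi->timer_pending = false;
}
```

    Once the flag is cleared, the rearm attempted from the polling loop
    becomes a no-op, so no timer can survive module cleanup.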

    Fixes: 0cfec916e86d ("ipmi: Start the timer and thread on internal msgs")
    Signed-off-by: Yamazaki Masamitsu
    [Some fairly major changes went into the IPMI driver in 4.15, so this
    required a backport as the code had changed and moved to a different
    file.]
    Signed-off-by: Corey Minyard
    Signed-off-by: Greg Kroah-Hartman

    Masamitsu Yamazaki
     
  • [ Upstream commit a8dd397903a6e57157f6265911f7d35681364427 ]

    Commit d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues
    when migrating a sock") mistakenly used 'list' as the member argument of
    list_for_each_entry to traverse the retransmit, sacked and abandoned
    queues, while chunks are linked into these queues via their
    'transmitted_list' member.

    It could cause NULL dereference panic if there are chunks in any of these
    queues when peeling off one asoc.

    So use the chunk member 'transmitted_list' instead in this patch.
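
    The reason the wrong member corrupts pointers: list_for_each_entry
    recovers the containing struct with container_of, so naming the wrong
    member shifts every recovered pointer by the difference between the two
    field offsets. A minimal userspace sketch (the struct is illustrative,
    not the real sctp_chunk layout):

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* A chunk linked into queues via two different list members,
 * mirroring sctp_chunk's 'list' vs 'transmitted_list'. */
struct chunk {
    int id;
    struct list_head list;
    struct list_head transmitted_list;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Correct: recover the chunk from a node on the retransmit queue. */
static struct chunk *chunk_from_transmitted(struct list_head *node)
{
    return container_of(node, struct chunk, transmitted_list);
}

/* The buggy variant: wrong member name, wrong offset, bogus pointer. */
static struct chunk *chunk_from_list(struct list_head *node)
{
    return container_of(node, struct chunk, list);
}
```

    Dereferencing the shifted pointer is what leads to the NULL-dereference
    panic described above.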

    Fixes: d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 25415cec502a1232b19fffc85465882b19a90415 ]

    When cls_bpf offload was added it seemed like a good idea to
    call cls_bpf_delete_prog() instead of extending the error
    handling path, since the software state is fully initialized
    at that point. This handling of errors without jumping to
    the end of the function is error prone, as proven by a later
    commit missing that extra call to __cls_bpf_delete_prog().

    __cls_bpf_delete_prog() is now expected to be invoked with
    a reference on exts->net or with the field zeroed out. The call
    on the offload's error path does not fulfil this requirement,
    leading to each error stealing a reference on the net namespace.

    Create a function undoing what cls_bpf_set_parms() did and
    use it from __cls_bpf_delete_prog() and the error path.
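
    The fix pattern, sketched in userspace C with a counter standing in for
    the netns reference (the function names are illustrative, not the
    cls_bpf API):

```c
#include <assert.h>

/* Model of pairing resource acquisition in a set_parms()-style helper
 * with a single teardown helper shared by both delete and error paths. */
static int net_refcount;

static int set_parms_model(void)
{
    net_refcount++;          /* reference-taking analogue */
    return 0;
}

static void unset_parms_model(void)
{
    net_refcount--;          /* undone exactly once, wherever we fail */
}

/* Returns 0 on success; on offload failure, unwinds via the helper
 * instead of open-coding a partial teardown. */
static int change_model(int offload_fails)
{
    if (set_parms_model())
        return -1;
    if (offload_fails) {
        unset_parms_model();  /* error path reuses the teardown helper */
        return -1;
    }
    unset_parms_model();      /* normal lifetime ends in delete */
    return 0;
}
```

    Sharing one teardown helper keeps the error path from stealing a
    reference the way the open-coded version did.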

    Fixes: aae2c35ec892 ("cls_bpf: use tcf_exts_get_net() before call_rcu()")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Acked-by: Daniel Borkmann
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • [ Upstream commit 2734166e89639c973c6e125ac8bcfc2d9db72b70 ]

    gso_type is being used in binary AND operations together with SKB_GSO_UDP.
    The issue is that variable gso_type is of type unsigned short and
    SKB_GSO_UDP expands to more than 16 bits:

    SKB_GSO_UDP = 1 << 16

    this makes any binary AND operation between gso_type and SKB_GSO_UDP
    always evaluate to zero, hence making some code unreachable and likely
    causing undesired behavior.

    Fix this by changing the data type of variable gso_type to unsigned int.
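
    The truncation is easy to demonstrate in a standalone sketch (the macro
    models SKB_GSO_UDP's bit position):

```c
#include <assert.h>

/* SKB_GSO_UDP lives at bit 16, one past the top of an unsigned short. */
#define SKB_GSO_UDP_MODEL (1u << 16)

/* The buggy shape: a 16-bit variable can never carry bit 16,
 * so the AND is always zero. */
static unsigned int and_short(unsigned short gso_type)
{
    return gso_type & SKB_GSO_UDP_MODEL;
}

/* The fixed shape: a 32-bit variable preserves the flag. */
static unsigned int and_int(unsigned int gso_type)
{
    return gso_type & SKB_GSO_UDP_MODEL;
}
```

    Passing the flag itself through the 16-bit parameter silently truncates
    it to zero, which is exactly the unreachable-code condition Coverity
    flagged.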

    Addresses-Coverity-ID: 1462223
    Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
    Signed-off-by: Gustavo A. R. Silva
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     
  • [ Upstream commit 0c19f846d582af919db66a5914a0189f9f92c936 ]

    Tuntap and similar devices can inject GSO packets. Accept type
    VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

    Processes are expected to use feature negotiation such as TUNSETOFFLOAD
    to detect supported offload types and refrain from injecting other
    packets. This process breaks down with live migration: guest kernels
    do not renegotiate flags, so destination hosts need to expose all
    features that the source host does.

    Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
    This patch introduces nearly(*) no new code to simplify verification.
    It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
    insertion and software UFO segmentation.

    It does not reinstate protocol stack support, hardware offload
    (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
    of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

    To support SKB_GSO_UDP reappearing in the stack, also reinstate
    logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
    by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
    CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
    ("net: avoid skb_warn_bad_offload false positives on UFO").

    (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
    ipv6_proxy_select_ident is changed to return a __be32 and this is
    assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
    at the end of the enum to minimize code churn.

    Tested
    Booted a v4.13 guest kernel with QEMU. On a host kernel before this
    patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
    enabled, same as on a v4.13 host kernel.

    A UFO packet sent from the guest appears on the tap device:
    host:
    nc -u -l -p 8000 &
    tcpdump -n -i tap0

    guest:
    dd if=/dev/zero of=payload.txt bs=1 count=2000
    nc -u 192.16.1.1 8000 < payload.txt

    Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
    packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

    Changes
    v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

    Link: http://lkml.kernel.org/r/
    Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
    Reported-by: Michal Kubecek
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 654d573845f35017dc397840fa03610fef3d08b0 ]

    rcu_read_lock in tun_build_skb is used to rcu_dereference tun->xdp_prog
    safely, rcu_read_unlock should be done in every return path.

    Now I could see one place missing it, where it returns NULL in the
    XDP_REDIRECT switch case, and another place using rcu_read_lock
    wrongly, where it returns NULL in the if (xdp_xmit) chunk.

    So fix both in this patch.
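
    The invariant being restored, that a lock taken on entry must be
    dropped on every exit, can be sketched with a depth counter (all names
    here are illustrative stand-ins, not the tun driver's code):

```c
#include <assert.h>

/* Toy stand-ins tracking lock balance the way debug checks would. */
static int rcu_depth;
static void rcu_lock_model(void)   { rcu_depth++; }
static void rcu_unlock_model(void) { rcu_depth--; }

enum act { ACT_PASS, ACT_REDIRECT, ACT_XMIT };

/* Every return path must drop the read lock taken on entry,
 * mirroring the tun_build_skb() fix: converge on a single exit. */
static void *build_skb_model(enum act a)
{
    void *ret = (void *)1;  /* placeholder "skb" */

    rcu_lock_model();
    switch (a) {
    case ACT_REDIRECT:
        ret = 0;            /* previously returned with the lock held */
        break;
    case ACT_XMIT:
        ret = 0;            /* the if (xdp_xmit) chunk */
        break;
    case ACT_PASS:
        break;
    }
    rcu_unlock_model();     /* single exit keeps all paths balanced */
    return ret;
}
```

    With a single unlock before return, no early-exit path can leave the
    read-side critical section open.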

    Fixes: 761876c857cb ("tap: XDP support")
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 98d11291d189cb5adf49694d0ad1b971c0212697 ]

    Florian reported a breakage with anycast routes due to commit
    4832c30d5458 ("net: ipv6: put host and anycast routes on device with
    address"). Prior to this commit anycast routes were added against the
    loopback device causing repetitive route entries with no insight into
    why they existed. e.g.:
    $ ip -6 ro ls table local type anycast
    anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
    anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
    anycast fe80:: dev lo proto kernel metric 0 pref medium
    anycast fe80:: dev lo proto kernel metric 0 pref medium

    The point of commit 4832c30d5458 is to add the routes using the device
    with the address which is causing the route to be added. e.g.,:
    $ ip -6 ro ls table local type anycast
    anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
    anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
    anycast fe80:: dev eth2 proto kernel metric 0 pref medium
    anycast fe80:: dev eth1 proto kernel metric 0 pref medium

    For traffic to work as it did before, the dst device needs to be switched
    to the loopback when the copy is created similar to local routes.

    Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit c33ee15b3820a03cf8229ba9415084197b827f8c ]

    tun_recvmsg() supports accepting an skb via msg_control after
    commit ac77cfd4258f ("tun: support receiving skb through msg_control");
    if an skb is presented, it must be freed no matter how far it gets
    along, otherwise it is leaked.

    This patch fixes several missed cases.

    Signed-off-by: Wei Xu
    Reported-by: Matthew Rosato
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wei Xu
     
  • [ Upstream commit ed66dfaf236c04d414de1d218441296e57fb2bd2 ]

    Fix the TLP scheduling logic so that when scheduling a TLP probe, we
    ensure that the estimated time at which an RTO would fire accounts for
    the fact that ACKs indicating forward progress should push back RTO
    times.

    After the following fix:

    df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")

    we had an unintentional behavior change in the following kind of
    scenario: suppose the RTT variance has been very low recently. Then
    suppose we send out a flight of N packets and our RTT is 100ms:

    t=0: send a flight of N packets
    t=100ms: receive an ACK for N-1 packets

    The response before df92c8394e6e was:
    -> schedule a TLP for now + RTO_interval

    The response after df92c8394e6e is:
    -> schedule a TLP for t=0 + RTO_interval

    Since RTO_interval = srtt + RTT_variance, this means that we have
    scheduled a TLP timer at a point in the future that only accounts for
    RTT_variance. If the RTT_variance term is small, this means that the
    timer fires soon.

    Before df92c8394e6e this would not happen, because in that code, when
    we receive an ACK for a prefix of flight, we did:

    1) Near the top of tcp_ack(), switch from TLP timer to RTO
       at write_queue_head->packet_tx_time + RTO_interval:
            if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                    tcp_rearm_rto(sk);

    2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
            if (flag & FLAG_ACKED) {
                    tcp_rearm_rto(sk);

    3) In tcp_ack(), after tcp_fastretrans_alert(), switch from RTO
       to TLP at now + RTO_interval:
            if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                    tcp_schedule_loss_probe(sk);

    In df92c8394e6e we removed that 3-phase dance, and instead directly
    set the TLP timer once: we set the TLP timer in cases like this to
    write_queue_head->packet_tx_time + RTO_interval. So if the RTT
    variance is small, then this means that this is setting the TLP timer
    to fire quite soon. This means if the ACK for the tail of the flight
    takes longer than an RTT to arrive (often due to delayed ACKs), then
    the TLP timer fires too quickly.
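
    Plugging illustrative numbers into the scenario above (srtt 100ms, a
    tiny rttvar of 5ms) shows how early the buggy timer fires:

```c
#include <assert.h>

/* RTO_interval = srtt + RTT_variance, in milliseconds. */
static int rto_ms(int srtt, int rttvar) { return srtt + rttvar; }

/* Buggy scheduling: TLP anchored at the head packet's transmit time. */
static int tlp_fire_buggy(int head_tx, int srtt, int rttvar)
{
    return head_tx + rto_ms(srtt, rttvar);
}

/* Intended scheduling: TLP anchored at the ACK arrival ("now"),
 * so forward progress pushes the timer back. */
static int tlp_fire_fixed(int now, int srtt, int rttvar)
{
    return now + rto_ms(srtt, rttvar);
}
```

    With the flight sent at t=0 and the ACK for N-1 packets arriving at
    t=100ms, the buggy timer fires only 5ms after the ACK, while the fixed
    one allows a full RTO.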

    Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Eric Dumazet
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Neal Cardwell
     
  • [ Upstream commit 61d78537843e676e7f56ac6db333db0c0529b892 ]

    tap_recvmsg() supports accepting an skb via msg_control after
    commit 3b4ba04acca8 ("tap: support receiving skb from msg_control");
    if an skb is presented, it must be freed within the function,
    otherwise it is leaked.

    Signed-off-by: Wei Xu
    Reported-by: Matthew Rosato
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wei Xu
     
  • [ Upstream commit d51aae68b142f48232257e96ce317db25445418d ]

    q->link.block is not initialized, which leads to EINVAL when one tries
    to add a filter there. So initialize it properly.

    This can be reproduced by:
    $ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit
    $ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1

    Reported-by: Jaroslav Aster
    Reported-by: Ivan Vecera
    Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
    Signed-off-by: Jiri Pirko
    Acked-by: Eelco Chaudron
    Reviewed-by: Ivan Vecera
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jiri Pirko
     
  • [ Upstream commit 8632385022f2b05a6ca0b9e0f95575865de0e2ce ]

    When I switched rcv_rtt_est to high resolution timestamps, I forgot
    that tp->tcp_mstamp needed to be refreshed in tcp_rcv_space_adjust()

    Using an old timestamp leads to autotuning lags.

    Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Cc: Neal Cardwell
    Cc: Yuchung Cheng
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit c7799c067c2ae33e348508c8afec354f3257ff25 ]

    Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
    checked that the bearer is not up, and calling tipc_rcv() with a bearer
    that is not up leads to a TIPC div-by-zero crash in
    tipc_node_calculate_timer(). The crash is rare in practice, but can
    happen like this:

    We're enabling a bearer, but it's not yet up and fully initialized.
    At the same time we receive a discovery packet, and in tipc_udp_recv()
    we end up calling tipc_rcv() with the not-yet-initialized bearer,
    later causing the div-by-zero crash in tipc_node_calculate_timer().

    Jon Maloy explains the impact of removing the second tipc_rcv() call:
    "link setup in the worst case will be delayed until the next arriving
    discovery messages, 1 sec later, and this is an acceptable delay."

    As the tipc_rcv() call is removed, just leave the function via the
    rcu_out label, so that we will kfree_skb().

    [ 12.590450] Own node address , network identity 1
    [ 12.668088] divide error: 0000 [#1] SMP
    [ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
    [ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
    [ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
    [ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
    [ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
    [ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
    [ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
    [ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
    [ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
    [ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
    [ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
    [ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
    [ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 12.712627] Call Trace:
    [ 12.713390]
    [ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc]
    [ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc]
    [ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc]
    [ 12.717396] ? minmax_running_min+0x2f/0x60
    [ 12.718248] ? dst_alloc+0x4c/0xa0
    [ 12.718964] ? tcp_ack+0xaf1/0x10b0
    [ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
    [ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc]
    [ 12.721459] ? dst_alloc+0x4c/0xa0
    [ 12.722130] udp_queue_rcv_skb+0x264/0x490
    [ 12.722924] __udp4_lib_rcv+0x21e/0x990
    [ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0
    [ 12.724442] ? tcp_v4_rcv+0x958/0xa40
    [ 12.725039] udp_rcv+0x1a/0x20
    [ 12.725587] ip_local_deliver_finish+0x97/0x1d0
    [ 12.726323] ip_local_deliver+0xaf/0xc0
    [ 12.726959] ? ip_route_input_noref+0x19/0x20
    [ 12.727689] ip_rcv_finish+0xdd/0x3b0
    [ 12.728307] ip_rcv+0x2ac/0x360
    [ 12.728839] __netif_receive_skb_core+0x6fb/0xa90
    [ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0
    [ 12.730274] __netif_receive_skb+0x1d/0x60
    [ 12.730953] ? __netif_receive_skb+0x1d/0x60
    [ 12.731637] netif_receive_skb_internal+0x37/0xd0
    [ 12.732371] napi_gro_receive+0xc7/0xf0
    [ 12.732920] receive_buf+0x3c3/0xd40
    [ 12.733441] virtnet_poll+0xb1/0x250
    [ 12.733944] net_rx_action+0x23e/0x370
    [ 12.734476] __do_softirq+0xc5/0x2f8
    [ 12.734922] irq_exit+0xfa/0x100
    [ 12.735315] do_IRQ+0x4f/0xd0
    [ 12.735680] common_interrupt+0xa2/0xa2
    [ 12.736126]
    [ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10
    [ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
    [ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
    [ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    [ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
    [ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
    [ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
    [ 12.741831] default_idle+0x2a/0x100
    [ 12.742323] arch_cpu_idle+0xf/0x20
    [ 12.742796] default_idle_call+0x28/0x40
    [ 12.743312] do_idle+0x179/0x1f0
    [ 12.743761] cpu_startup_entry+0x1d/0x20
    [ 12.744291] start_secondary+0x112/0x120
    [ 12.744816] secondary_startup_64+0xa5/0xa5
    [ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
    00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
    89 df f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
    [ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
    [ 12.748555] ---[ end trace 1399ab83390650fd ]---
    [ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt
    [ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
    (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [ 12.751215] Rebooting in 60 seconds..

    Fixes: c9b64d492b1f ("tipc: add replicast peer discovery")
    Signed-off-by: Tommi Rantala
    Cc: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tommi Rantala
     
  • [ Upstream commit b4d1605a8ea608fd7dc45b926a05d75d340bde4b ]

    After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
    socket lookups happen while skb->cb[] has not been mangled yet by TCP.

    Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
    Signed-off-by: David Ahern
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 6d69b1f1eb7a2edf8a3547f361c61f2538e054bb ]

    Using GSO with small MTUs currently results in a substantial throughput
    regression - which is caused by how qeth needs to map non-linear skbs
    into its IO buffer elements:
    compared to a linear skb, each GSO-segmented skb effectively consumes
    twice as many buffer elements (ie two instead of one) due to the
    additional header-only part. This causes the Output Queue to be
    congested with low-utilized IO buffers.

    Fix this as follows:
    If the MSS is low enough so that a non-SG GSO segmentation produces
    order-0 skbs (currently ~3500 bytes), opt out from NETIF_F_SG. This is
    where we anticipate the biggest savings, since an SG-enabled
    GSO segmentation produces skbs that always consume at least two
    buffer elements.

    Larger MSS values continue to get an SG-enabled GSO segmentation, since
    1) the relative overhead of the additional header-only buffer element
    becomes less noticeable, and
    2) the linearization overhead increases.

    With the throughput regression fixed, re-enable NETIF_F_SG by default to
    reap the significant CPU savings of GSO.
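
    A rough model of the element accounting and the opt-out threshold (all
    sizes and names below are assumptions for illustration, not the
    driver's exact arithmetic):

```c
#include <assert.h>

/* Buffer elements a qeth IO buffer consumes per GSO segment:
 * an SG skb needs one element for the header-only part plus one for
 * the payload fragment; a linear skb needs just one. */
static int elements_sg(void)     { return 2; }
static int elements_linear(void) { return 1; }

/* Assumed order-0 allocation size and combined headroom/header budget. */
#define PAGE_SZ_MODEL      4096
#define MAX_HDR_ROOM_MODEL 600

/* Opt out of NETIF_F_SG when a linear segment still fits an order-0
 * allocation, i.e. roughly mss + headers <= one page. */
static int want_sg(int mss)
{
    return mss + MAX_HDR_ROOM_MODEL > PAGE_SZ_MODEL;
}
```

    Under this model, a small-MTU workload halves its buffer-element
    consumption by linearizing, which is where the congestion relief comes
    from.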

    Fixes: 5722963a8e83 ("qeth: do not turn on SG per default")
    Reported-by: Nils Hoppmann
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     
  • [ Upstream commit bc3ab70584696cb798b9e1e0ac8e6ced5fd4c3b8 ]

    Commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
    reworked how secondary addresses are managed for qeth devices.
    Instead of dropping & subsequently re-adding all addresses on every
    ndo_set_rx_mode() call, qeth now keeps track of the addresses that are
    currently registered with the HW.
    On a ndo_set_rx_mode(), we thus only need to do (de-)registration
    requests for the addresses that have actually changed.

    On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong
    hashtable - and thus never finds a match. As a result, we first delete
    *all* such addresses, and then re-add them again. So each set_rx_mode()
    causes a short period where the IPv4 Multicast addresses are not
    registered, and the card stops forwarding inbound traffic for them.

    Fix this by setting the ->is_multicast flag on the lookup object, thus
    enabling qeth_l3_ip_from_hash() to search the correct hashtable and
    find a match there.

    Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     
  • [ Upstream commit 0cbff6d4546613330a1c5f139f5c368e4ce33ca1 ]

    The current GSO skb size limit was copy&pasted over from the L3 path,
    where it is needed due to a TSO limitation.
    As L2 devices don't offer TSO support (and thus all GSO skbs are
    segmented before they reach the driver), there's no reason to restrict
    the stack in how large it may build the GSO skbs.

    Fixes: d52aec97e5bc ("qeth: enable scatter/gather in layer 2 mode")
    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     
  • [ Upstream commit cfac7f836a715b91f08c851df915d401a4d52783 ]

    Maciej Żenczykowski reported some panics in tcp_twsk_destructor()
    that might be caused by the following bug.

    The timewait timer is pinned to the cpu, because we want to transition
    the timewait refcount from 0 to 4 in one go, once everything has been
    initialized.

    At the time commit ed2e92394589 ("tcp/dccp: fix timewait races in timer
    handling") was merged, TCP was always running from the BH handler.

    After commit 5413d1babe8f ("net: do not block BH while processing
    socket backlog") we definitely can run tcp_time_wait() from process
    context.

    We need to block BH in the critical section so that the pinned timer
    has still its purpose.

    This bug is more likely to happen under stress and when very small RTO
    are used in datacenter flows.

    Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
    Signed-off-by: Eric Dumazet
    Reported-by: Maciej Żenczykowski
    Acked-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 45ab4b13e46325d00f4acdb365d406e941a15f81 ]

    The mss variable tracks the last max segment size sent to the TSO
    engine. We do not update the hardware as long as we receive skbs with
    the same value in gso_size.

    During a network device down/up cycle (mapped to stmmac_release() and
    stmmac_open() callbacks) we issue a reset to the hardware and it
    forgets the setting for mss. However we did not zero out our mss
    variable so the next transmission of a gso packet happens with an
    undefined hardware setting.

    This triggers a hang in the TSO engine and eventually the netdev
    watchdog will bark.
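
    The bug and its fix reduce to cache invalidation, sketched here with a
    write counter (the names are illustrative, not the stmmac API):

```c
#include <assert.h>

/* Model of caching the MSS value last programmed into the TSO engine. */
#define MSS_UNSET 0u   /* sentinel meaning "hardware state unknown" */

struct tso_model {
    unsigned int mss_cache;  /* last value written to hardware */
    int hw_writes;           /* counts programming operations */
};

static void hw_reset(struct tso_model *t)
{
    /* The hardware forgot its MSS; the fix also clears our cache so the
     * next transmission reprograms it instead of trusting stale state. */
    t->mss_cache = MSS_UNSET;
}

static void xmit_gso(struct tso_model *t, unsigned int gso_size)
{
    if (gso_size != t->mss_cache) {
        t->hw_writes++;          /* reprogram the TSO engine */
        t->mss_cache = gso_size;
    }
}
```

    Without the hw_reset() cache clear, the post-reset transmission with an
    unchanged gso_size would skip the hardware write and run with an
    undefined MSS, the hang described above.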

    Fixes: f748be531d70 ("stmmac: support new GMAC4")
    Signed-off-by: Lars Persson
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Lars Persson
     
  • [ Upstream commit d7efc6c11b277d9d80b99b1334a78bfe7d7edf10 ]

    Alexander Potapenko reported use of uninitialized memory [1]

    This happens when inserting a request socket into TCP ehash,
    in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.

    Bug was added by commit d894ba18d4e4 ("soreuseport: fix ordering for
    mixed v4/v6 sockets")

    Note that d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6
    ordering fix") missed the opportunity to get rid of
    hlist_nulls_add_tail_rcu():

    Both UDP sockets and TCP/DCCP listeners no longer use
    __sk_nulls_add_node_rcu() for their hash insertion.

    Since all other sockets have a unique 4-tuple, the reuseport status
    has no special meaning, so we can always use hlist_nulls_add_head_rcu()
    for them and save a few cycles/instructions.

    [1]

    ==================================================================
    BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     
     __dump_stack lib/dump_stack.c:16
     dump_stack+0x185/0x1d0 lib/dump_stack.c:52
     kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
     __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
     __sk_nulls_add_node_rcu ./include/net/sock.h:684
     inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
     reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
     inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
     tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
     tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
     tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
     tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
     tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
     ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
     NF_HOOK ./include/linux/netfilter.h:248
     ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
     dst_input ./include/net/dst.h:477
     ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
     NF_HOOK ./include/linux/netfilter.h:248
     ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
     __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
     __netif_receive_skb net/core/dev.c:4336
     netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
     napi_skb_finish net/core/dev.c:4858
     napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
     e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
     e1000_clean_rx_irq+0x1492/0x1d30 drivers/net/ethernet/intel/e1000/e1000_main.c:4474
     e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
     napi_poll net/core/dev.c:5500
     net_rx_action+0x73c/0x1820 net/core/dev.c:5566
     __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
     invoke_softirq kernel/softirq.c:364
     irq_exit+0x203/0x240 kernel/softirq.c:405
     exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
     do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
     common_interrupt+0x86/0x86

    Fixes: d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets")
    Fixes: d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
    Signed-off-by: Eric Dumazet
    Reported-by: Alexander Potapenko
    Acked-by: Craig Gallek
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit a4abd7a80addb4a9547f7dfc7812566b60ec505c ]

    The qmi_wwan minidriver supports a 'raw-ip' mode where frames are
    received without any ethernet header. This causes alignment issues
    because the skbs allocated by usbnet are "IP aligned".

    Fix by allowing minidrivers to disable the additional alignment
    offset. This is implemented using a per-device flag, since the same
    minidriver also supports 'ethernet' mode.

    Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode")
    Reported-and-tested-by: Jay Foster
    Signed-off-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Bjørn Mork
     
  • [ Upstream commit 3016dad75b48279e579117ee3ed566ba90a3b023 ]

    tcp_v6_send_reset() expects to receive an skb with skb->cb[] layout as
    used in TCP stack.
    MD5 lookup uses tcp_v6_iif() and tcp_v6_sdif(), and thus
    TCP_SKB_CB(skb)->header.h6.

    This patch probably fixes RST packets sent on behalf of a timewait md5
    ipv6 socket.

    Before Florian's patch, tcp_v6_restore_cb() was needed before jumping to
    the no_tcp_socket label.

    Fixes: 271c3b9b7bda ("tcp: honour SO_BINDTODEVICE for TW_RST case too")
    Signed-off-by: Eric Dumazet
    Cc: Florian Westphal
    Acked-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 15fe076edea787807a7cdc168df832544b58eba6 ]

    syzbot reported crashes [1] and provided a C repro easing bug hunting.

    When/if packet_do_bind() calls __unregister_prot_hook() and releases
    po->bind_lock, another thread can run packet_notifier() and process a
    NETDEV_UP event.

    This calls register_prot_hook() and hooks the socket again right before
    the first thread is able to grab po->bind_lock again.

    Fix this issue by temporarily setting po->num to 0, as suggested by
    David Miller.

    [1]
    dev_remove_pack: ffff8801bf16fa80 not found
    ------------[ cut here ]------------
    kernel BUG at net/core/dev.c:7945! ( BUG_ON(!list_empty(&dev->ptype_all)); )
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    device syz0 entered promiscuous mode
    CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    task: ffff8801cc57a500 task.stack: ffff8801cc588000
    RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
    RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
    RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
    RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
    device syz0 entered promiscuous mode
    RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
    R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
    FS: 0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
    tun_detach drivers/net/tun.c:670 [inline]
    tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
    __fput+0x333/0x7f0 fs/file_table.c:210
    ____fput+0x15/0x20 fs/file_table.c:244
    task_work_run+0x199/0x270 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x9bb/0x1ae0 kernel/exit.c:865
    do_group_exit+0x149/0x400 kernel/exit.c:968
    SYSC_exit_group kernel/exit.c:979 [inline]
    SyS_exit_group+0x1d/0x20 kernel/exit.c:977
    entry_SYSCALL_64_fastpath+0x1f/0x96
    RIP: 0033:0x44ad19

    Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Francesco Ruggeri
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • syzkaller found a race condition in fanout_demux_rollover() while removing
    a packet socket from a fanout group.

    po->rollover is read and operated on during packet_rcv_fanout(), via
    fanout_demux_rollover(), but the pointer is currently cleared before the
    synchronization in packet_release(). It is safer to delay the cleanup
    until after synchronize_net() has been called, ensuring all calls to
    packet_rcv_fanout() for this socket have finished.

    To further simplify synchronization around the rollover structure, set
    po->rollover in fanout_add() only if there are no errors. This removes
    the need for rcu in the struct and in the call to
    packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).

    Crashing stack trace:
    fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
    packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
    dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
    xmit_one net/core/dev.c:2975 [inline]
    dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
    __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
    neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
    neigh_output include/net/neighbour.h:482 [inline]
    ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
    ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
    NF_HOOK_COND include/linux/netfilter.h:239 [inline]
    ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
    dst_output include/net/dst.h:459 [inline]
    NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
    mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
    mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
    mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
    ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
    addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
    addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
    process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
    worker_thread+0x223/0x1990 kernel/workqueue.c:2247
    kthread+0x35e/0x430 kernel/kthread.c:231
    ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432

    Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")
    Fixes: 509c7a1ecc860 ("packet: avoid panic in packet_getsockopt()")
    Reported-by: syzbot
    Signed-off-by: Mike Maloney
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mike Maloney
     
  • [ Upstream commit eeea10b83a139451130df1594f26710c8fa390c8 ]

    James Morris reported a kernel stack corruption bug [1] while
    running the SELinux testsuite, and bisected it to a recent
    commit bffa72cf7f9d ("net: sk_buff rbnode reorg").

    We believe this commit is fine, but exposes an older bug.

    SELinux code runs from tcp_filter() and might send an ICMP,
    expecting IP options to be found in skb->cb[] using regular IPCB placement.

    We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.

    This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
    similar way we added them for IPv6.

    [1]
    [ 339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
    [ 339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
    [ 339.822505]
    [ 339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
    [ 339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A 01/19/2017
    [ 339.885060] Call Trace:
    [ 339.896875]
    [ 339.908103] dump_stack+0x63/0x87
    [ 339.920645] panic+0xe8/0x248
    [ 339.932668] ? ip_push_pending_frames+0x33/0x40
    [ 339.946328] ? icmp_send+0x525/0x530
    [ 339.958861] ? kfree_skbmem+0x60/0x70
    [ 339.971431] __stack_chk_fail+0x1b/0x20
    [ 339.984049] icmp_send+0x525/0x530
    [ 339.996205] ? netlbl_skbuff_err+0x36/0x40
    [ 340.008997] ? selinux_netlbl_err+0x11/0x20
    [ 340.021816] ? selinux_socket_sock_rcv_skb+0x211/0x230
    [ 340.035529] ? security_sock_rcv_skb+0x3b/0x50
    [ 340.048471] ? sk_filter_trim_cap+0x44/0x1c0
    [ 340.061246] ? tcp_v4_inbound_md5_hash+0x69/0x1b0
    [ 340.074562] ? tcp_filter+0x2c/0x40
    [ 340.086400] ? tcp_v4_rcv+0x820/0xa20
    [ 340.098329] ? ip_local_deliver_finish+0x71/0x1a0
    [ 340.111279] ? ip_local_deliver+0x6f/0xe0
    [ 340.123535] ? ip_rcv_finish+0x3a0/0x3a0
    [ 340.135523] ? ip_rcv_finish+0xdb/0x3a0
    [ 340.147442] ? ip_rcv+0x27c/0x3c0
    [ 340.158668] ? inet_del_offload+0x40/0x40
    [ 340.170580] ? __netif_receive_skb_core+0x4ac/0x900
    [ 340.183285] ? rcu_accelerate_cbs+0x5b/0x80
    [ 340.195282] ? __netif_receive_skb+0x18/0x60
    [ 340.207288] ? process_backlog+0x95/0x140
    [ 340.218948] ? net_rx_action+0x26c/0x3b0
    [ 340.230416] ? __do_softirq+0xc9/0x26a
    [ 340.241625] ? do_softirq_own_stack+0x2a/0x40
    [ 340.253368]
    [ 340.262673] ? do_softirq+0x50/0x60
    [ 340.273450] ? __local_bh_enable_ip+0x57/0x60
    [ 340.285045] ? ip_finish_output2+0x175/0x350
    [ 340.296403] ? ip_finish_output+0x127/0x1d0
    [ 340.307665] ? nf_hook_slow+0x3c/0xb0
    [ 340.318230] ? ip_output+0x72/0xe0
    [ 340.328524] ? ip_fragment.constprop.54+0x80/0x80
    [ 340.340070] ? ip_local_out+0x35/0x40
    [ 340.350497] ? ip_queue_xmit+0x15c/0x3f0
    [ 340.361060] ? __kmalloc_reserve.isra.40+0x31/0x90
    [ 340.372484] ? __skb_clone+0x2e/0x130
    [ 340.382633] ? tcp_transmit_skb+0x558/0xa10
    [ 340.393262] ? tcp_connect+0x938/0xad0
    [ 340.403370] ? ktime_get_with_offset+0x4c/0xb0
    [ 340.414206] ? tcp_v4_connect+0x457/0x4e0
    [ 340.424471] ? __inet_stream_connect+0xb3/0x300
    [ 340.435195] ? inet_stream_connect+0x3b/0x60
    [ 340.445607] ? SYSC_connect+0xd9/0x110
    [ 340.455455] ? __audit_syscall_entry+0xaf/0x100
    [ 340.466112] ? syscall_trace_enter+0x1d0/0x2b0
    [ 340.476636] ? __audit_syscall_exit+0x209/0x290
    [ 340.487151] ? SyS_connect+0xe/0x10
    [ 340.496453] ? do_syscall_64+0x67/0x1b0
    [ 340.506078] ? entry_SYSCALL64_slow_path+0x25/0x25

    Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
    Signed-off-by: Eric Dumazet
    Reported-by: James Morris
    Tested-by: James Morris
    Tested-by: Casey Schaufler
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit f859b4af1c52493ec21173ccc73d0b60029b5b88 ]

    After parsing the sit netlink change info, we forget to update frag_off in
    ipip6_tunnel_update(). Fix it by assigning frag_off the new value.

    Reported-by: Jianlin Shi
    Signed-off-by: Hangbin Liu
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit f3069c6d33f6ae63a1668737bc78aaaa51bff7ca ]

    This is a fix for syzkaller719569, where memory registration was
    attempted without any underlying transport being loaded.

    Analysis of the case reveals that it is the setsockopt() RDS_GET_MR
    (2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable.

    Here is an example stack trace when the bug is hit:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0
    IP: __rds_rdma_map+0x36/0x440 [rds]
    PGD 2f93d03067 P4D 2f93d03067 PUD 2f93d02067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: bridge stp llc tun rpcsec_gss_krb5 nfsv4
    dns_resolver nfs fscache rds binfmt_misc sb_edac intel_powerclamp
    coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
    ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd
    iTCO_wdt mei_me sg iTCO_vendor_support ipmi_si mei ipmi_devintf nfsd
    shpchp pcspkr i2c_i801 ioatdma ipmi_msghandler wmi lpc_ich mfd_core
    auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
    mgag200 i2c_algo_bit drm_kms_helper ixgbe syscopyarea ahci sysfillrect
    sysimgblt libahci mdio fb_sys_fops ttm ptp libata sd_mod mlx4_core drm
    crc32c_intel pps_core megaraid_sas i2c_core dca dm_mirror
    dm_region_hash dm_log dm_mod
    CPU: 48 PID: 45787 Comm: repro_set2 Not tainted 4.14.2-3.el7uek.x86_64 #2
    Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
    task: ffff882f9190db00 task.stack: ffffc9002b994000
    RIP: 0010:__rds_rdma_map+0x36/0x440 [rds]
    RSP: 0018:ffffc9002b997df0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff882fa2182580 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffc9002b997e40 RDI: ffff882fa2182580
    RBP: ffffc9002b997e30 R08: 0000000000000000 R09: 0000000000000002
    R10: ffff885fb29e3838 R11: 0000000000000000 R12: ffff882fa2182580
    R13: ffff882fa2182580 R14: 0000000000000002 R15: 0000000020000ffc
    FS: 00007fbffa20b700(0000) GS:ffff882fbfb80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000c0 CR3: 0000002f98a66006 CR4: 00000000001606e0
    Call Trace:
    rds_get_mr+0x56/0x80 [rds]
    rds_setsockopt+0x172/0x340 [rds]
    ? __fget_light+0x25/0x60
    ? __fdget+0x13/0x20
    SyS_setsockopt+0x80/0xe0
    do_syscall_64+0x67/0x1b0
    entry_SYSCALL64_slow_path+0x25/0x25
    RIP: 0033:0x7fbff9b117f9
    RSP: 002b:00007fbffa20aed8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 00000000000c84a4 RCX: 00007fbff9b117f9
    RDX: 0000000000000002 RSI: 0000400000000114 RDI: 000000000000109b
    RBP: 00007fbffa20af10 R08: 0000000000000020 R09: 00007fbff9dd7860
    R10: 0000000020000ffc R11: 0000000000000293 R12: 0000000000000000
    R13: 00007fbffa20b9c0 R14: 00007fbffa20b700 R15: 0000000000000021

    Code: 41 56 41 55 49 89 fd 41 54 53 48 83 ec 18 8b 87 f0 02 00 00 48
    89 55 d0 48 89 4d c8 85 c0 0f 84 2d 03 00 00 48 8b 87 00 03 00 00
    83 b8 c0 00 00 00 00 0f 84 25 03 00 00 48 8b 06 48 8b 56 08

    The fix is to check the existence of an underlying transport in
    __rds_rdma_map().

    Signed-off-by: Håkon Bugge
    Reported-by: syzbot
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Håkon Bugge
     
  • [ Upstream commit 6e474083f3daf3a3546737f5d7d502ad12eb257c ]

    Matthew found a roughly 40% TCP throughput regression with commit
    c67df11f ("vhost_net: try batch dequing from skb array") as discussed
    in the following thread:
    https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html

    Eventually we figured out that it was an skb leak in handle_rx()
    when sending packets to the VM. This usually happens when the guest
    cannot drain the vq as fast as vhost fills it; the resulting
    traffic jam leaks skbs, because there is no headcount left on the
    vq from the vhost side to send them on.

    This can be avoided by making sure we have got enough headcount
    before actually consuming an skb from the batched rx array while
    transmitting, which is simply done by moving the zero-headcount
    check a bit ahead.

    Signed-off-by: Wei Xu
    Reported-by: Matthew Rosato
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Wei Xu
     
  • [ Upstream commit a7d5f107b4978e08eeab599ee7449af34d034053 ]

    When the function tipc_accept_from_sock() fails to create an instance of
    struct tipc_subscriber, it omits to free the already created instance of
    struct tipc_conn before it returns.

    We fix that with this commit.

    Reported-by: David S. Miller
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jon Maloy
     
  • [ Upstream commit 83cf79a2fec3cf499eb6cb9eb608656fc2a82776 ]

    When the allocation of the addr buffer fails, we need to free
    our refcount on the inetdevice before returning.

    Signed-off-by: Julian Wiedmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Julian Wiedmann
     
  • [ Upstream commit 9e77d7a5549dc4d4999a60676373ab3fd1dae4db ]

    Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially
    implemented the new ethtool API by replacing get_settings()
    with get_link_ksettings(). This breaks ethtool, since the
    userspace tool (according to the new API specs) never tries
    the legacy set() call when the new get() call succeeds.

    All attempts to change a setting from userspace result in:
    > Cannot set new settings: Operation not supported

    Implement the missing set() call.

    Signed-off-by: Tobias Jakobi
    Tested-by: Holger Hoffstätte
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tobias Jakobi
     
  • [ Upstream commit 134059fd2775be79e26c2dff87d25cc2f6ea5626 ]

    Offload IP header checksum to NIC.

    This fixes a previous patch which disabled checksum offloading
    for both IPv4 and IPv6 packets, so L3 checksum offload was also
    getting disabled for IPv4 packets, and the HW drops these packets
    for some reason.

    Without this patch, IPv4 TSO appears to be broken:

    Without this patch I get ~16 kbyte/s, with the patch close to 2 Mbyte/s
    when copying files via scp from a test box to my home workstation.

    Looking at tcpdump on the sender, it looks like hardware drops IPv4 TSO skbs.
    This patch restores performance for me; ipv6 looks good too.

    Fixes: fa6d7cb5d76c ("net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts")
    Cc: Sunil Goutham
    Cc: Aleksey Makarov
    Cc: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • [ Upstream commit fa6d7cb5d76cf0467c61420fc9238045aedfd379 ]

    Don't offload IP header checksum to NIC.

    This fixes a previous patch which enabled checksum offloading
    for both IPv4 and IPv6 packets, so L3 checksum offload was also
    getting enabled for IPv6 packets, and the HW drops those packets
    because it assumes the packet is IPv4 when IP csum offload is set
    in the SQ descriptor.

    Fixes: 3a9024f52c2e ("net: thunderx: Enable TSO and checksum offloads for ipv6")
    Signed-off-by: Sunil Goutham
    Signed-off-by: Aleksey Makarov
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sunil Goutham
     
  • [ Upstream commit f9409e7f086fa6c4623769b4b2f4f17a024d8143 ]

    Quectel BG96 is a Qualcomm MDM9206 based IoT modem, supporting both
    CAT-M and NB-IoT. Tested hardware is BG96 mounted on Quectel development
    board (EVB). The USB id is added to qmi_wwan.c to allow QMI
    communication with the BG96.

    Signed-off-by: Sebastian Sjoholm
    Acked-by: Bjørn Mork
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Sjoholm
     

14 Dec, 2017

4 commits

  • Greg Kroah-Hartman
     
  • [ Upstream commit f4b3526d83c40dd8bf5948b9d7a1b2c340f0dcc8 ]

    The handler for the CB.ProbeUuid operation in the cache manager is
    implemented, but isn't listed in the switch-statement of operation
    selection, so won't be used. Fix this by adding it.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • [ Upstream commit 1199db603511d7463d9d3840f96f61967affc766 ]

    Fix the total-length calculation in afs_make_call() when the operation
    being dispatched has data from a series of pages attached.

    Although the patched code looks as though it should reduce mathematically
    to the current code, it doesn't, because the 32-bit unsigned arithmetic
    used to calculate the page-offset difference doesn't correctly extend
    to a 64-bit value when the result is effectively negative.

    Without this, some FS.StoreData operations that span multiple pages fail,
    reporting too little or too much data.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • [ Upstream commit 31fde034a8bd964a5c7c1a5663fc87a913158db2 ]

    The UMR's QP is created by calling mlx5_ib_create_qp directly, and
    therefore the send CQ and the recv CQ on the ibqp weren't assigned.

    Assign them right after calling mlx5_ib_create_qp() to ensure
    that any access to those pointers will work as expected and won't
    crash the system, as might happen as part of the reset flow.

    Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
    Signed-off-by: Majd Dibbiny
    Reviewed-by: Yishai Hadas
    Signed-off-by: Leon Romanovsky
    Signed-off-by: Doug Ledford
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Majd Dibbiny