31 Jan, 2018

4 commits

  • Add suffix ULL to constant 80000 in order to avoid a potential integer
    overflow and give the compiler complete information about the proper
    arithmetic to use. Notice that this constant is used in a context that
    expects an expression of type u64.

    The current cast to u64 effectively applies to the whole expression
    as an argument of type u64 to be passed to div64_u64, but it does
    not prevent it from being evaluated using 32-bit arithmetic instead
    of 64-bit arithmetic.

    Also, once the expression is properly evaluated using 64-bit arithmetic,
    there is no need for the parentheses and the external cast to u64.

    Addresses-Coverity-ID: 1357588 ("Unintentional integer overflow")
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
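
A minimal C sketch of the pitfall described in this message. The driver's actual expression is not shown in the log, so the operand `rate` and both function names are hypothetical, and the sketch assumes a platform where `unsigned int` is 32 bits wide. It shows that a cast applied to the finished product cannot undo a wrap that already happened in 32-bit arithmetic, while a 64-bit constant widens the multiplication itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old form: the product is computed in
 * 32-bit unsigned arithmetic, and only the already wrapped result is
 * widened by the outer cast. */
uint64_t scale_cast_outside(uint32_t rate)
{
    return (uint64_t)(rate * 80000u);
}

/* With a 64-bit constant, the usual arithmetic conversions promote
 * `rate` to 64 bits before multiplying, so the product does not wrap
 * and no outer cast or parentheses are needed. */
uint64_t scale_ull_constant(uint32_t rate)
{
    return rate * 80000ULL;
}
```

For example, with `rate` equal to 100000 the first form yields 8000000000 modulo 2^32, while the second yields the full 8000000000.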
     
  • The driver checks the wrong register bit in rtl_ocp_tx_cond(), which keeps
    the driver waiting until timeout.

    Fix this by waiting for the right register bit.

    Signed-off-by: Chunhao Lin
    Signed-off-by: David S. Miller

    Chunhao Lin
     
  • The Quectel EP06 is a Cat. 6 LTE modem. It uses the same interface as
    the EC20/EC25 for QMI, and requires the same "set DTR"-quirk to work.

    Signed-off-by: Kristian Evensen
    Acked-by: Bjørn Mork
    Signed-off-by: David S. Miller

    Kristian Evensen
     
  • - Backwards Compatibility:
    If userspace wants to determine whether RTM_NEWLINK supports the
    IFLA_IF_NETNSID property, it should first send an RTM_GETLINK request
    with IFLA_IF_NETNSID on lo. If either EACCES is returned or the reply
    does not include IFLA_IF_NETNSID, userspace should assume that
    IFLA_IF_NETNSID is not supported on this kernel.
    If the reply does contain an IFLA_IF_NETNSID property, userspace
    can send an RTM_NEWLINK with an IFLA_IF_NETNSID property. If it receives
    EOPNOTSUPP, then the kernel does not support the IFLA_IF_NETNSID property
    with RTM_NEWLINK. Userspace should then fall back to other means.

    - Security:
    Callers must have CAP_NET_ADMIN in the owning user namespace of the
    target network namespace.

    Signed-off-by: Christian Brauner
    Signed-off-by: David S. Miller

    Christian Brauner
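
The two-step probe described under "Backwards Compatibility" can be sketched as a pair of pure decision helpers. This is only an illustration: the function names are hypothetical, the netlink exchange itself is left out, and the helpers assume the caller passes positive errno values (0 for success) plus a flag saying whether the reply carried IFLA_IF_NETNSID.

```c
#include <errno.h>
#include <stdbool.h>

/* Step 1: interpret the RTM_GETLINK probe sent with IFLA_IF_NETNSID on lo.
 * EACCES, or a reply without the attribute, means "not supported". */
bool netnsid_getlink_supported(int getlink_errno, bool reply_has_netnsid)
{
    if (getlink_errno == EACCES)
        return false;
    return reply_has_netnsid;
}

/* Step 2: only meaningful if step 1 succeeded. EOPNOTSUPP on the
 * RTM_NEWLINK request means the property is not supported there, and
 * the caller should fall back to other means. */
bool netnsid_newlink_supported(bool getlink_supported, int newlink_errno)
{
    if (!getlink_supported)
        return false;
    return newlink_errno != EOPNOTSUPP;
}
```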
     

30 Jan, 2018

36 commits

  • ipmr_vif_seq_show() prints the difference between two pointers with the
    format string %2zd (z for size_t), however the correct format string is
    %2td instead (t for ptrdiff_t).

    The same bug in ip6mr_vif_seq_show() was already fixed long ago by
    commit d430a227d272 ("bogus format in ip6mr").

    Signed-off-by: James Hogan
    Cc: Alexey Kuznetsov
    Cc: "David S. Miller"
    Cc: Hideaki YOSHIFUJI
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    James Hogan
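
A small userspace sketch of the format-string point, using standard C `snprintf` rather than the kernel's seq_file helpers. The function name and the `int` element type are illustrative assumptions. The difference of two pointers has type `ptrdiff_t`, whose length modifier is `t`; `z` is reserved for `size_t`.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the index of `elem` within the array starting at `base` into
 * `buf`, right-aligned in a field of width 2. Returns what snprintf
 * returns: the number of characters the full output needs. */
int format_index(char *buf, size_t len, const int *base, const int *elem)
{
    ptrdiff_t idx = elem - base;   /* pointer difference: ptrdiff_t */

    return snprintf(buf, len, "%2td", idx);
}
```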
     
  • Wait for a response from the VNIC server before exiting after setting
    the MAC address. This resolves an issue with bonding a VNIC client in
    ALB or TLB modes. The bonding driver was changing the MAC address more
    rapidly than the device could respond, causing the following errors.

    "bond0: the hw address of slave eth2 is in use by the bond;
    couldn't find a slave with a free hw address to give it
    (this should not have happened)"

    If the function waits until the change is finalized, these errors are
    avoided.

    Signed-off-by: Thomas Falcon
    Signed-off-by: David S. Miller

    Thomas Falcon
     
  • The following soft lockup was caught. This is a deadlock caused by
    recursive locking.

    Process kworker/u40:1:28016 was holding the spin lock "mbx->queue_lock" in
    qlcnic_83xx_mailbox_worker() when a softirq came in and asked for the same
    spin lock in qlcnic_83xx_enqueue_mbx_cmd(). This lock should be taken with
    BH disabled.

    [161846.962125] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u40:1:28016]
    [161846.962367] Modules linked in: tun ocfs2 xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe libfcoe libfc sunrpc 8021q mrp garp bridge stp llc bonding dm_round_robin dm_multipath iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core i2c_i801 shpchp lpc_ich mfd_core ioatdma ipmi_devintf ipmi_si ipmi_msghandler sg ext4 jbd2 mbcache2 sr_mod cdrom sd_mod igb i2c_algo_bit i2c_core ahci libahci megaraid_sas ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel qla2xxx scsi_transport_fc qlcnic crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod
    [161846.962454]
    [161846.962460] CPU: 1 PID: 28016 Comm: kworker/u40:1 Not tainted 4.1.12-94.5.9.el6uek.x86_64 #2
    [161846.962463] Hardware name: Oracle Corporation SUN SERVER X4-2L /ASSY,MB,X4-2L , BIOS 26050100 09/19/2017
    [161846.962489] Workqueue: qlcnic_mailbox qlcnic_83xx_mailbox_worker [qlcnic]
    [161846.962493] task: ffff8801f2e34600 ti: ffff88004ca5c000 task.ti: ffff88004ca5c000
    [161846.962496] RIP: e030:[] [] xen_hypercall_sched_op+0xa/0x20
    [161846.962506] RSP: e02b:ffff880202e43388 EFLAGS: 00000206
    [161846.962509] RAX: 0000000000000000 RBX: ffff8801f6996b70 RCX: ffffffff810013aa
    [161846.962511] RDX: ffff880202e433cc RSI: ffff880202e433b0 RDI: 0000000000000003
    [161846.962513] RBP: ffff880202e433d0 R08: 0000000000000000 R09: ffff8801fe893200
    [161846.962516] R10: ffff8801fe400538 R11: 0000000000000206 R12: ffff880202e4b000
    [161846.962518] R13: 0000000000000050 R14: 0000000000000001 R15: 000000000000020d
    [161846.962528] FS: 0000000000000000(0000) GS:ffff880202e40000(0000) knlGS:ffff880202e40000
    [161846.962531] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
    [161846.962533] CR2: 0000000002612640 CR3: 00000001bb796000 CR4: 0000000000042660
    [161846.962536] Stack:
    [161846.962538] ffff880202e43608 0000000000000000 ffffffff813f0442 ffff880202e433b0
    [161846.962543] 0000000000000000 ffff880202e433cc ffffffff00000001 0000000000000000
    [161846.962547] 00000009813f03d6 ffff880202e433e0 ffffffff813f0460 ffff880202e43440
    [161846.962552] Call Trace:
    [161846.962555]
    [161846.962565] [] ? xen_poll_irq_timeout+0x42/0x50
    [161846.962570] [] xen_poll_irq+0x10/0x20
    [161846.962578] [] xen_lock_spinning+0xe2/0x110
    [161846.962583] [] __raw_callee_save_xen_lock_spinning+0x11/0x20
    [161846.962592] [] ? _raw_spin_lock+0x57/0x80
    [161846.962609] [] qlcnic_83xx_enqueue_mbx_cmd+0x7c/0xe0 [qlcnic]
    [161846.962623] [] qlcnic_83xx_issue_cmd+0x58/0x210 [qlcnic]
    [161846.962636] [] qlcnic_83xx_sre_macaddr_change+0x162/0x1d0 [qlcnic]
    [161846.962649] [] qlcnic_83xx_change_l2_filter+0x2b/0x30 [qlcnic]
    [161846.962657] [] ? __skb_flow_dissect+0x18b/0x650
    [161846.962670] [] qlcnic_send_filter+0x205/0x250 [qlcnic]
    [161846.962682] [] qlcnic_xmit_frame+0x547/0x7b0 [qlcnic]
    [161846.962691] [] xmit_one+0x82/0x1a0
    [161846.962696] [] dev_hard_start_xmit+0x50/0xa0
    [161846.962701] [] sch_direct_xmit+0x112/0x220
    [161846.962706] [] __dev_queue_xmit+0x1df/0x5e0
    [161846.962710] [] dev_queue_xmit_sk+0x13/0x20
    [161846.962721] [] bond_dev_queue_xmit+0x35/0x80 [bonding]
    [161846.962729] [] __bond_start_xmit+0x1cb/0x210 [bonding]
    [161846.962736] [] bond_start_xmit+0x31/0x60 [bonding]
    [161846.962740] [] xmit_one+0x82/0x1a0
    [161846.962745] [] dev_hard_start_xmit+0x50/0xa0
    [161846.962749] [] __dev_queue_xmit+0x4ee/0x5e0
    [161846.962754] [] dev_queue_xmit_sk+0x13/0x20
    [161846.962760] [] vlan_dev_hard_start_xmit+0xb2/0x150 [8021q]
    [161846.962764] [] xmit_one+0x82/0x1a0
    [161846.962769] [] dev_hard_start_xmit+0x50/0xa0
    [161846.962773] [] __dev_queue_xmit+0x4ee/0x5e0
    [161846.962777] [] dev_queue_xmit_sk+0x13/0x20
    [161846.962789] [] br_dev_queue_push_xmit+0x54/0xa0 [bridge]
    [161846.962797] [] br_forward_finish+0x2f/0x90 [bridge]
    [161846.962807] [] ? ttwu_do_wakeup+0x1d/0x100
    [161846.962811] [] ? __alloc_skb+0x8b/0x1f0
    [161846.962818] [] __br_forward+0x8d/0x120 [bridge]
    [161846.962822] [] ? __kmalloc_reserve+0x3b/0xa0
    [161846.962829] [] ? update_rq_runnable_avg+0xee/0x230
    [161846.962836] [] br_forward+0x96/0xb0 [bridge]
    [161846.962845] [] br_handle_frame_finish+0x1ae/0x420 [bridge]
    [161846.962853] [] br_handle_frame+0x17f/0x260 [bridge]
    [161846.962862] [] ? br_handle_frame_finish+0x420/0x420 [bridge]
    [161846.962867] [] __netif_receive_skb_core+0x1f7/0x870
    [161846.962872] [] __netif_receive_skb+0x22/0x70
    [161846.962877] [] netif_receive_skb_internal+0x23/0x90
    [161846.962884] [] ? xenvif_idx_release+0xea/0x100 [xen_netback]
    [161846.962889] [] ? _raw_spin_unlock_irqrestore+0x20/0x50
    [161846.962893] [] netif_receive_skb_sk+0x24/0x90
    [161846.962899] [] xenvif_tx_submit+0x2ca/0x3f0 [xen_netback]
    [161846.962906] [] xenvif_tx_action+0x9c/0xd0 [xen_netback]
    [161846.962915] [] xenvif_poll+0x35/0x70 [xen_netback]
    [161846.962920] [] napi_poll+0xcb/0x1e0
    [161846.962925] [] net_rx_action+0x90/0x1c0
    [161846.962931] [] __do_softirq+0x10a/0x350
    [161846.962938] [] irq_exit+0x125/0x130
    [161846.962943] [] xen_evtchn_do_upcall+0x39/0x50
    [161846.962950] [] xen_do_hypervisor_callback+0x1e/0x40
    [161846.962952]
    [161846.962959] [] ? _raw_spin_lock+0x4a/0x80
    [161846.962964] [] ? _raw_spin_lock_irqsave+0x1e/0xa0
    [161846.962978] [] ? qlcnic_83xx_mailbox_worker+0xb9/0x2a0 [qlcnic]
    [161846.962991] [] ? process_one_work+0x151/0x4b0
    [161846.962995] [] ? check_events+0x12/0x20
    [161846.963001] [] ? worker_thread+0x120/0x480
    [161846.963005] [] ? __schedule+0x30b/0x890
    [161846.963010] [] ? process_one_work+0x4b0/0x4b0
    [161846.963015] [] ? process_one_work+0x4b0/0x4b0
    [161846.963021] [] ? kthread+0xce/0xf0
    [161846.963025] [] ? kthread_freezable_should_stop+0x70/0x70
    [161846.963031] [] ? ret_from_fork+0x42/0x70
    [161846.963035] [] ? kthread_freezable_should_stop+0x70/0x70
    [161846.963037] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc

    Signed-off-by: Junxiao Bi
    Signed-off-by: David S. Miller

    Junxiao Bi
     
  • A socket can be disconnected and transformed back into a listening
    socket. If sk_frag.page is not released, it will be cloned into
    a new socket by sk_clone_lock, but the reference count of this page
    is increased, leading to a use-after-free or double-free issue.

    Signed-off-by: Li RongQing
    Cc: Eric Dumazet
    Signed-off-by: David S. Miller

    Li RongQing
     
  • When using ioctl to get the address of an interface, we can no longer
    get it. For example, with the command shown below:

    # ifconfig eth0

    Since commit 03aef17bb79b3, devinet_ioctl() does not return a
    suitable value even though the address can be found in the kernel.
    Fix that.

    Fixes: 03aef17bb79b3 ("devinet_ioctl(): take copyin/copyout to caller")
    Cc: Al Viro
    Signed-off-by: Tonghao Zhang
    Acked-by: Al Viro
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • syzbot reported a lockdep splat in gen_new_estimator() /
    est_fetch_counters() when attempting to lock est->stats_lock.

    Since est_fetch_counters() is called from BH context from timer
    interrupt, we need to block BH as well when calling it from process
    context.

    Most qdiscs use per cpu counters and are immune to the problem,
    but net/sched/act_api.c and net/netfilter/xt_RATEEST.c are using
    a spinlock to protect their data. They both call gen_new_estimator()
    while object is created and not yet alive, so this bug could
    not trigger a deadlock, only a lockdep splat.

    Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Handle HRESP error by doing a SW reset of RX and TX and
    re-initializing the descriptors, RX and TX queue pointers.

    Signed-off-by: Harini Katakam
    Signed-off-by: Michal Simek
    Signed-off-by: David S. Miller

    Harini Katakam
     
  • On TTC table creation, the indirection TIRs should be used instead of
    the inner indirection TIRs.

    Fixes: 1ae1df3a1193 ("net/mlx5e: Refactor RSS related objects and code")
    Signed-off-by: Gal Pressman
    Reviewed-by: Shalom Lagziel
    Signed-off-by: Saeed Mahameed
    Signed-off-by: David S. Miller

    Gal Pressman
     
  • Heiner reported a lockdep splat [1]

    This is caused by attempting GFP_KERNEL allocation while RCU lock is
    held and BH blocked.

    We believe that addrconf_verify_rtnl() could run for a long period,
    so instead of using GFP_ATOMIC here as Ido suggested, we should break
    the critical section and restart it after the allocation.

    [1]
    [86220.125562] =============================
    [86220.125586] WARNING: suspicious RCU usage
    [86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted
    [86220.125641] -----------------------------
    [86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh read-side critical section!
    [86220.125711]
    other info that might help us debug this:

    [86220.125755]
    rcu_scheduler_active = 2, debug_locks = 1
    [86220.125792] 4 locks held by kworker/0:2/1003:
    [86220.125817] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.125895] #1: ((addr_chk_work).work){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.125959] #2: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x12/0x20
    [86220.126017] #3: (rcu_read_lock_bh){....}, at: [] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
    [86220.126111]
    stack backtrace:
    [86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
    [86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
    [86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
    [86220.126288] Call Trace:
    [86220.126312] dump_stack+0x70/0x9e
    [86220.126337] lockdep_rcu_suspicious+0xce/0xf0
    [86220.126365] ___might_sleep+0x1d3/0x240
    [86220.126390] __might_sleep+0x45/0x80
    [86220.126416] kmem_cache_alloc_trace+0x53/0x250
    [86220.126458] ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.126498] ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.126538] ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.126580] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.126623] addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.126664] ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.126708] addrconf_verify_work+0xe/0x20 [ipv6]
    [86220.126738] process_one_work+0x258/0x680
    [86220.126765] worker_thread+0x35/0x3f0
    [86220.126790] kthread+0x124/0x140
    [86220.126813] ? process_one_work+0x680/0x680
    [86220.126839] ? kthread_create_worker_on_cpu+0x40/0x40
    [86220.126869] ? umh_complete+0x40/0x40
    [86220.126893] ? call_usermodehelper_exec_async+0x12a/0x160
    [86220.126926] ret_from_fork+0x4b/0x60
    [86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420
    [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2
    [86220.127082] 4 locks held by kworker/0:2/1003:
    [86220.127107] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.127179] #1: ((addr_chk_work).work){+.+.}, at: [] process_one_work+0x1de/0x680
    [86220.127242] #2: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x12/0x20
    [86220.127300] #3: (rcu_read_lock_bh){....}, at: [] addrconf_verify_rtnl+0x1e/0x510 [ipv6]
    [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7
    [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015
    [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
    [86220.127568] Call Trace:
    [86220.127591] dump_stack+0x70/0x9e
    [86220.127616] ___might_sleep+0x14d/0x240
    [86220.127644] __might_sleep+0x45/0x80
    [86220.127672] kmem_cache_alloc_trace+0x53/0x250
    [86220.127717] ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.127762] ipv6_add_addr+0xfe/0x6e0 [ipv6]
    [86220.127807] ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.127854] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
    [86220.127903] addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.127950] ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
    [86220.127998] addrconf_verify_work+0xe/0x20 [ipv6]
    [86220.128032] process_one_work+0x258/0x680
    [86220.128063] worker_thread+0x35/0x3f0
    [86220.128091] kthread+0x124/0x140
    [86220.128117] ? process_one_work+0x680/0x680
    [86220.128146] ? kthread_create_worker_on_cpu+0x40/0x40
    [86220.128180] ? umh_complete+0x40/0x40
    [86220.128207] ? call_usermodehelper_exec_async+0x12a/0x160
    [86220.128243] ret_from_fork+0x4b/0x60

    Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
    Signed-off-by: Eric Dumazet
    Reported-by: Heiner Kallweit
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • In the current route cache aging logic, if a route has both RTF_EXPIRE and
    RTF_GATEWAY set, the route will only be removed if the neighbor cache
    has no NTF_ROUTER flag. Otherwise, even if the route has expired, it
    won't get deleted.
    Fix this logic to always check whether the route has expired first, and
    then do the gateway neighbor cache check only if the previous check
    decided not to remove the exception entry.

    Fixes: 1859bac04fb6 ("ipv6: remove from fib tree aged out RTF_CACHE dst")
    Signed-off-by: Wei Wang
    Signed-off-by: Eric Dumazet
    Acked-by: Martin KaFai Lau
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Wei Wang
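
The ordering described in this message can be sketched as follows. This is a deliberately simplified model, not the kernel function: the struct and its boolean fields are hypothetical stand-ins for the route flags, the expiry comparison, and the neighbour lookup, and any other aging criteria are ignored.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of a cached route exception. */
struct rt_exception_view {
    bool has_expires;      /* the expiry flag is set on the route */
    bool expired;          /* the expiry time has passed */
    bool has_gateway;      /* RTF_GATEWAY is set */
    bool neigh_is_router;  /* gateway neighbour entry has NTF_ROUTER */
};

bool should_remove(const struct rt_exception_view *rt)
{
    /* 1. Expiry is checked first and is sufficient on its own. */
    if (rt->has_expires && rt->expired)
        return true;

    /* 2. Only an entry that survived step 1 is checked against the
     *    neighbour cache. */
    if (rt->has_gateway && !rt->neigh_is_router)
        return true;

    return false;
}
```

With this ordering, an expired gateway route whose neighbour still carries NTF_ROUTER is removed, which is the case the message says was previously missed.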
     
  • When compared to ixgbe and other previous Intel drivers the i40e and i40evf
    drivers actually reserve 2 additional descriptors in maybe_stop_tx for
    cache line alignment. We need to update DESC_NEEDED to reflect this as
    otherwise we are more likely to return TX_BUSY which will cause issues with
    things like xmit_more.

    Signed-off-by: Alexander Duyck
    Tested-by: Andrew Bowers
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Make sure to cancel any pending work that might update driver coalesce
    settings when taking down an interface.

    Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
    Signed-off-by: Andy Gospodarek
    Cc: Michael Chan
    Acked-by: Michael Chan
    Signed-off-by: David S. Miller

    Andy Gospodarek
     
  • Unsolicited IPv6 neighbor advertisements should be sent after DAD
    completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic
    addresses and have those sent by addrconf_dad_completed after DAD.

    Fixes: 4a6e3c5def13c ("net: ipv6: send unsolicited NA on admin up")
    Reported-by: Vivek Venkatraman
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • When the frame check sequence (FCS) is split across the last two frames
    of a fragmented packet, part of the FCS gets counted twice, once when
    subtracting the FCS, and again when subtracting the previously received
    data.

    For example, if 1602 bytes are received, and the first fragment contains
    the first 1600 bytes (including the first two bytes of the FCS), and the
    second fragment contains the last two bytes of the FCS:

    'skb->len == 1600' from the first fragment

    size = lstatus & BD_LENGTH_MASK; # 1602
    size -= ETH_FCS_LEN; # 1598
    size -= skb->len; # -2

    Since the size is unsigned, it wraps around and causes a BUG later in
    the packet handling, as shown below:

    kernel BUG at ./include/linux/skbuff.h:2068!
    Oops: Exception in kernel mode, sig: 5 [#1]
    ...
    NIP [c021ec60] skb_pull+0x24/0x44
    LR [c01e2fbc] gfar_clean_rx_ring+0x498/0x690
    Call Trace:
    [df7edeb0] [c01e2c1c] gfar_clean_rx_ring+0xf8/0x690 (unreliable)
    [df7edf20] [c01e33a8] gfar_poll_rx_sq+0x3c/0x9c
    [df7edf40] [c023352c] net_rx_action+0x21c/0x274
    [df7edf90] [c0329000] __do_softirq+0xd8/0x240
    [df7edff0] [c000c108] call_do_irq+0x24/0x3c
    [c0597e90] [c00041dc] do_IRQ+0x64/0xc4
    [c0597eb0] [c000d920] ret_from_except+0x0/0x18
    --- interrupt: 501 at arch_cpu_idle+0x24/0x5c

    Change the size to a signed integer and then trim off any part of the
    FCS that was received prior to the last fragment.

    Fixes: 6c389fc931bc ("gianfar: fix size of scatter-gathered frames")
    Signed-off-by: Andy Spencer
    Signed-off-by: David S. Miller

    Andy Spencer
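
The arithmetic in the example above can be reproduced with a short sketch that uses a signed size. The struct, the function name, and the split into "append" and "trim" amounts are illustrative assumptions, not the driver's code; only the subtraction order follows the message.

```c
#include <assert.h>

#define FCS_LEN 4  /* Ethernet frame check sequence length in bytes */

/* What to do with the last fragment (illustrative names). */
struct last_frag {
    int append;  /* payload bytes to append from the last fragment */
    int trim;    /* FCS bytes to trim from data already received */
};

struct last_frag last_fragment_size(int total_len, int already_received)
{
    struct last_frag r = { 0, 0 };
    int size;

    assert(total_len >= FCS_LEN && already_received <= total_len);

    /* Signed arithmetic: a negative result is representable and tells
     * us how much of the FCS arrived in earlier fragments. */
    size = total_len - FCS_LEN - already_received;
    if (size >= 0)
        r.append = size;
    else
        r.trim = -size;
    return r;
}
```

For the 1602-byte case above with 1600 bytes already received, the signed size is -2, so nothing is appended and two bytes are trimmed instead of wrapping to a huge unsigned value.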
     
  • Cong Wang says:

    ====================
    net_sched: reflect tx_queue_len change for pfifo_fast

    This patchset restores the pfifo_fast qdisc behavior of dropping
    packets based on latest dev->tx_queue_len. Patch 1 introduces
    a helper, patch 2 introduces a new Qdisc ops which is called when
    we modify tx_queue_len, patch 3 implements this ops for pfifo_fast.

    Please see each patch for details.

    ---
    v3: use skb_array_resize_multiple()
    v2: handle error case for ->change_tx_queue_len()
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • pfifo_fast used to drop based on qdisc_dev(qdisc)->tx_queue_len,
    so we have to resize skb array when we change tx_queue_len.

    Other qdiscs which read tx_queue_len are fine because they
    all save it to sch->limit or somewhere else in the qdisc during init.
    They don't have to implement this, but it is nicer if they do, so
    that users don't have to re-configure the qdisc after changing
    tx_queue_len.

    Cc: John Fastabend
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Introduce a new qdisc op, ->change_tx_queue_len(), so that
    each qdisc can decide how to implement this if it wants.
    Previously we simply read dev->tx_queue_len; now that pfifo_fast
    has switched to skb array, we need this API to resize the skb array
    when we change dev->tx_queue_len.

    To avoid handling race conditions with TX BH, we need to
    deactivate all TX queues before changing the value and bring them
    back after we are done. This also makes the implementation easier.

    Cc: John Fastabend
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • This patch promotes the local change_tx_queue_len() to a core
    helper function, dev_change_tx_queue_len(), so that rtnetlink
    and net-sysfs could share the code. This also prepares for the
    following patch.

    Note that the -EFAULT in the original code doesn't make sense;
    we should propagate the errno from notifiers.

    Cc: John Fastabend
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • We don't stop the device before resetting the owner, which means we could
    try to serve a virtqueue kick before dev->worker is reset. This results in
    a warning, since the work was pending on the llist during the owner reset.
    Fix this by stopping the device during owner reset.

    Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com
    Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     
  • Nicolas Dichtel says:

    ====================
    net: Ease to follow an interface that moves to another netns

    The goal of this series is to make it easier for the user to follow an
    interface that moves to another netns.

    After this series, with a patched iproute2:

    $ ip netns
    bar
    foo
    $ ip monitor link &
    $ ip link set dummy0 netns foo
    Deleted 5: dummy0: mtu 1500 qdisc noop state DOWN group default
    link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

    => new nsid: 0, new ifindex: 6 (was 5 in the previous netns)

    $ ip link set eth1 netns bar
    Deleted 3: eth1: mtu 1500 qdisc noop state DOWN group default
    link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3

    => new nsid: 1, new ifindex: 3 (same ifindex)

    $ ip netns
    bar (id: 1)
    foo (id: 0)
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The goal is to let the user follow an interface that moves to another
    netns.

    CC: Jiri Benc
    CC: Christian Brauner
    Signed-off-by: Nicolas Dichtel
    Reviewed-by: Jiri Benc
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • The user should be able to follow any interface that moves to another
    netns. There is no reason to hide physical interfaces.

    CC: Jiri Benc
    CC: Christian Brauner
    Signed-off-by: Nicolas Dichtel
    Reviewed-by: Jiri Benc
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     
  • It was found that ethtool reports a nonexistent module name when
    it queries the specified network device for associated driver
    information. The user then tries to unload the module by the reported
    name and fails.

    This happens because ethtool reads the value of the DRV_NAME macro,
    while the module name is defined in the driver's Makefile.

    This patch corrects the Cavium CN88xx Thunder NIC driver names
    (DRV_NAME macro) from 'thunder-nicvf' to 'nicvf' and from 'thunder-nic'
    to 'nicpf', and syncs the bgx and xcv driver names with their
    module names.

    Signed-off-by: Vadim Lomovtsev
    Signed-off-by: David S. Miller

    Vadim Lomovtsev
     
  • Michael S. Tsirkin says:

    ====================
    ptr_ring fixes

    This fixes a bunch of issues around ptr_ring use in net core.
    One of these: "tap: fix use-after-free" is also needed on net,
    but can't be backported cleanly.

    I will post a net patch separately.

    Lightly tested - Jason, could you pls confirm this
    addresses the security issue you saw with ptr_ring?
    Testing reports would be appreciated too.
    ====================

    Signed-off-by: David S. Miller
    Tested-by: Jason Wang
    Acked-by: Jason Wang

    David S. Miller
     
  • Offset 128 overlaps the last word of the redzone.
    Use 132 which is always beyond that.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • This is to make ptr_ring test build again.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • We don't rely on lockless guarantees, but it
    seems cleaner than inverting __ptr_ring_peek.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • In theory the compiler could tear queue loads or stores in two. This does
    not seem to be happening in practice, but it seems easier to convert the
    cases where this would be a problem to READ/WRITE_ONCE than to worry about
    it.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
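
A userspace approximation of the idea, for illustration only. The macros below are simplified stand-ins for the kernel's READ_ONCE/WRITE_ONCE and rely on the GCC/Clang `__typeof__` extension; the `toy_ring` struct and helper names are hypothetical and are not the ptr_ring API. Accessing a slot through a volatile-qualified lvalue asks the compiler to perform exactly one access of that object instead of splitting or repeating it.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumption of this sketch). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical toy ring of pointers; NULL marks an empty slot. */
struct toy_ring {
    int consumer_head;
    int size;
    void **queue;
};

/* Store a pointer into a slot with a single volatile access. */
void toy_store(struct toy_ring *r, int idx, void *ptr)
{
    WRITE_ONCE(r->queue[idx], ptr);
}

/* Test whether the slot at the consumer head is empty, reading both the
 * index and the slot through single volatile accesses. */
int toy_slot_empty(struct toy_ring *r)
{
    return READ_ONCE(r->queue[READ_ONCE(r->consumer_head)]) == NULL;
}
```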
     
  • __skb_array_empty should use __ptr_ring_empty since that's the only
    legal lockless function.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • This reverts commit bcecb4bbf88aa03171c30652bca761cf27755a6b.

    If we try to allocate an extra entry as the above commit did and
    the requested size is UINT_MAX, the addition overflows, causing a zero
    size to be passed to kmalloc().

    kmalloc() then returns ZERO_SIZE_PTR, with a subsequent crash.

    Reported-by: syzbot+87678bcf753b44c39b67@syzkaller.appspotmail.com
    Cc: John Fastabend
    Signed-off-by: Michael S. Tsirkin
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
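
The overflow itself is easy to reproduce in plain C. In the sketch below both function names are hypothetical; the second helper shows one possible guard and is not what this commit does, since the commit simply reverts the extra-entry allocation.

```c
#include <limits.h>

/* Entry count with one extra slot. Unsigned arithmetic wraps, so a
 * request of UINT_MAX entries silently becomes 0. */
unsigned int entries_with_extra(unsigned int requested)
{
    return requested + 1;
}

/* One possible guard (an assumption of this sketch, not the revert):
 * reject the single value that would wrap. Returns 0 on success and
 * -1 if the addition would overflow. */
int entries_with_extra_checked(unsigned int requested, unsigned int *out)
{
    if (requested == UINT_MAX)
        return -1;
    *out = requested + 1;
    return 0;
}
```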
     
  • Similar to bcecb4bbf88a ("net: ptr_ring: otherwise safe empty checks can
    overrun array bounds") a lockless use of __ptr_ring_full might
    cause an out of bounds access.

    We can fix this, but it's easier to just disallow lockless
    __ptr_ring_full for now.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • Lockless access to __ptr_ring_full is only legal if the ring is
    never resized; otherwise it might cause use-after-free errors.
    Simply drop the lockless test; we'll drop the packet
    a bit later when produce fails.

    Fixes: 362899b8 ("macvtap: switch to use skb array")
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • Lockless __ptr_ring_empty requires that the consumer head is read and
    written at once, atomically. Annotate accordingly to make sure the compiler
    does it correctly. Switch locked callers to __ptr_ring_peek, which does
    not support the lockless operation.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • The only function safe to call without locks
    is __ptr_ring_empty. Move documentation about
    lockless use there to make sure people do not
    try to use __ptr_ring_peek outside locks.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • The comment near __ptr_ring_peek says:

    * If ring is never resized, and if the pointer is merely
    * tested, there's no need to take the lock - see e.g. __ptr_ring_empty.

    but this was in fact never possible since consumer_head would sometimes
    point outside the ring. Refactor the code so that it's always
    pointing within a ring.

    Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
    Signed-off-by: Michael S. Tsirkin
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
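
The invariant described here, a consumer index that never holds an out-of-range value, can be sketched like this. The struct and function are hypothetical and much simpler than ptr_ring; the point is only that the wrap is applied to a local copy before the index is written back.

```c
/* Hypothetical ring bookkeeping: only the fields needed for the sketch. */
struct toy_ring_idx {
    int consumer_head;  /* always in [0, size) */
    int size;
};

/* Advance the consumer by one slot. The new value is computed and
 * wrapped in a local variable, so the stored index never equals
 * `size`, even momentarily, and a concurrent reader that uses it as
 * an array index stays within the ring. */
void toy_consume_one(struct toy_ring_idx *r)
{
    int head = r->consumer_head + 1;

    if (head >= r->size)
        head = 0;
    r->consumer_head = head;
}
```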