19 Nov, 2020

1 commit

  • DSA network devices rely on having their DSA management interface up and
    running; otherwise their ndo_open() will return -ENETDOWN. Without bringing
    the management interface up first, it would not be possible to use DSA
    devices as netconsole when configured on the command line. These devices
    also do not use upper/lower linking, so the check for the netpoll device
    having an upper device is not going to be a problem.

    The solution adopted here is identical to the one done for
    net/ipv4/ipconfig.c with 728c02089a0e ("net: ipv4: handle DSA enabled
    master network devices"), with the network namespace scope being
    restricted to that of the process configuring netpoll.

    Fixes: 04ff53f96a93 ("net: dsa: Add netconsole support")
    Tested-by: Vladimir Oltean
    Signed-off-by: Florian Fainelli
    Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com
    Signed-off-by: Jakub Kicinski

    Florian Fainelli
     

11 Sep, 2020

1 commit


27 Aug, 2020

1 commit

  • napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
    netpoll from accessing rings before init is complete. However, the
    same is not done for fresh napi instances in netif_napi_add(),
    even though we expect NAPI instances to be added as disabled.

    This causes crashes during driver reconfiguration (enabling XDP,
    changing the channel count) if there is any printk() after
    netif_napi_add() but before napi_enable().

    To ensure memory ordering is correct, we need to use RCU accessors.

    Reported-by: Rob Sherwood
    Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable")
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

08 May, 2020

4 commits


30 Apr, 2020

1 commit


28 Aug, 2019

1 commit

  • After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
    ("tun: return NET_XMIT_DROP for dropped packets"),
    when tun_net_xmit drops packets, it frees the skb and returns NET_XMIT_DROP,
    and netpoll_send_skb_on_dev runs into the following use-after-free cases:
    1. it retries netpoll_start_xmit with the freed skb;
    2. it queues the freed skb in npinfo->txq.
    queue_process will also run into a use-after-free case.

    We hit the first netpoll_send_skb_on_dev case, with the following kernel log:

    [ 117.864773] kernel BUG at mm/slub.c:306!
    [ 117.864773] invalid opcode: 0000 [#1] SMP PTI
    [ 117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P OE 5.3.0-050300rc5-generic #201908182231
    [ 117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [ 117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
    [ 117.864781] Call Trace:
    [ 117.864781] ? tun_net_xmit+0x21c/0x460
    [ 117.864781] kfree_skbmem+0x4e/0x60
    [ 117.864782] kfree_skb+0x3a/0xa0
    [ 117.864782] tun_net_xmit+0x21c/0x460
    [ 117.864782] netpoll_start_xmit+0x11d/0x1b0
    [ 117.864788] netpoll_send_skb_on_dev+0x1b8/0x200
    [ 117.864789] __br_forward+0x1b9/0x1e0 [bridge]
    [ 117.864789] ? skb_clone+0x53/0xd0
    [ 117.864790] ? __skb_clone+0x2e/0x120
    [ 117.864790] deliver_clone+0x37/0x50 [bridge]
    [ 117.864790] maybe_deliver+0x89/0xc0 [bridge]
    [ 117.864791] br_flood+0x6c/0x130 [bridge]
    [ 117.864791] br_dev_xmit+0x315/0x3c0 [bridge]
    [ 117.864792] netpoll_start_xmit+0x11d/0x1b0
    [ 117.864792] netpoll_send_skb_on_dev+0x1b8/0x200
    [ 117.864792] netpoll_send_udp+0x2c6/0x3e8
    [ 117.864793] write_msg+0xd9/0xf0 [netconsole]
    [ 117.864793] console_unlock+0x386/0x4e0
    [ 117.864793] vprintk_emit+0x17e/0x280
    [ 117.864794] vprintk_default+0x29/0x50
    [ 117.864794] vprintk_func+0x4c/0xbc
    [ 117.864794] printk+0x58/0x6f
    [ 117.864795] loop_fun+0x24/0x41 [printmsg_loop]
    [ 117.864795] kthread+0x104/0x140
    [ 117.864795] ? 0xffffffffc05b1000
    [ 117.864796] ? kthread_park+0x80/0x80
    [ 117.864796] ret_from_fork+0x35/0x40

    Signed-off-by: Feng Sun
    Signed-off-by: Xiaojun Zhao
    Signed-off-by: David S. Miller

    Feng Sun
     

03 Jun, 2019

1 commit

  • ifa_list is protected by RCU, yet the code doesn't reflect this.

    Add the __rcu annotations and fix up all places that are now reported by
    sparse.

    I've done this in the same commit to not add intermediate patches that
    result in new warnings.

    Reported-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

08 May, 2019

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

    2) Add fib_sync_mem to control the amount of dirty memory we allow to
    queue up between synchronize RCU calls, from David Ahern.

    3) Make flow classifier more lockless, from Vlad Buslov.

    4) Add PHY downshift support to aquantia driver, from Heiner
    Kallweit.

    5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
    contention on SLAB spinlocks in heavy RPC workloads.

    6) Partial GSO offload support in XFRM, from Boris Pismenny.

    7) Add fast link down support to ethtool, from Heiner Kallweit.

    8) Use siphash for IP ID generator, from Eric Dumazet.

    9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
    entries, from David Ahern.

    10) Move skb->xmit_more into a per-cpu variable, from Florian
    Westphal.

    11) Improve eBPF verifier speed and increase maximum program size,
    from Alexei Starovoitov.

    12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
    spinlocks. From Neil Brown.

    13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

    14) Improve link partner cap detection in generic PHY code, from
    Heiner Kallweit.

    15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
    Maguire.

    16) Remove SKB list implementation assumptions in SCTP, yours truly.

    17) Various cleanups, optimizations, and simplifications in r8169
    driver. From Heiner Kallweit.

    18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

    19) Switch PHY drivers over to use dynamic feature detection, from
    Heiner Kallweit.

    20) Support flow steering without masking in dpaa2-eth, from Ioana
    Ciocoi.

    21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
    Pirko.

    22) Increase the strict parsing of current and future netlink
    attributes, also export such policies to userspace. From Johannes
    Berg.

    23) Allow DSA tag drivers to be modular, from Andrew Lunn.

    24) Remove legacy DSA probing support, also from Andrew Lunn.

    25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
    Haabendal.

    26) Add a generic tracepoint for TX queue timeouts to ease debugging,
    from Cong Wang.

    27) More indirect call optimizations, from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
    cxgb4: Fix error path in cxgb4_init_module
    net: phy: improve pause mode reporting in phy_print_status
    dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
    net: macb: Change interrupt and napi enable order in open
    net: ll_temac: Improve error message on error IRQ
    net/sched: remove block pointer from common offload structure
    net: ethernet: support of_get_mac_address new ERR_PTR error
    net: usb: smsc: fix warning reported by kbuild test robot
    staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
    net: dsa: support of_get_mac_address new ERR_PTR error
    net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
    vrf: sit mtu should not be updated when vrf netdev is the link
    net: dsa: Fix error cleanup path in dsa_init_module
    l2tp: Fix possible NULL pointer dereference
    taprio: add null check on sched_nest to avoid potential null pointer dereference
    net: mvpp2: cls: fix less than zero check on a u32 variable
    net_sched: sch_fq: handle non connected flows
    net_sched: sch_fq: do not assume EDT packets are ordered
    net: hns3: use devm_kcalloc when allocating desc_cb
    net: hns3: some cleanup for struct hns3_enet_ring
    ...

    Linus Torvalds
     

09 Apr, 2019

1 commit

  • %pF and %pf are functionally equivalent to %pS and %ps conversion
    specifiers. The former are deprecated, therefore switch the current users
    to use the preferred variant.

    The changes have been produced by the following command:

    git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
    while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

    And verifying the result.

    Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
    Cc: Andy Shevchenko
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-um@lists.infradead.org
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: drbd-dev@lists.linbit.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-mm@kvack.org
    Cc: ceph-devel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Sakari Ailus
    Acked-by: David Sterba (for btrfs)
    Acked-by: Mike Rapoport (for mm/memblock.c)
    Acked-by: Bjorn Helgaas (for drivers/pci)
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Petr Mladek

    Sakari Ailus
     

21 Mar, 2019

1 commit

  • With the following patches, we are going to use __netdev_pick_tx() in
    many modules. Rename it to netdev_pick_tx(), to make it clear it is
    a public API.

    Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
    to avoid name clashes.

    Suggested-by: Eric Dumazet
    Suggested-by: David Miller
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

28 Dec, 2018

1 commit

  • Pull networking updates from David Miller:

    1) New ipset extensions for matching on destination MAC addresses, from
    Stefano Brivio.

    2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to
    nfp driver. From Stefano Brivio.

    3) Implement GRO for plain UDP sockets, from Paolo Abeni.

    4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT
    bit so that we could support the entire vlan_tci value.

    5) Rework the IPSEC policy lookups to better optimize more use cases,
    from Florian Westphal.

    6) Infrastructure changes eliminating direct manipulation of SKB lists
    wherever possible, and to always use the appropriate SKB list
    helpers. This work is still ongoing...

    7) Lots of PHY driver and state machine improvements and
    simplifications, from Heiner Kallweit.

    8) Various TSO deferral refinements, from Eric Dumazet.

    9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov.

    10) Batch dropping of XDP packets in tuntap, from Jason Wang.

    11) Lots of cleanups and improvements to the r8169 driver from Heiner
    Kallweit, including support for ->xmit_more. This driver has been
    getting some much needed love since he started working on it.

    12) Lots of new forwarding selftests from Petr Machata.

    13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel.

    14) Packed ring support for virtio, from Tiwei Bie.

    15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov.

    16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu.

    17) Implement coalescing on TCP backlog queue, from Eric Dumazet.

    18) Implement carrier change in tun driver, from Nicolas Dichtel.

    19) Support msg_zerocopy in UDP, from Willem de Bruijn.

    20) Significantly improve garbage collection of neighbor objects when
    the table has many PERMANENT entries, from David Ahern.

    21) Remove egdev usage from nfp and mlx5, and remove the facility
    completely from the tree as it no longer has any users. From Oz
    Shlomo and others.

    22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and
    therefore abort the operation before the commit phase (which is the
    NETDEV_CHANGEADDR event). From Petr Machata.

    23) Add indirect call wrappers to avoid retpoline overhead, and use them
    in the GRO code paths. From Paolo Abeni.

    24) Add support for netlink FDB get operations, from Roopa Prabhu.

    25) Support bloom filter in mlxsw driver, from Nir Dotan.

    26) Add SKB extension infrastructure. This consolidates the handling of
    the auxiliary SKB data used by IPSEC and bridge netfilter, and is
    designed to support the needs of MPTCP, which could be integrated in
    the future.

    27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits)
    net: dccp: fix kernel crash on module load
    drivers/net: appletalk/cops: remove redundant if statement and mask
    bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
    net/net_namespace: Check the return value of register_pernet_subsys()
    net/netlink_compat: Fix a missing check of nla_parse_nested
    ieee802154: lowpan_header_create check must check daddr
    net/mlx4_core: drop useless LIST_HEAD
    mlxsw: spectrum: drop useless LIST_HEAD
    net/mlx5e: drop useless LIST_HEAD
    iptunnel: Set tun_flags in the iptunnel_metadata_reply from src
    net/mlx5e: fix semicolon.cocci warnings
    staging: octeon: fix build failure with XFRM enabled
    net: Revert recent Spectre-v1 patches.
    can: af_can: Fix Spectre v1 vulnerability
    packet: validate address length if non-zero
    nfc: af_nfc: Fix Spectre v1 vulnerability
    phonet: af_phonet: Fix Spectre v1 vulnerability
    net: core: Fix Spectre v1 vulnerability
    net: minor cleanup in skb_ext_add()
    net: drop the unused helper skb_ext_get()
    ...

    Linus Torvalds
     

07 Dec, 2018

1 commit

  • In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_open().

    Therefore extend dev_open() with an extra extack argument and update
    all users. Most of the calls end up just encoding NULL, but bond and
    team drivers have the extack readily available.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     

04 Dec, 2018

1 commit

  • …k/linux-rcu into core/rcu

    Pull RCU changes from Paul E. McKenney:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions
    to their vanilla RCU counterparts. This series is a step
    towards complete removal of the RCU-bh and RCU-sched update-side
    functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for
    rcutorture testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein
    for a bag-on-head-class bug.

    - RCU torture-test updates.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

02 Dec, 2018

1 commit

  • Now that call_rcu()'s callback is not invoked until after all bh-disable
    regions of code have completed (in addition to explicitly marked
    RCU read-side critical sections), call_rcu() can be used in place of
    call_rcu_bh(). Similarly, synchronize_rcu() can be used in place of
    synchronize_rcu_bh(). This commit therefore makes these changes.

    Signed-off-by: Paul E. McKenney
    Cc: "David S. Miller"
    Cc: Eric Dumazet

    Paul E. McKenney
     

06 Nov, 2018

1 commit


20 Oct, 2018

2 commits

  • This fixes a problem introduced by:
    commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")

    When using netconsole on a bond, __netpoll_cleanup can asynchronously
    recurse multiple times; each __netpoll_free_async call can result in
    more __netpoll_free_async calls. This means there is now a race between
    cleanup_work queues on multiple netpoll_info's on multiple devices and
    the configuration of a new netpoll. For example, if a netconsole is
    disabled (enable 0), reconfigured, and immediately re-enabled (enable 1),
    this netconsole will likely not work.

    Given that the reason for __netpoll_free_async is that it can be called
    when rtnl is not locked, if rtnl is locked we should be able to execute
    synchronously. It appears to be locked everywhere it's called from.

    Generalize the design pattern from the teaming driver for current
    callers of __netpoll_free_async.

    CC: Neil Horman
    CC: "David S. Miller"
    Signed-off-by: Debabrata Banerjee
    Signed-off-by: David S. Miller

    Debabrata Banerjee
     
  • This reverts commit 6fe9487892b32cb1c8b8b0d552ed7222a527fe30.

    It is causing more serious regressions than the RCU warning
    it is fixing.

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Oct, 2018

1 commit

  • The bonding driver does not hold the RCU read lock when it calls down into
    netdev_lower_get_next_private_rcu from bond_poll_controller, which
    results in a trace like:

    WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
    CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
    Workqueue: bond0 bond_mii_monitor
    RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
    Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
    RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
    RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
    RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
    R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
    R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
    FS: 0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
    Call Trace:
    bond_poll_controller+0x52/0x170
    netpoll_poll_dev+0x79/0x290
    netpoll_send_skb_on_dev+0x158/0x2c0
    netpoll_send_udp+0x2d5/0x430
    write_ext_msg+0x1e0/0x210
    console_unlock+0x3c4/0x630
    vprintk_emit+0xfa/0x2f0
    printk+0x52/0x6e
    ? __netdev_printk+0x12b/0x220
    netdev_info+0x64/0x80
    ? bond_3ad_set_carrier+0xe9/0x180
    bond_select_active_slave+0x1fc/0x310
    bond_mii_monitor+0x709/0x9b0
    process_one_work+0x221/0x5e0
    worker_thread+0x4f/0x3b0
    kthread+0x100/0x140
    ? process_one_work+0x5e0/0x5e0
    ? kthread_delayed_work_timer_fn+0x90/0x90
    ret_from_fork+0x24/0x30

    We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
    before we call down into netpoll_poll_dev, so just take the lock there.

    Suggested-by: Cong Wang
    Signed-off-by: Dave Jones
    Signed-off-by: David S. Miller

    Dave Jones
     

29 Sep, 2018

1 commit

  • Since we no longer require NAPI drivers to provide
    an ndo_poll_controller(), napi_schedule() may not have been called
    before poll_one_napi() is invoked.

    So testing NAPI_STATE_SCHED is likely to cause early returns.

    While we are at it, remove outdated comment.

    Note to future bisections: This change might surface prior
    bugs in drivers. See commit 73f21c653f93 ("bnxt_en: Fix TX
    timeout during netpoll.") for one occurrence.

    Fixes: ac3d9dd034e5 ("netpoll: make ndo_poll_controller() optional")
    Signed-off-by: Eric Dumazet
    Tested-by: Song Liu
    Cc: Michael Chan
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Sep, 2018

1 commit

  • As diagnosed by Song Liu, ndo_poll_controller() can
    be very dangerous on loaded hosts, since the cpu
    calling ndo_poll_controller() might steal all NAPI
    contexts (for all RX/TX queues of the NIC). This capture
    can last for unlimited amount of time, since one
    cpu is generally not able to drain all the queues under load.

    It seems that all networking drivers that do use NAPI
    for their TX completions should not provide an ndo_poll_controller().

    NAPI drivers have netpoll support already handled
    in core networking stack, since netpoll_poll_dev()
    uses poll_napi(dev) to iterate through registered
    NAPI contexts for a device.

    This patch allows netpoll_poll_dev() to process NAPI
    contexts even for drivers not providing ndo_poll_controller(),
    allowing for following patches in NAPI drivers.

    Also we export netpoll_poll_dev() so that it can be called
    by bonding/team drivers in following patches.

    Reported-by: Song Liu
    Signed-off-by: Eric Dumazet
    Tested-by: Song Liu
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Nov, 2017

1 commit

  • Use lockdep to check that IRQs are enabled or disabled as expected. This
    way the sanity check only shows overhead when concurrency correctness
    debug code is enabled.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Cc: David S. Miller
    Cc: Lai Jiangshan
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/1509980490-4285-14-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

27 Jul, 2017

1 commit

  • Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
    when checking if the device name is set:

    if (np->dev_name) {
    ...

    However the field is a character array, therefore the condition always
    yields true. Check instead whether the first byte of the array has a
    non-zero value.

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: David S. Miller

    Matthias Kaehlcke
     

14 Jul, 2017

1 commit

  • When we converted atomic_t to refcount_t, a new kernel warning
    on "increment on 0" is introduced in the netpoll code,
    zap_completion_queue(). In fact for this special case, we know
    the refcount is 0 and we just have to set it to 1 to satisfy
    the following dev_kfree_skb_any(), so we can just use
    refcount_set(..., 1) instead.

    Fixes: 633547973ffc ("net: convert sk_buff.users from atomic_t to refcount_t")
    Reported-by: Dave Jones
    Cc: Reshetova, Elena
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    WANG Cong
     

01 Jul, 2017

2 commits

  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: David S. Miller

    Reshetova, Elena
     

16 Jun, 2017

1 commit

  • It seems like a historic accident that these return unsigned char *,
    and in many places that means casts are required, more often than not.

    Make these functions return void * and remove all the casts across
    the tree, adding a (u8 *) cast only where the unsigned char pointer
    was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

    Note that the last part there converts from push(...)[0] to the
    more idiomatic *(u8 *)push(...).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

22 Apr, 2017

1 commit

  • Reducing real_num_tx_queues needs to be in sync with skb queue_mapping;
    otherwise skbs with queue_mapping greater than or equal to real_num_tx_queues
    can be sent to the underlying driver and can result in a kernel panic.

    One such event is running netconsole and enabling VFs on the same
    device, or running netconsole and changing the number of TX queues via
    ethtool on the same device.

    e.g.
    Unable to handle kernel NULL pointer dereference
    tsk->{mm,active_mm}->context = 0000000000001525
    tsk->{mm,active_mm}->pgd = fff800130ff9a000
    \|/ ____ \|/
    "@'/ .. \`@"
    /_| \__/ |_\
    \__U_/
    kworker/48:1(475): Oops [#1]
    CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G OE 4.11.0-rc3-davem-net+ #7
    Workqueue: events queue_process
    task: fff80013113299c0 task.stack: fff800131132c000
    TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y: 00000000 Tainted: G OE
    TPC:
    g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3: 0000000000000001
    g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7: 00000000000000c0
    o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3: 0000000000000003
    o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc: 000000000049ed94
    RPC:
    l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3: 0000000000000000
    l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7: fff8001fa7605028
    i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3: 0000000000000000
    i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7: 00000000103fa4b0
    I7:
    Call Trace:
    [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
    [0000000000998c74] netpoll_start_xmit+0xf4/0x200
    [0000000000998e10] queue_process+0x90/0x160
    [0000000000485fa8] process_one_work+0x188/0x480
    [0000000000486410] worker_thread+0x170/0x4c0
    [000000000048c6b8] kthread+0xd8/0x120
    [0000000000406064] ret_from_fork+0x1c/0x2c
    [0000000000000000] (null)
    Disabling lock debugging due to kernel taint
    Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
    Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
    Caller[0000000000998e10]: queue_process+0x90/0x160
    Caller[0000000000485fa8]: process_one_work+0x188/0x480
    Caller[0000000000486410]: worker_thread+0x170/0x4c0
    Caller[000000000048c6b8]: kthread+0xd8/0x120
    Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
    Caller[0000000000000000]: (null)

    Signed-off-by: Tushar Dave
    Signed-off-by: David S. Miller

    Tushar Dave
     

17 Nov, 2016

1 commit

  • Callers of netpoll_poll_lock() own NAPI_STATE_SCHED

    Callers of netpoll_poll_unlock() have BH blocked between the moment
    NAPI_STATE_SCHED is cleared and the moment poll_lock is released.

    We can avoid the spinlock which has no contention, and use cmpxchg()
    on poll_owner which we need to set anyway.

    This removes a possible lockdep violation after the cited commit,
    since sk_busy_loop() re-enables BH before calling busy_poll_stop().

    Fixes: 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

10 Jul, 2016

1 commit

  • An important piece of information for the napi_poll tracepoint is knowing
    the work done (packets processed) by the napi_poll() call. Add
    both the work done and budget, as they are related.

    Handle the trace_napi_poll() parameter change in dropwatch/drop_monitor
    and in the Python perf script netdev-times.py in a backward-compatible way,
    as Python fortunately supports optional parameters.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
     

06 Apr, 2016

1 commit


25 Mar, 2016

1 commit

  • netpoll_setup() does a dev_hold() on np->dev, the netpoll device. If it
    fails, it correctly does a dev_put() but leaves np->dev set. If we call
    netpoll_cleanup() after the failure, np->dev is still set so we do another
    dev_put(), which decrements the refcount an extra time.

    It's questionable to call netpoll_cleanup() after netpoll_setup() fails,
    but it can be difficult to find the problem, and we can easily avoid it in
    this case. The extra decrements can lead to hangs like this:

    unregister_netdevice: waiting for bond0 to become free. Usage count = -3

    Set and clear np->dev at the points where we dev_hold() and dev_put() the
    device.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: David S. Miller

    Bjorn Helgaas
     

30 Sep, 2015

1 commit

  • For some reason we were carrying the budget value around between the
    various calls to napi->poll. If, for example, one of the drivers called had
    a bug in which it returned a non-zero value for work this could result in
    the budget value becoming negative.

    Rather than carry around a value of budget that is 0 or less we can instead
    just loop through and pass 0 to each napi->poll call. If any driver
    returns a value for work done that is non-zero then we can report that
    driver and continue rather than allowing a bad actor to make the budget
    value negative and pass that negative value to napi->poll.

    Note, the only actual change here is that instead of letting budget become
    negative we are keeping it at 0 regardless of the value returned for work
    since it should not be possible for the polling routine to do any actual
    work with a budget of 0. So if the polling routine returns a non-0 value
    we are just reporting it and continuing with a budget of 0 rather than
    letting that work value be subtracted from the budget of 0.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

24 Sep, 2015

1 commit

  • Drivers might call napi_disable while not holding the napi instance poll_lock.
    In those instances, it's possible for a race condition to exist between
    poll_one_napi and napi_disable. That is to say, poll_one_napi only tests the
    NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
    the following may happen:

    CPU0                        CPU1
    ndo_tx_timeout              napi_poll_dev
    napi_disable                poll_one_napi
    test_and_set_bit (ret 0)
                                test_bit (ret 1)
    reset adapter               napi_poll_routine

    If the adapter gets a tx timeout without a napi instance scheduled, it's possible
    for the adapter to think it has exclusive access to the hardware (as the napi
    instance is now scheduled via the napi_disable call), while the netpoll code
    thinks there is simply work to do. The result is parallel hardware access
    leading to corrupt data structures in the driver, and a crash.

    Additionally, there is another, more critical race between netpoll and
    napi_disable. The disabled napi state is actually identical to the scheduled
    state for a given napi instance. The implication is that, if a napi instance
    is disabled, a netconsole instance would see the napi state of the device as
    having been scheduled, and poll it, likely while the driver was doing something
    requiring exclusive access. In the case above, it's fairly clear that not having
    the rings in a state ready to be polled will cause any number of crashes.

    The fix should be pretty easy. netpoll uses its own bit to indicate that
    the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
    We can just gate disabling on that bit as well as the sched bit. That should
    prevent netpoll from conducting a napi poll if we convert its set bit to a
    test_and_set_bit operation to provide mutual exclusion.

    Change notes:
    V2)
    Remove a trailing whitespace
    Resubmit with proper subject prefix

    V3)
    Clean up spacing nits

    Signed-off-by: Neil Horman
    CC: "David S. Miller"
    CC: jmaxwell@redhat.com
    Tested-by: jmaxwell@redhat.com
    Signed-off-by: David S. Miller

    Neil Horman
     

29 Aug, 2015

1 commit


14 Jan, 2015

1 commit


22 Nov, 2014

1 commit