13 Jan, 2021
4 commits
-
[ Upstream commit 4ae2bb81649dc03dfc95875f02126b14b773f7ab ]
Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.
Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart
Reviewed-by: Alexander Duyck
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 2d57b4f142e0b03e854612b8e28978935414bced ]
Two race conditions can be triggered when storing xps rxqs, resulting in
various oops and invalid memory accesses:
1. Calling netdev_set_num_tc while netif_set_xps_queue is running:
- netif_set_xps_queue uses dev->num_tc as one of the parameters to
compute the size of new_dev_maps when allocating it. dev->num_tc is
also used to access the map, and the compiler may generate code to
retrieve this field multiple times in the function.
- netdev_set_num_tc sets dev->num_tc.
If new_dev_maps is allocated using dev->num_tc and then dev->num_tc
is set to a higher value through netdev_set_num_tc, later accesses to
new_dev_maps in netif_set_xps_queue could lead to accessing memory
outside of new_dev_maps, triggering an oops.
2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
2.1. netdev_set_num_tc starts by resetting the xps queues,
dev->num_tc isn't updated yet.
2.2. netif_set_xps_queue is called, setting up the map with the
*old* dev->num_tc.
2.3. netdev_set_num_tc updates dev->num_tc.
2.4. Later accesses to the map lead to out of bound accesses and
oops.
A similar issue can be found with netdev_reset_tc.
One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and write to
xps_rxqs in a concurrent thread. With the right timing an oops is
triggered.
Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_rxqs_store.
Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart
Reviewed-by: Alexander Duyck
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit fb25038586d0064123e393cadf1fadd70a9df97a ]
Accesses to dev->xps_cpus_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.
Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart
Reviewed-by: Alexander Duyck
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
-
[ Upstream commit 1ad58225dba3f2f598d2c6daed4323f24547168f ]
Two race conditions can be triggered when storing xps cpus, resulting in
various oops and invalid memory accesses:
1. Calling netdev_set_num_tc while netif_set_xps_queue is running:
- netif_set_xps_queue uses dev->num_tc as one of the parameters to
compute the size of new_dev_maps when allocating it. dev->num_tc is
also used to access the map, and the compiler may generate code to
retrieve this field multiple times in the function.
- netdev_set_num_tc sets dev->num_tc.
If new_dev_maps is allocated using dev->num_tc and then dev->num_tc
is set to a higher value through netdev_set_num_tc, later accesses to
new_dev_maps in netif_set_xps_queue could lead to accessing memory
outside of new_dev_maps, triggering an oops.
2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
2.1. netdev_set_num_tc starts by resetting the xps queues,
dev->num_tc isn't updated yet.
2.2. netif_set_xps_queue is called, setting up the map with the
*old* dev->num_tc.
2.3. netdev_set_num_tc updates dev->num_tc.
2.4. Later accesses to the map lead to out of bound accesses and
oops.
A similar issue can be found with netdev_reset_tc.
One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and write to
xps_cpus in a concurrent thread. With the right timing an oops is
triggered.
Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_cpus_store.
Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart
Reviewed-by: Alexander Duyck
Signed-off-by: Jakub Kicinski
Signed-off-by: Greg Kroah-Hartman
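The mutual exclusion described in the xps commits above can be modelled in userspace C. This is only a sketch under stated assumptions, not the kernel code: fake_rtnl stands in for the rtnl lock, and struct fake_dev, fake_set_num_tc() and fake_store_map() are hypothetical names for the device, netdev_set_num_tc() and the sysfs store path. The point it illustrates is that the map is sized and filled from one value of num_tc that cannot change while the lock is held.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rtnl lock. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, heavily reduced model of a net device. */
struct fake_dev {
	int num_tc;   /* number of traffic classes, 0 means "one class" */
	int *map;     /* one slot per traffic class */
	int map_len;  /* number of slots allocated in map */
};

/* Models netdev_set_num_tc(): reset the map, then publish the new count. */
static void fake_set_num_tc(struct fake_dev *dev, int num_tc)
{
	pthread_mutex_lock(&fake_rtnl);
	free(dev->map);
	dev->map = NULL;
	dev->map_len = 0;
	dev->num_tc = num_tc;
	pthread_mutex_unlock(&fake_rtnl);
}

/* Models the store path: allocate and fill the map under the same lock. */
static int fake_store_map(struct fake_dev *dev, int value)
{
	int *new_map;
	int i, n, ret = 0;

	pthread_mutex_lock(&fake_rtnl);
	/* num_tc cannot change between sizing and filling the map. */
	n = dev->num_tc ? dev->num_tc : 1;
	new_map = calloc((size_t)n, sizeof(*new_map));
	if (!new_map) {
		ret = -1;
		goto out;
	}
	for (i = 0; i < n; i++)
		new_map[i] = value;
	free(dev->map);
	dev->map = new_map;
	dev->map_len = n;
out:
	pthread_mutex_unlock(&fake_rtnl);
	return ret;
}
```

Without the shared lock, fake_set_num_tc() could raise num_tc between the calloc() and the fill loop, which is the out-of-bounds pattern the commit messages describe.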
02 Oct, 2020
1 commit
-
Fix the following warnings:
[net/core/net-sysfs.c:1161]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'int'.
[net/core/net-sysfs.c:1162]: (warning) %u in format string (no. 1)
requires 'unsigned int' but the argument type is 'int'.
Reported-by: Hulk Robot
Signed-off-by: Ye Bin
Signed-off-by: David S. Miller
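As an illustration of the class of warning quoted above, here is a minimal userspace sketch in which the conversion specifier matches the argument type. The helper name show_tc() and its parameters are assumptions made for this example; they are not taken from net-sysfs.c.

```c
#include <stdio.h>

/*
 * Hypothetical show helper: 'tc' is a signed int, so it is printed
 * with %d rather than %u. Returns the number of characters written,
 * as snprintf() does.
 */
static int show_tc(char *buf, size_t len, int tc)
{
	return snprintf(buf, len, "%d\n", tc);
}
```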
13 Aug, 2020
1 commit
-
We must accept an empty mask in store_rps_map(), or we are not able
to disable RPS on a queue.
Fixes: 07bbecb34106 ("net: Restrict receive packets queuing to housekeeping CPUs")
Signed-off-by: Eric Dumazet
Reported-by: Maciej Żenczykowski
Cc: Alex Belits
Cc: Nitesh Narayan Lal
Cc: Peter Zijlstra (Intel)
Reviewed-by: Maciej Żenczykowski
Acked-by: Peter Zijlstra (Intel)
Acked-by: Nitesh Narayan Lal
Signed-off-by: David S. Miller
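The rule described above (an empty mask must be accepted, while a non-empty mask is restricted to housekeeping CPUs) can be sketched as follows. This is a simplified userspace model, not the kernel's cpumask code: CPUs are bits in a 64-bit word, and filter_rps_mask() is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of the mask check. An empty request means
 * "disable RPS" and is accepted as-is. A non-empty request is
 * restricted to housekeeping CPUs and rejected if nothing remains.
 */
static int filter_rps_mask(uint64_t requested, uint64_t housekeeping,
			   uint64_t *out)
{
	uint64_t mask = requested;

	if (mask != 0) {
		mask &= housekeeping;
		if (mask == 0)
			return -EINVAL;
	}
	*out = mask;
	return 0;
}
```

Checking for emptiness only after a non-empty request has been filtered is what keeps "write an empty mask to disable RPS" working.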
04 Aug, 2020
1 commit
-
Pull scheduler updates from Ingo Molnar:
- Improve uclamp performance by using a static key for the fast path
- Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for
better power efficiency of RT tasks on battery powered devices.
(The default is to maximize performance & reduce RT latencies.)
- Improve utime and stime tracking accuracy, which had a fixed boundary
of error, which created larger and larger relative errors as the
values become larger. This is now replaced with more precise
arithmetics, using the new mul_u64_u64_div_u64() helper in math64.h.
- Improve the deadline scheduler, such as making it capacity aware
- Improve frequency-invariant scheduling
- Misc cleanups in energy/power aware scheduling
- Add sched_update_nr_running tracepoint to track changes to nr_running
- Documentation additions and updates
- Misc cleanups and smaller fixes
* tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/doc: Factorize bits between sched-energy.rst & sched-capacity.rst
sched/doc: Document capacity aware scheduling
sched: Document arch_scale_*_capacity()
arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE
Documentation/sysctl: Document uclamp sysctl knobs
sched/uclamp: Add a new sysctl to control RT default boost value
sched/uclamp: Fix a deadlock when enabling uclamp static key
sched: Remove duplicated tick_nohz_full_enabled() check
sched: Fix a typo in a comment
sched/uclamp: Remove unnecessary mutex_init()
arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE
sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry
arch_topology, sched/core: Cleanup thermal pressure definition
trace/events/sched.h: fix duplicated word
linux/sched/mm.h: drop duplicated words in comments
smp: Fix a potential usage of stale nr_cpus
sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
sched: nohz: stop passing around unused "ticks" parameter.
sched: Better document ttwu()
sched: Add a tracepoint to track rq->nr_running
...
22 Jul, 2020
1 commit
-
When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.
root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#
Signed-off-by: Xiongfeng Wang
Signed-off-by: David S. Miller
08 Jul, 2020
1 commit
-
With the existing implementation of store_rps_map(), packets are queued
in the receive path on the backlog queues of other CPUs irrespective of
whether they are isolated or not. This could add a latency overhead to
any RT workload that is running on the same CPU.
Ensure that store_rps_map() only uses available housekeeping CPUs for
storing the rps_map.
Signed-off-by: Alex Belits
Signed-off-by: Nitesh Narayan Lal
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20200625223443.2684-4-nitesh@redhat.com
16 May, 2020
1 commit
-
The assumption that a device node is associated either with the
netdev's device, or the parent of that device, does not hold for all
drivers. E.g. Freescale's DPAA has two layers of platform devices
above the netdev. Instead, recursively walk up the tree from the
netdev, allowing any parent to match against the sought after node.
Signed-off-by: Tobias Waldekranz
Reviewed-by: Florian Fainelli
Signed-off-by: David S. Miller
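The parent walk described above can be sketched in a few lines of userspace C. struct toy_device and toy_node_match() are hypothetical stand-ins for the kernel's struct device and its match callback; the sketch uses a loop rather than literal recursion, which visits the same chain of parents.

```c
#include <stddef.h>

/* Hypothetical, reduced model of a device with a parent link. */
struct toy_device {
	struct toy_device *parent;
	const void *of_node;   /* device tree node attached to this device */
};

/*
 * Return 1 if 'dev' or any of its ancestors is associated with 'node',
 * 0 otherwise.
 */
static int toy_node_match(const struct toy_device *dev, const void *node)
{
	for (; dev; dev = dev->parent) {
		if (dev->of_node == node)
			return 1;
	}
	return 0;
}
```

With two intermediate devices between the netdev and the node's owner, as in the DPAA case mentioned above, the walk still reaches the matching ancestor.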
24 Apr, 2020
2 commits
-
gro_flush_timeout and napi_defer_hard_irqs can be read
from napi_complete_done() while other cpus write the value,
without explicit synchronization.
Use READ_ONCE()/WRITE_ONCE() to annotate the races.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
-
Back in commit 3b47d30396ba ("net: gro: add a per device gro flush timer")
we added the ability to arm one high resolution timer, that we used
to keep not-complete packets in GRO engine a bit longer, hoping that further
frames might be added to them.
Since then, we added the napi_complete_done() interface, and commit
364b6055738b ("net: busy-poll: return busypolling status to drivers")
allowed drivers to avoid re-arming NIC interrupts if we made a promise
that their NAPI poll() handler would be called in the near future.
This infrastructure can be leveraged, thanks to a new device parameter,
which allows arming the napi hrtimer instead of re-arming the device
hard IRQ.
We have noticed that on some servers with 32 RX queues or more, the chit-chat
between the NIC and the host caused by IRQ delivery and re-arming could hurt
throughput by ~20% on 100Gbit NIC.
In contrast, hrtimers are using local (percpu) resources and might have lower
cost.
The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
as gro_flush_timeout (/sys/class/net/ethX/).
By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
This patch does not change the prior behavior of gro_flush_timeout
if used alone: NIC hard irqs should be rearmed as before.
One concrete usage can be:
echo 20000 >/sys/class/net/eth1/gro_flush_timeout
echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
If at least one packet is retired, then we will reset napi counter
to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
of the queue.
On busy queues, this should avoid NIC hard IRQ, while before this patch IRQ
avoidance was only possible if napi->poll() was exhausting its budget
and not calling napi_complete_done().
This feature also can be used to work around some non-optimal NIC irq
coalescing strategies.
Having the ability to insert XX usec delays between each napi->poll()
can increase cache efficiency, since we increase batch sizes.
It also keeps serving cpus not idle too long, reducing tail latencies.
Co-developed-by: Luigi Rizzo
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
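The counter behaviour described in the commit above (reset to napi_defer_hard_irqs when a packet is retired, then one deferred poll per remaining count) can be modelled roughly as below. This is a simplified sketch and not the kernel's napi_complete_done(): struct toy_napi, struct toy_dev and toy_complete_done() are hypothetical, and pending-GRO handling and the READ_ONCE() annotations are left out.

```c
#include <stdbool.h>

/* Hypothetical per-NAPI state: remaining deferred polls. */
struct toy_napi {
	int defer_hard_irqs_count;
};

/* Hypothetical per-device tunables, mirroring the two sysfs files. */
struct toy_dev {
	int napi_defer_hard_irqs;         /* 0 disables deferral */
	unsigned long gro_flush_timeout;  /* timer delay, 0 disables the timer */
};

/*
 * Returns true when the NIC hard IRQ should be re-armed, false when
 * the timer should be armed instead. *timeout receives the delay to
 * program, or 0 when no timer is needed.
 */
static bool toy_complete_done(struct toy_napi *n, const struct toy_dev *dev,
			      int work_done, unsigned long *timeout)
{
	*timeout = 0;
	/* At least one packet was retired: refill the deferral budget. */
	if (work_done)
		n->defer_hard_irqs_count = dev->napi_defer_hard_irqs;
	if (n->defer_hard_irqs_count > 0) {
		n->defer_hard_irqs_count--;
		*timeout = dev->gro_flush_timeout;
		if (*timeout)
			return false;
	}
	return true;
}
```

In this model, with both tunables left at zero the function always asks for the hard IRQ to be re-armed, which matches the statement that the defaults leave prior behavior unchanged.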
21 Apr, 2020
1 commit
-
Similar to speed, duplex and dormant, report the testing status
in sysfs.
Signed-off-by: Andrew Lunn
Reviewed-by: Florian Fainelli
Signed-off-by: David S. Miller
10 Apr, 2020
1 commit
-
The variable ret is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King
Signed-off-by: David S. Miller
27 Feb, 2020
2 commits
-
Add a function to change the owner of the queue entries for a network device
when it is moved between network namespaces.
Currently, when moving network devices between network namespaces the
ownership of the corresponding queue sysfs entries is not changed. This leads
to problems when tools try to operate on the corresponding sysfs files. Fix
this.
Signed-off-by: Christian Brauner
Signed-off-by: David S. Miller
-
Add a function to change the owner of a network device when it is moved
between network namespaces.
Currently, when moving network devices between network namespaces the
ownership of the corresponding sysfs entries is not changed. This leads
to problems when tools try to operate on the corresponding sysfs files.
This leads to a bug whereby a network device that is created in a
network namespace owned by a user namespace will have its corresponding
sysfs entry owned by the root user of the corresponding user namespace.
If such a network device has to be moved back to the host network
namespace the permissions will still be set to the user namespace. This
means unprivileged users can e.g. trigger uevents for such incorrectly
owned devices. They can also modify the settings of the device itself.
Both of these things are unwanted.
For example, workloads will create network devices in the host network
namespace. Other tools will then proceed to move such devices between
network namespaces owned by other user namespaces. While the ownership
of the device itself is updated in
net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
entry for the device is not:
drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 .
drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics
lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
However, if a device is created directly in the network namespace then
the device's sysfs permissions will be correctly updated:
drwxr-xr-x 5 root root 0 Jan 25 18:12 .
drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
-r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type
-r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len
-r--r--r-- 1 root root 4096 Jan 25 18:12 address
-r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast
-rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier
-r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes
-r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count
-r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count
-r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id
-r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port
-r--r--r-- 1 root root 4096 Jan 25 18:12 dormant
-r--r--r-- 1 root root 4096 Jan 25 18:12 duplex
-rw-r--r-- 1 root root 4096 Jan 25 18:12 flags
-rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout
-rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias
-r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex
-r--r--r-- 1 root root 4096 Jan 25 18:12 iflink
-r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode
-rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu
-r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type
-rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group
-r--r--r-- 1 root root 4096 Jan 25 18:12 operstate
-r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id
-r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name
-r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id
drwxr-xr-x 2 root root 0 Jan 25 18:12 power
-rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down
drwxr-xr-x 4 root root 0 Jan 25 18:12 queues
-r--r--r-- 1 root root 4096 Jan 25 18:12 speed
drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics
lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net
-rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len
-r--r--r-- 1 root root 4096 Jan 25 18:12 type
-rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent
Now, when creating a network device in a network namespace owned by a
user namespace and moving it to the host the permissions will be set to
the id that the user namespace root user has been mapped to on the host
leading to all sorts of permission issues:
458752
drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 .
drwxr-xr-x 9 root root 0 Jan 25 18:08 ..
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id
drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down
drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed
drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics
lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len
-r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type
-rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent
Signed-off-by: Christian Brauner
Signed-off-by: David S. Miller
18 Dec, 2019
1 commit
-
dev_hold always has to be called in rx_queue_add_kobject.
Otherwise usage count drops below 0 in case of failure in
kobject_init_and_add.
Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Reported-by: syzbot
Cc: Tetsuo Handa
Cc: David Miller
Cc: Lukas Bulwahn
Signed-off-by: Jouni Hogander
Signed-off-by: David S. Miller
07 Dec, 2019
1 commit
-
dev_hold always has to be called in netdev_queue_add_kobject.
Otherwise usage count drops below 0 in case of failure in
kobject_init_and_add.
Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Reported-by: Hulk Robot
Cc: Tetsuo Handa
Cc: David Miller
Cc: Lukas Bulwahn
Signed-off-by: David S. Miller
21 Nov, 2019
2 commits
-
kobject_put() should only be called in error path.
Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Signed-off-by: Eric Dumazet
Cc: Jouni Hogander
Signed-off-by: David S. Miller
-
kobject_init_and_add takes reference even when it fails. This has
to be given up by the caller in error handling. Otherwise memory
allocated by kobject_init_and_add is never freed. Originally found
by Syzkaller:
BUG: memory leak
unreferenced object 0xffff8880679f8b08 (size 8):
comm "netdev_register", pid 269, jiffies 4294693094 (age 12.132s)
hex dump (first 8 bytes):
72 78 2d 30 00 36 20 d4 rx-0.6 .
backtrace:
[] __kmalloc_track_caller+0x16e/0x290
[] kvasprintf+0xb1/0x140
[] kvasprintf_const+0x56/0x160
[] kobject_set_name_vargs+0x5b/0x140
[] kobject_init_and_add+0xd8/0x170
[] net_rx_queue_update_kobjects+0x152/0x560
[] netdev_register_kobject+0x210/0x380
[] register_netdevice+0xa1b/0xf00
[] __tun_chr_ioctl+0x20d5/0x3dd0
[] tun_chr_ioctl+0x2f/0x40
[] do_vfs_ioctl+0x1c7/0x1510
[] ksys_ioctl+0x99/0xb0
[] __x64_sys_ioctl+0x78/0xb0
[] do_syscall_64+0x16f/0x580
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[] 0xffffffffffffffff
Cc: David Miller
Cc: Lukas Bulwahn
Signed-off-by: Jouni Hogander
Signed-off-by: David S. Miller
31 May, 2019
1 commit
-
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman
08 May, 2019
2 commits
-
Pull networking updates from David Miller:
"Highlights:
1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.
2) Add fib_sync_mem to control the amount of dirty memory we allow to
queue up between synchronize RCU calls, from David Ahern.
3) Make flow classifier more lockless, from Vlad Buslov.
4) Add PHY downshift support to aquantia driver, from Heiner
Kallweit.
5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
contention on SLAB spinlocks in heavy RPC workloads.
6) Partial GSO offload support in XFRM, from Boris Pismenny.
7) Add fast link down support to ethtool, from Heiner Kallweit.
8) Use siphash for IP ID generator, from Eric Dumazet.
9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
entries, from David Ahern.
10) Move skb->xmit_more into a per-cpu variable, from Florian
Westphal.
11) Improve eBPF verifier speed and increase maximum program size,
from Alexei Starovoitov.
12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
spinlocks. From Neil Brown.
13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.
14) Improve link partner cap detection in generic PHY code, from
Heiner Kallweit.
15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
Maguire.
16) Remove SKB list implementation assumptions in SCTP, yours truly.
17) Various cleanups, optimizations, and simplifications in r8169
driver. From Heiner Kallweit.
18) Add memory accounting on TX and RX path of SCTP, from Xin Long.
19) Switch PHY drivers over to use dynamic feature detection, from
Heiner Kallweit.
20) Support flow steering without masking in dpaa2-eth, from Ioana
Ciocoi.
21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
Pirko.
22) Increase the strict parsing of current and future netlink
attributes, also export such policies to userspace. From Johannes
Berg.
23) Allow DSA tag drivers to be modular, from Andrew Lunn.
24) Remove legacy DSA probing support, also from Andrew Lunn.
25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
Haabendal.
26) Add a generic tracepoint for TX queue timeouts to ease debugging,
from Cong Wang.
27) More indirect call optimizations, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
cxgb4: Fix error path in cxgb4_init_module
net: phy: improve pause mode reporting in phy_print_status
dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
net: macb: Change interrupt and napi enable order in open
net: ll_temac: Improve error message on error IRQ
net/sched: remove block pointer from common offload structure
net: ethernet: support of_get_mac_address new ERR_PTR error
net: usb: smsc: fix warning reported by kbuild test robot
staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
net: dsa: support of_get_mac_address new ERR_PTR error
net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
vrf: sit mtu should not be updated when vrf netdev is the link
net: dsa: Fix error cleanup path in dsa_init_module
l2tp: Fix possible NULL pointer dereference
taprio: add null check on sched_nest to avoid potential null pointer dereference
net: mvpp2: cls: fix less than zero check on a u32 variable
net_sched: sch_fq: handle non connected flows
net_sched: sch_fq: do not assume EDT packets are ordered
net: hns3: use devm_kcalloc when allocating desc_cb
net: hns3: some cleanup for struct hns3_enet_ring
...
-
Pull driver core/kobject updates from Greg KH:
"Here is the "big" set of driver core patches for 5.2-rc1
There are a number of ACPI patches in here as well, as Rafael said
they should go through this tree due to the driver core changes they
required. They have all been acked by the ACPI developers.
There are also a number of small subsystem-specific changes in here,
due to some changes to the kobject core code. Those too have all been
acked by the various subsystem maintainers.
As for content, it's pretty boring outside of the ACPI changes:
- spdx cleanups
- kobject documentation updates
- default attribute groups for kobjects
- other minor kobject/driver core fixes
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
kobject: clean up the kobject add documentation a bit more
kobject: Fix kernel-doc comment first line
kobject: Remove docstring reference to kset
firmware_loader: Fix a typo ("syfs" -> "sysfs")
kobject: fix dereference before null check on kobj
Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
init/config: Do not select BUILD_BIN2C for IKCONFIG
Provide in-kernel headers to make extending kernel easier
kobject: Improve doc clarity kobject_init_and_add()
kobject: Improve docs for kobject_add/del
driver core: platform: Fix the usage of platform device name(pdev->name)
livepatch: Replace klp_ktype_patch's default_attrs with groups
cpufreq: schedutil: Replace default_attrs field with groups
padata: Replace padata_attr_type default_attrs field with groups
irqdesc: Replace irq_kobj_type's default_attrs field with groups
net-sysfs: Replace ktype default_attrs field with groups
block: Replace all ktype default_attrs with groups
samples/kobject: Replace foo_ktype's default_attrs field with groups
kobject: Add support for default attribute groups to kobj_type
driver core: Postpone DMA tear-down until after devres release for probe failure
...
26 Apr, 2019
1 commit
-
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace the default_attrs fields in rx_queue_ktype
and netdev_queue_ktype with default_groups. Use the ATTRIBUTE_GROUPS
macro to create rx_queue_default_groups and netdev_queue_default_groups.
This patch was tested by verifying that the sysfs files for the
attributes in the default groups were created.
Signed-off-by: Kimberly Brown
Signed-off-by: Greg Kroah-Hartman
18 Apr, 2019
1 commit
-
Conflict resolution of af_smc.c from Stephen Rothwell.
Signed-off-by: David S. Miller
16 Apr, 2019
1 commit
-
This reverts commit 6b70fc94afd165342876e53fc4b2f7d085009945.
The reverted bugfix will cause another issue.
Reported by syzbot+6024817a931b2830bc93@syzkaller.appspotmail.com.
See https://syzkaller.appspot.com/x/log.txt?x=1737671b200000 for
details.
Signed-off-by: Wang Hai
Acked-by: Andy Shevchenko
Signed-off-by: David S. Miller
28 Mar, 2019
1 commit
24 Mar, 2019
1 commit
-
We prefer static_branch_unlikely() over static_key_false() these days.
Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
22 Mar, 2019
1 commit
-
When registering struct net_device, it will call
register_netdevice ->
netdev_register_kobject ->
device_initialize(dev);
dev_set_name(dev, "%s", ndev->name)
device_add(dev)
register_queue_kobjects(ndev)
In netdev_register_kobject(), if device_add(dev) or
register_queue_kobjects(ndev) fails, register_netdevice()
will return an error, causing netdev_freemem(ndev) to be
called to free net_device. However, put_device(&dev->dev)->..->
kobject_cleanup() won't be called, resulting in a memory leak.
syzkaller reports this:
BUG: memory leak
unreferenced object 0xffff8881f4fad168 (size 8):
comm "syz-executor.0", pid 3575, jiffies 4294778002 (age 20.134s)
hex dump (first 8 bytes):
77 70 61 6e 30 00 ff ff wpan0...
backtrace:
[] kstrdup_const+0x3d/0x50 mm/util.c:73
[] kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
[] kobject_set_name_vargs+0x55/0x130 lib/kobject.c:281
[] dev_set_name+0xbb/0xf0 drivers/base/core.c:1915
[] netdev_register_kobject+0xc0/0x410 net/core/net-sysfs.c:1727
[] register_netdevice+0xa51/0xeb0 net/core/dev.c:8711
[] cfg802154_update_iface_num.isra.2+0x13/0x90 [ieee802154]
[] ieee802154_llsec_fill_key_id+0x1d5/0x570 [ieee802154]
[] 0xffffffffc1500e0e
[] platform_drv_probe+0xc6/0x180 drivers/base/platform.c:614
[] really_probe+0x491/0x7c0 drivers/base/dd.c:509
[] driver_probe_device+0xdc/0x240 drivers/base/dd.c:671
[] device_driver_attach+0xf2/0x130 drivers/base/dd.c:945
[] __driver_attach+0x10e/0x210 drivers/base/dd.c:1022
[] bus_for_each_dev+0x154/0x1e0 drivers/base/bus.c:304
[] bus_add_driver+0x427/0x5e0 drivers/base/bus.c:645
Reported-by: Hulk Robot
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array")
Signed-off-by: Wang Hai
Reviewed-by: Andy Shevchenko
Reviewed-by: Stephen Hemminger
Signed-off-by: David S. Miller
20 Mar, 2019
1 commit
-
In netdev_queue_add_kobject and rx_queue_add_kobject,
if sysfs_create_group fails, kobject_put will call
netdev_queue_release to decrease the dev refcount, however
dev_hold has not been called. So we will see this while
unregistering dev:
unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
Reported-by: Hulk Robot
Fixes: d0d668371679 ("net: don't decrement kobj reference count on init failure")
Signed-off-by: YueHaibing
Signed-off-by: David S. Miller
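The reference accounting behind the "Usage count = -1" message can be sketched with a toy counter. This is only an illustrative model under assumptions: struct toy_netdev, toy_queue_release() and toy_queue_add() are hypothetical names, and the init_fails flag simulates a failure in kobject setup. It shows why the hold has to come before anything that can trigger the release callback.

```c
/* Hypothetical device with a bare reference counter. */
struct toy_netdev {
	int refcnt;
};

/* Models the release callback run from the final kobject_put(). */
static void toy_queue_release(struct toy_netdev *dev)
{
	dev->refcnt--;   /* counterpart of dev_put() */
}

/*
 * Models adding a queue kobject. 'init_fails' simulates a failure
 * during kobject setup. The reference is taken first, so the put on
 * the error path is balanced.
 */
static int toy_queue_add(struct toy_netdev *dev, int init_fails)
{
	dev->refcnt++;   /* counterpart of dev_hold() */
	if (init_fails) {
		toy_queue_release(dev);   /* error path drops what we took */
		return -1;
	}
	return 0;
}
```

If the increment were placed after the failure check, the error path would decrement a reference that was never taken, which is the imbalance the commit message reports.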
05 Mar, 2019
2 commits
-
Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.
Signed-off-by: Andy Shevchenko
Signed-off-by: David S. Miller
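To make the point about bitmap-typed allocation concrete, here is a userspace analogue. It is a sketch, not the kernel helper: the kernel's bitmap_zalloc() also takes allocation flags, and the TOY_ macros and toy_bitmap_zalloc() are names invented for this example.

```c
#include <limits.h>
#include <stdlib.h>

/* Number of bits in one unsigned long on this platform. */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold n bits, rounded up. */
#define TOY_BITS_TO_LONGS(n) \
	(((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/*
 * Allocate a zeroed bitmap able to hold 'nbits' bits. The return type
 * is unsigned long *, so the caller gets a bitmap-typed pointer rather
 * than an opaque void *. Release it with free().
 */
static unsigned long *toy_bitmap_zalloc(size_t nbits)
{
	return calloc(TOY_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}
```

The caller states the size in bits and the rounding to whole words happens in one place, which is the readability gain the commit message refers to.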
04 Mar, 2019
1 commit
-
syzkaller reports this:
BUG: memory leak
unreferenced object 0xffff88837a71a500 (size 256):
comm "syz-executor.2", pid 9770, jiffies 4297825125 (age 17.843s)
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff ........ .......
backtrace:
[] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751
[] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516
[] tun_set_iff drivers/net/tun.c:2649 [inline]
[] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883
[] vfs_ioctl fs/ioctl.c:46 [inline]
[] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690
[] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
[] __do_sys_ioctl fs/ioctl.c:712 [inline]
[] __se_sys_ioctl fs/ioctl.c:710 [inline]
[] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
[] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
[] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[] 0xffffffffffffffff

register_queue_kobjects should call kset_unregister to free 'dev->queues_kset'
in its error path; otherwise it leaks memory.

Reported-by: Hulk Robot
Fixes: 1d24eb4815d1 ("xps: Transmit Packet Steering")
Signed-off-by: YueHaibing
Signed-off-by: David S. Miller
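The kind of error-path leak described above can be shown with a user-space model that unwinds in reverse order of setup. All names here (model_dev, model_register_queues, the allocation counter) are hypothetical and only mimic the shape of the problem: an object created first must also be freed when a later step fails. This is a sketch, not the kernel function.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts live allocations so a leak shows up as a non-zero value. */
static int model_live_allocs;

static void *model_alloc(void)
{
	void *p = malloc(16);

	if (p)
		model_live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		free(p);
		model_live_allocs--;
	}
}

/* Hypothetical stand-in for a device with a queues kset and queue objects. */
struct model_dev {
	void *queues_kset;
	void *rx;
	void *tx;
};

/* fail_tx simulates the tx queue registration step failing. */
static int model_register_queues(struct model_dev *d, int fail_tx)
{
	d->queues_kset = model_alloc();
	if (!d->queues_kset)
		return -1;

	d->rx = model_alloc();
	if (!d->rx)
		goto err_kset;

	if (fail_tx)
		goto err_rx;
	d->tx = model_alloc();
	if (!d->tx)
		goto err_rx;
	return 0;

err_rx:
	model_free(d->rx);
	d->rx = NULL;
err_kset:
	/* Without this step the first allocation would be leaked. */
	model_free(d->queues_kset);
	d->queues_kset = NULL;
	return -1;
}
```

Removing the model_free() call under err_kset leaves model_live_allocs at 1 after a failed call, which is the model's analogue of the unreferenced object in the report.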
07 Feb, 2019
2 commits
-
Now that we have a dedicated NDO for getting a port's parent ID, get rid
of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the
NDO exclusively. This is a preliminary change to getting rid of
switchdev_ops eventually.

Signed-off-by: Florian Fainelli
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
-
In preparation for getting rid of switchdev_ops, create a dedicated NDO
operation for getting the port's parent identifier. There are
essentially two classes of drivers that need to implement getting the
port's parent ID which are VF/PF drivers with a built-in switch, and
pure switchdev drivers such as mlxsw, ocelot, dsa etc.

We introduce a helper function: dev_get_port_parent_id() which supports
recursion into the lower devices to obtain the first port's parent ID.

Convert the bridge, core and ipv4 multicast routing code to check for
such ndo_get_port_parent_id() and call the helper function when valid
before falling back to switchdev_port_attr_get(). This will allow us to
convert all relevant drivers in one go instead of having to implement
both switchdev_port_attr_get() and ndo_get_port_parent_id() operations,
then get rid of switchdev_port_attr_get().

Acked-by: Jiri Pirko
Signed-off-by: Florian Fainelli
Reviewed-by: Ido Schimmel
Signed-off-by: David S. Miller
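The lookup described above (use the device's own callback if it has one, otherwise optionally recurse into lower devices and take the first ID found) can be sketched in user space. Everything prefixed model_ is hypothetical, including the fixed-size lower-device array and the ID layout. The kernel helper may perform further checks that this simplified model leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_LOWER	4
#define MODEL_ID_LEN	8

/* Hypothetical parent ID: a short byte string, id_len == 0 means unset. */
struct model_ppid {
	unsigned char id[MODEL_ID_LEN];
	unsigned char id_len;
};

struct model_netdev {
	/* Optional callback standing in for ndo_get_port_parent_id. */
	int (*get_port_parent_id)(const struct model_netdev *dev,
				  struct model_ppid *ppid);
	const struct model_netdev *lower[MODEL_MAX_LOWER];
	size_t n_lower;
	unsigned char switch_id;
};

/*
 * Return 0 and fill *ppid if dev, or (when recurse is set) one of its
 * lower devices, can report a parent ID; otherwise -EOPNOTSUPP.
 */
static int model_get_port_parent_id(const struct model_netdev *dev,
				    struct model_ppid *ppid, int recurse)
{
	int err = -EOPNOTSUPP;
	size_t i;

	if (dev->get_port_parent_id)
		return dev->get_port_parent_id(dev, ppid);

	if (!recurse)
		return err;

	for (i = 0; i < dev->n_lower; i++) {
		err = model_get_port_parent_id(dev->lower[i], ppid, recurse);
		if (!err)
			break;	/* first lower device that reports an ID wins */
	}
	return err;
}

/* Example callback for a port that belongs to a switch. */
static int model_port_ndo(const struct model_netdev *dev,
			  struct model_ppid *ppid)
{
	memset(ppid, 0, sizeof(*ppid));
	ppid->id[0] = dev->switch_id;
	ppid->id_len = 1;
	return 0;
}
```

In this model, an upper device without a callback that is stacked on a switch port reports the port's ID when recursion is requested, and -EOPNOTSUPP when it is not.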
07 Dec, 2018
1 commit
-
In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_change_flags().

Therefore extend dev_change_flags() with an extra extack argument and
update all users. Most of the calls end up just encoding NULL, but
several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

Since the function declaration line is changed anyway, name the other
function arguments to placate checkpatch.

Signed-off-by: Petr Machata
Acked-by: Jiri Pirko
Reviewed-by: Ido Schimmel
Reviewed-by: David Ahern
Signed-off-by: David S. Miller
10 Aug, 2018
1 commit
-
The definition of static_key_slow_inc() has cpus_read_lock in place. In the
virtio_net driver, XPS queues are initialized after setting the queue:cpu
affinity in virtnet_set_affinity() which is already protected within
cpus_read_lock. Lockdep prints a warning when we are trying to acquire
cpus_read_lock when it is already held.

This patch adds the ability to call __netif_set_xps_queue under
cpus_read_lock().

Acked-by: Jason Wang

============================================
WARNING: possible recursive locking detected
4.18.0-rc3-next-20180703+ #1 Not tainted
--------------------------------------------
swapper/0/1 is trying to acquire lock:
00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20

but task is already holding lock:
00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(cpu_hotplug_lock.rw_sem);
lock(cpu_hotplug_lock.rw_sem);

*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by swapper/0/1:
#0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110
#1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0
#2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60

v2: move cpus_read_lock() out of __netif_set_xps_queue()
Cc: "Nambiar, Amritha"
Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Fixes: 8af2c06ff4b1 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")

Signed-off-by: Andrei Vagin
Signed-off-by: David S. Miller
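The pattern in the message above, where the lock is taken in a thin wrapper so that callers already holding it can use the inner function directly, can be modelled in user space. The model_* names are hypothetical; the lock here is a plain counter that records nested acquisition attempts, standing in for what lockdep reports. It is a sketch of the idea, not the kernel implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for a lock that one task must not take twice. */
struct model_lock {
	int held;
	int recursive_attempts;	/* what a lock checker would warn about */
};

static void model_lock_acquire(struct model_lock *l)
{
	if (l->held)
		l->recursive_attempts++;
	l->held++;
}

static void model_lock_release(struct model_lock *l)
{
	l->held--;
}

/* Inner variant: the caller must already hold the lock. */
static int model_set_xps_queue_locked(struct model_lock *l, int index)
{
	assert(l->held > 0);
	return index >= 0 ? 0 : -1;
}

/* Outer variant: takes the lock itself, then calls the inner variant. */
static int model_set_xps_queue(struct model_lock *l, int index)
{
	int ret;

	model_lock_acquire(l);
	ret = model_set_xps_queue_locked(l, index);
	model_lock_release(l);
	return ret;
}

/* A caller that already holds the lock, like the driver path described. */
static int model_init_queues(struct model_lock *l, int use_locked_variant)
{
	int ret;

	model_lock_acquire(l);
	if (use_locked_variant)
		ret = model_set_xps_queue_locked(l, 0);
	else
		ret = model_set_xps_queue(l, 0);
	model_lock_release(l);
	return ret;
}
```

Calling the outer variant from a context that already holds the lock records one nested attempt in the model; calling the inner variant from the same context records none.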
21 Jul, 2018
3 commits
-
Make net_ns_get_ownership() reusable by networking code outside of core.
This is useful, for example, to allow bridge related sysfs files to be
owned by container root.

Add a function comment since this is a potentially dangerous function to
use given the way that kobject_get_ownership() works by initializing uid
and gid before calling .get_ownership().

Signed-off-by: Tyler Hicks
Signed-off-by: David S. Miller
-
When creating various objects in /sys/class/net/... make sure that they
belong to container's owner instead of global root (if they belong to a
container/namespace).

Co-Developed-by: Tyler Hicks
Signed-off-by: Dmitry Torokhov
Signed-off-by: Tyler Hicks
Signed-off-by: David S. Miller
-
An upcoming change will allow container root to open some /sys/class/net
files for writing. The tx_maxrate attribute can result in changes
to actual hardware devices so err on the side of caution by requiring
CAP_NET_ADMIN in the init namespace in the corresponding attribute store
operation.

Signed-off-by: Tyler Hicks
Signed-off-by: David S. Miller