31 Mar, 2020
6 commits
-
The previous patch allowed device drivers to publish their default
binding between packet trap policers and packet trap groups. However,
some users might not be content with this binding and would like to
change it.In case user space passed a packet trap policer identifier when setting
a packet trap group, invoke the appropriate device driver callback and
pass the new policer identifier.v2:
* Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in
devlink_trap_group_set() and bail if not present
* Add extack error message in case trap group was partially modifiedSigned-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
Acked-by: Jakub Kicinski
Signed-off-by: David S. Miller -
Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.The next patch will enable user space to change this default binding.
Signed-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
Reviewed-by: Jakub Kicinski
Signed-off-by: David S. Miller -
Devices capable of offloading the kernel's datapath and perform
functions such as bridging and routing must also be able to send (trap)
specific packets to the kernel (i.e., the CPU) for processing.For example, a device acting as a multicast-aware bridge must be able to
trap IGMP membership reports to the kernel for processing by the bridge
module.In most cases, the underlying device is capable of handling packet rates
that are several orders of magnitude higher compared to those that can
be handled by the CPU.Therefore, in order to prevent the underlying device from overwhelming
the CPU, devices usually include packet trap policers that are able to
police the trapped packets to rates that can be handled by the CPU.This patch allows capable device drivers to register their supported
packet trap policers with devlink. User space can then tune the
parameters of these policer (currently, rate and burst size) and read
from the device the number of packets that were dropped by the policer,
if supported.Subsequent patches in the series will allow device drivers to create
default binding between these policers and packet trap groups and allow
user space to change the binding.v2:
* Add 'strict_start_type' in devlink policy
* Have device drivers provide max/min rate/burst size for each policer.
Use them to check validity of user provided parametersSigned-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
Reviewed-by: Jakub Kicinski
Signed-off-by: David S. Miller -
On low memory system, run time dumps can consume too much memory. Add
administrator ability to disable auto dumps per reporter as part of the
error flow handle routine.This attribute is not relevant while executing
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET.By default, auto dump is activated for any reporter that has a dump method,
as part of the reporter registration to devlink.Signed-off-by: Eran Ben Elisha
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.Remove this flag from all drivers that called
devlink_health_reporter_create.All existing health reporters set auto recovery to true if they have a
recover method.Yet, administrator can unset auto recover via netlink command as prior to
this patch.Signed-off-by: Eran Ben Elisha
Reviewed-by: Jiri Pirko
Reviewed-by: Jakub Kicinski
Signed-off-by: David S. Miller -
The rest of the devlink code sets the extack message using
NL_SET_ERR_MSG_MOD. Change the existing appearances of NL_SET_ERR_MSG
to NL_SET_ERR_MSG_MOD.Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller
27 Mar, 2020
9 commits
-
Implement support for the DEVLINK_CMD_REGION_NEW command for creating
snapshots. This new command parallels the existing
DEVLINK_CMD_REGION_DEL.In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
".snapshot" operation must be implemented in the region's ops structure.The desired snapshot id must be provided. This helps avoid confusion on
the purpose of DEVLINK_CMD_REGION_NEW, and keeps the API simpler.The requested id will be inserted into the xarray tracking the number of
snapshots using each id. If this id is already used by another snapshot
on any region, an error will be returned.Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 value at an available slot between 0 and
U32_MAX.The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.This work also paves the way to introduce a mechanism for userspace to
request a snapshot.Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.Because there is no current way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would over flow.Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
A future change is going to implement a new devlink command to request
a snapshot on demand. As part of this, the logic for handling the
snapshot ids will be refactored. To simplify the snapshot id allocation
function, move it to a separate function prefixed by `__`. This helper
function will assume the lock is held.While no other callers will exist, it simplifies refactoring the logic
because there is no need to complicate the function with gotos to handle
unlocking on failure.Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
The devlink_region_snapshot_create function returns -ENOMEM when the
maximum number of snapshots has been reached. This is confusing because
it is not an issue of being out of memory. Change this to use -ENOSPC
instead.Reported-by: Jiri Pirko
Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
A future change is going to add a new devlink command to request
a snapshot on demand. This function will want to call the
devlink_region_snapshot_create function while already holding the
devlink instance lock.Extract the logic of this function into a static function prefixed by
`__` to indicate that it is an internal helper function. Modify the
original function to be implemented in terms of the new locked
function.Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Reviewed-by: Jakub Kicinski
Signed-off-by: David S. Miller -
The function documentation comment for devlink_region_snapshot_create
included a literal tab character between 'future analyses' that was
difficult to spot as it happened to only display as one space wide.Fix the comment to use a space here instead of a stray tab appearing in
the middle of a sentence.Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
It does not makes sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.Noticed-by: Jakub Kicinski
Signed-off-by: Jacob Keller
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
Modify the devlink region code in preparation for adding new operations
on regions.Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.Signed-off-by: Jacob Keller
Reviewed-by: Jakub Kicinski
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
26 Mar, 2020
1 commit
-
devlink_nl_cmd_eswitch_set_doit() doesn't hold devlink->lock mutex while
invoking driver callback. This is likely due to eswitch mode setting
involves adding/remove devlink ports, health reporters or
other devlink objects for a devlink device.So it is driver responsiblity to ensure thread safe eswitch state
transition happening via either sriov legacy enablement or via devlink
eswitch set callback.Therefore, get() callback should also be invoked without holding
devlink->lock mutex.
Vendor driver can use same internal lock which it uses during eswitch
mode set() callback.
This makes get() and set() implimentation symmetric in devlink core and
in vendor drivers.Hence, remove holding devlink->lock mutex during eswitch get() callback.
Failing to do so results into below deadlock scenario when mlx5_core
driver is improved to handle eswitch mode set critical section invoked
by devlink and sriov sysfs interface in subsequent patch.devlink_nl_cmd_eswitch_set_doit()
mlx5_eswitch_mode_set()
mutex_lock(esw->mode_lock) lock); lock); mode_lock)
Reviewed-by: Mark Bloch
Signed-off-by: Parav Pandit
Signed-off-by: Saeed Mahameed
24 Mar, 2020
3 commits
-
Packet trap groups are now explicitly registered by drivers and not
implicitly registered when the packet traps are registered. Therefore,
there is no need to encode entire group structure the trap is associated
with inside the trap structure.Instead, only pass the group identifier. Refer to it as initial group
identifier, as future patches will allow user space to move traps
between groups.Signed-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
Now that drivers explicitly register their supported packet trap groups
there is no for devlink to create them on-demand and destroy them when
their reference count reaches zero.Signed-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
Currently, packet trap groups are implicitly registered by drivers upon
packet trap registration. When the traps are registered, each is
associated with a group and the group is created by devlink, if it does
not exist already.This makes it difficult for drivers to pass additional attributes for
the groups.Therefore, as a preparation for future patches that require passing
additional group attributes, add an API to explicitly register /
unregister these groups.Next patches will convert existing drivers to use this API.
Signed-off-by: Ido Schimmel
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
13 Mar, 2020
1 commit
-
Minor overlapping changes, nothing serious.
Signed-off-by: David S. Miller
04 Mar, 2020
3 commits
-
Currently mlx5 PCI PF and VF devlink devices register their ports as
physical port in non-representors mode.Introduce a new port flavour as virtual so that virtual devices can
register 'virtual' flavour to make it more clear to users.An example of one PCI PF and 2 PCI virtual functions, each having
one devlink port.$ devlink port show
pci/0000:06:00.0/1: type eth netdev ens2f0 flavour physical port 0
pci/0000:06:00.2/1: type eth netdev ens2f2 flavour virtual port 0
pci/0000:06:00.3/1: type eth netdev ens2f3 flavour virtual port 0Reviewed-by: Jiri Pirko
Signed-off-by: Parav Pandit
Acked-by: Jakub Kicinski
Signed-off-by: David S. Miller -
DEVLINK_ATTR_REGION_CHUNK_ADDR and DEVLINK_ATTR_REGION_CHUNK_LEN
lack entries in the netlink policy. Corresponding nla_get_u64()s
may read beyond the end of the message.Fixes: 4e54795a27f5 ("devlink: Add support for region snapshot read command")
Signed-off-by: Jakub Kicinski
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
DEVLINK_ATTR_PARAM_VALUE_DATA may have different types
so it's not checked by the normal netlink policy. Make
sure the attribute length is what we expect.Fixes: e3b7ca18ad7b ("devlink: Add param set command")
Signed-off-by: Jakub Kicinski
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
29 Feb, 2020
1 commit
-
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:struct foo {
int stuff;
struct boo array[];
};By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.Also, notice that, dynamic memory allocations won't be affected by
this change:"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")Signed-off-by: Gustavo A. R. Silva
Signed-off-by: David S. Miller
28 Feb, 2020
1 commit
-
The mptcp conflict was overlapping additions.
The SMC conflict was an additional and removal happening at the same
time.Signed-off-by: David S. Miller
27 Feb, 2020
1 commit
-
list_for_each_entry_rcu() has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.The devlink->lock is held when devlink_dpipe_table_find()
is called in non RCU read side section. Therefore, pass struct devlink
to devlink_dpipe_table_find() for lockdep checking.Signed-off-by: Madhuparna Bhowmik
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
26 Feb, 2020
2 commits
-
Add cookie argument to devlink_trap_report() allowing driver to pass in
the user cookie. Pass on the cookie down to drop monitor code.Signed-off-by: Jiri Pirko
Signed-off-by: Ido Schimmel
Reviewed-by: Jakub Kicinski
Signed-off-by: David S. Miller -
Allow driver to indicate cookie metadata for registered traps.
Signed-off-by: Jiri Pirko
Signed-off-by: Ido Schimmel
Reviewed-by: Jakub Kicinski
Signed-off-by: David S. Miller
25 Feb, 2020
1 commit
-
Add packet traps that can report packets that were dropped during ACL
processing.Signed-off-by: Jiri Pirko
Signed-off-by: Ido Schimmel
Signed-off-by: David S. Miller
24 Feb, 2020
1 commit
-
devlink_dpipe_table_find() should be called under either
rcu_read_lock() or devlink->lock. devlink_dpipe_table_register()
calls devlink_dpipe_table_find() without holding the lock
and acquires it later. Therefore hold the devlink->lock
from the beginning of devlink_dpipe_table_register().Suggested-by: Jiri Pirko
Signed-off-by: Madhuparna Bhowmik
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
19 Feb, 2020
1 commit
-
Add a new API for start/end binary array brackets [] to force array
around binary data as required from JSON. With this restriction, re-open
API to set binary fmsg data.Signed-off-by: Aya Levin
Reviewed-by: Jiri Pirko
Signed-off-by: Saeed Mahameed
05 Feb, 2020
1 commit
-
commit fdd41ec21e15 ("devlink: Return right error code in case of errors
for region read") modified the region read code to report errors
properly in unexpected cases.In the case where the start_offset and ret_offset match, it unilaterally
converted this into an error. This causes an issue for the "dump"
version of the command. In this case, the devlink region dump will
always report an invalid argument:000000000000ffd0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
000000000000ffe0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
devlink answers: Invalid argument
000000000000fff0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffThis occurs because the expected flow for the dump is to return 0 after
there is no further data.The simplest fix would be to stop converting the error code to -EINVAL
if start_offset == ret_offset. However, avoid unnecessary work by
checking for when start_offset is larger than the region size and
returning 0 upfront.Fixes: fdd41ec21e15 ("devlink: Return right error code in case of errors for region read")
Signed-off-by: Jacob Keller
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
25 Jan, 2020
1 commit
-
Devlink health recover notifications were added only on driver direct
updates of health_state through devlink_health_reporter_state_update().
Add notifications on updates of health_state by devlink flows of report
and recover.Moved functions devlink_nl_health_reporter_fill() and
devlink_recover_notify() to avoid forward declaration.Fixes: 97ff3bd37fac ("devlink: add devink notification when reporter update health state")
Signed-off-by: Moshe Shemesh
Signed-off-by: David S. Miller
20 Jan, 2020
1 commit
19 Jan, 2020
3 commits
-
Add packet trap that can report NVE packets that the device decided to
drop because their overlay source MAC is multicast.Signed-off-by: Amit Cohen
Acked-by: Jiri Pirko
Signed-off-by: Ido Schimmel
Signed-off-by: David S. Miller -
Add packet traps that can report packets that were dropped during tunnel
decapsulation.Signed-off-by: Amit Cohen
Acked-by: Jiri Pirko
Signed-off-by: Ido Schimmel
Signed-off-by: David S. Miller -
Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.Signed-off-by: Amit Cohen
Signed-off-by: Ido Schimmel
Signed-off-by: David S. Miller
12 Jan, 2020
1 commit
-
The function to obtain a unique snapshot id was mistakenly typo'd as
devlink_region_shapshot_id_get. Fix this typo by renaming the function
and all of its users.Signed-off-by: Jacob Keller
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
11 Jan, 2020
1 commit
-
The commit cited below causes devlink to emit a warning if a type was
not set on a devlink port for longer than 30 seconds to "prevent
misbehavior of drivers". This proved to be problematic when
unregistering the backing netdev. The flow is always:devlink_port_type_clear() // schedules the warning
unregister_netdev() // blocking
devlink_port_unregister() // cancels the warningThe call to unregister_netdev() can block for long periods of time for
various reasons: RTNL lock is contended, large amounts of configuration
to unroll following dismantle of the netdev, etc. This results in
devlink emitting a warning despite the driver behaving correctly.In emulated environments (of future hardware) which are usually very
slow, the warning can also be emitted during port creation as more than
30 seconds can pass between the time the devlink port is registered and
when its type is set.In addition, syzbot has hit this warning [1] 1974 times since 07/11/19
without being able to produce a reproducer. Probably because
reproduction depends on the load or other bugs (e.g., RTNL not being
released).To prevent bogus warnings, increase the timeout to 1 hour.
[1] https://syzkaller.appspot.com/bug?id=e99b59e9c024a666c9f7450dc162a4b74d09d9cb
Fixes: 136bf27fc0e9 ("devlink: add warning in case driver does not set port type")
Signed-off-by: Ido Schimmel
Reported-by: syzbot+b0a18ed7b08b735d2f41@syzkaller.appspotmail.com
Reported-by: Alex Veber
Tested-by: Alex Veber
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller
09 Jan, 2020
1 commit
-
add a devlink notification when reporter update the health
state.Signed-off-by: Vikas Gupta
Signed-off-by: David S. Miller