16 Jul, 2018
23 commits
-
This patch adds common functions to handle Mellanox metadata headers.
These functions are used by IPsec and TLS to process FPGA metadata.

Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
This patch enables TLS Rx based on available HW capabilities.
Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
This patch adds software statistics for TLS to count important
events.

Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
Implement the TLS Rx offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

A special metadata ethertype is used to pass information to
the hardware.

When hardware loses synchronization, a special resync request
metadata message is used to request resync.

Signed-off-by: Boris Pismenny
Signed-off-by: Ilya Lesokhin
Signed-off-by: David S. Miller
-
Add the mlx5 implementation of the TLS Rx routines to add/del TLS
contexts. Also add the tls_dev_resync_rx routine
to work with the TLS inline Rx crypto offload infrastructure.

Signed-off-by: Boris Pismenny
Signed-off-by: Ilya Lesokhin
Signed-off-by: David S. Miller
-
In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Complete the implementation for Innova TLS (FPGA-based) hardware by
adding support for Rx inline crypto offload.

Signed-off-by: Boris Pismenny
Signed-off-by: Ilya Lesokhin
Signed-off-by: David S. Miller
-
For symmetry, we rename mlx5e_tls_offload_context to
mlx5e_tls_offload_context_tx before we add mlx5e_tls_offload_context_rx.

Signed-off-by: Boris Pismenny
Reviewed-by: Aviad Yehezkel
Reviewed-by: Tariq Toukan
Signed-off-by: David S. Miller
-
zerocopy_from_iter iterates over the message, but it doesn't revert the
updates made by the iov iteration. This patch fixes it. Now, the iov can
be used after calling zerocopy_from_iter.

Fixes: 3c4d75591 ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
This patch completes the generic infrastructure to offload TLS crypto to a
network device. It enables the kernel to skip decryption and
authentication of some skbs marked as decrypted by the NIC. In the fast
path, all packets received are decrypted by the NIC and the performance
is comparable to plain TCP.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC only decrypts packets that contain the expected TCP sequence number.
Out-of-order TCP packets are provided unmodified. As a result, in the
worst case a received TLS record consists of both plaintext and ciphertext
packets. These partially decrypted records must be re-encrypted,
only to be decrypted again.

The notable differences between SW KTLS Rx and this offload are as
follows:

1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.

2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW after HW lost track of TLS record framing in
the TCP stream.

Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
This patch allows tls_set_sw_offload to fill the context in case it was
already allocated previously.

We will use it in TLS_DEVICE to fill the RX software context.

Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
This patch splits tls_sw_release_resources_rx into two functions: one
that releases all inner software TLS structures and another that also
frees the containing structure.

In TLS_DEVICE we will need to release the software structures without
freeing the containing structure, which contains other information.

Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
Previously, decrypt_skb also updated the TLS context.
Now, decrypt_skb only decrypts the payload using the current context,
while decrypt_skb_update also updates the state.

Later, in the tls_device Rx flow, we will use decrypt_skb directly.
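
The split described above can be illustrated with a toy model. This is only a sketch and not the kernel code: the toy_* names, the XOR "cipher", and the idea that the updated state is just a record sequence counter are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a TLS receive context. */
struct toy_tls_ctx {
	uint64_t rec_seq; /* record sequence number (assumed state) */
	uint8_t key;      /* toy key; real TLS uses an AEAD cipher */
};

/* Decrypt in place using the current context; the context is not modified. */
void toy_decrypt(const struct toy_tls_ctx *ctx, uint8_t *buf, size_t len)
{
	uint8_t pad = (uint8_t)(ctx->key ^ (uint8_t)ctx->rec_seq);
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] ^= pad;
}

/* Decrypt and then advance the context state for the next record. */
void toy_decrypt_update(struct toy_tls_ctx *ctx, uint8_t *buf, size_t len)
{
	toy_decrypt(ctx, buf, len);
	ctx->rec_seq++;
}
```

Keeping the state update in a thin wrapper lets a caller that manages the state itself call the plain variant directly.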
Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
For symmetry, we rename tls_offload_context to
tls_offload_context_tx before we add tls_offload_context_rx.

Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
Prevent coalescing of decrypted and encrypted SKBs in GRO
and TCP layer.

Signed-off-by: Boris Pismenny
Signed-off-by: Ilya Lesokhin
Signed-off-by: David S. Miller
-
Add a new netdev TLS op for resynchronizing the HW TLS context.
Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
This patch adds a netdev feature to configure TLS RX inline crypto offload.
Signed-off-by: Ilya Lesokhin
Signed-off-by: Boris Pismenny
Signed-off-by: David S. Miller
-
The decrypted bit is propagated to cloned/copied skbs.
This will be used later by the inline crypto receive side offload
of TLS.

Signed-off-by: Boris Pismenny
Signed-off-by: Ilya Lesokhin
Signed-off-by: David S. Miller
-
Maxime Chevallier says:
====================
net: mvpp2: add debugfs interface

The PPv2 Header Parser and Classifier are not straightforward to debug;
having easy access to the configuration of some of the many lookup tables
is helpful during development and debug.

This series adds a basic debugfs interface, allowing data to be read from
the Header Parser and some of the Classifier tables.

For now, the interface is read-only, and contains only some basic info.
This was actually used during RSS development, and might be useful to
troubleshoot some issues we might find.

The first patch of the series converts the mvpp2 files to SPDX, which
eases adding the new debugfs dedicated file.

The second patch adds the interface, and exposes basic Header Parser data.
The 3rd patch adds a hit counter for the Header Parser TCAM.
The 4th patch exposes classifier info.
The 5th patch adds some hit counters for some of the classifier engines.

Changes since V1:
- Rebased on the latest net-next
- Made cls_flow_get non static so that it can be used in mvpp2_debugfs
====================

Signed-off-by: David S. Miller
-
The classification operations that are used for RSS make use of several
lookup tables. Having hit counters for these tables is really helpful
to determine what flows were matched by ingress traffic, and see the
path of packets among all the classifier tables.

This commit adds hit counters for the 3 tables used at the moment :

- The decoding table (also called lookup_id table), that links flows
  identified by the Header Parser to the flow table.

  There's one entry per flow, located at :
  .../mvpp2//flows/XX/dec_hits

  Note that there are 21 flows in the decoding table, whereas there are
  52 flows in the Header Parser. That's because there are several kinds
  of traffic that will match a given flow. Reading the hit counter from
  one sub-flow will clear all hit counters that have the same flow_id.

  This also applies to the flow_hits.

- The flow table, that contains all the different lookups to be
  performed by the classifier for each packet of a given flow. The match
  is done on the first entry of the flow sequence.

- The C2 engine entries, that are used to assign the default rx queue,
  and enable or disable RSS for a given port.

  There's one entry per flow, located at:
  .../mvpp2//flows/XX/flow_hits

  There is one C2 entry per port, so the c2 hit counter is located at :
  .../mvpp2//ethX/c2_hits

All hit counter values are 16-bit clear-on-read values.
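
Because each read clears the counter, a reader that wants a running total has to accumulate the values itself. The following is a minimal sketch of that idea, not driver code: struct hw_counter, struct hit_total and both functions are hypothetical names, and the hardware counter is simulated by a plain variable.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit clear-on-read hardware counter. */
struct hw_counter {
	uint16_t value;
};

/* Reading returns the current value and resets the counter to zero. */
uint16_t hw_counter_read(struct hw_counter *hw)
{
	uint16_t v = hw->value;

	hw->value = 0;
	return v;
}

/* Software-side running total that survives the hardware clears. */
struct hit_total {
	uint64_t total;
};

/* Fold one hardware read into the running total and return the total. */
uint64_t hit_total_poll(struct hit_total *t, struct hw_counter *hw)
{
	t->total += hw_counter_read(hw);
	return t->total;
}
```

With such an accumulator, two consecutive polls that see 7 and then 5 hits report totals of 7 and 12, even though the hardware value goes back to zero after each read.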
Signed-off-by: Maxime Chevallier
Signed-off-by: David S. Miller
-
The classifier configuration for RSS is quite complex, with several
lookup tables being used. This commit adds useful info in debugfs to
see how the different tables are configured :

Added 2 new entries in the per-port directory :
- .../eth0/default_rxq : The default rx queue on that port
- .../eth0/rss_enable : Indicates if RSS is enabled in the C2 entry

Added the 'flows' directory :
It contains one entry per sub-flow. A 'sub-flow' is a unique path from
Header Parser to the flow table. Multiple sub-flows can point to the
same 'flow' (each flow has an id from 8 to 29, which is its index in the
Lookup Id table) :

- .../flows/00/...
           /01/...
           ...
           /51/id : The flow id. There are 21 unique flows. There's one
                    flow per combination of the following parameters :
                    - L4 protocol (TCP, UDP, none)
                    - L3 protocol (IPv4, IPv6)
                    - L3 parameters (Fragmented or not)
                    - L2 parameters (Vlan tag presence or not)
           .../type : The flow type. This is an even higher level flow,
                      that we manipulate with ethtool. It can be :
                      "udp4" "tcp4" "udp6" "tcp6" "ipv4" "ipv6" "other".
           .../eth0/...
           .../eth1/engine : The hash generation engine used for this
                             flow on the given port
               .../hash_opts : The hash generation options indicating on
                               what data we base the hash (vlan tag, src
                               IP, src port, etc.)

Signed-off-by: Maxime Chevallier
Signed-off-by: David S. Miller
-
One helpful feature to help debug the Header Parser TCAM filter in PPv2
is to be able to see if the entries did match something when a packet
comes in. This can be done by using the built-in hit counter for TCAM
entries.

This commit implements reading the counter, and exposing its value on
debugfs for each filter entry.

The counter is a 16-bit clear-on-read value, located at:
.../mvpp2//parser/XXX/hits

Signed-off-by: Maxime Chevallier
Signed-off-by: David S. Miller
-
Marvell PPv2 Packet Header Parser has a TCAM based filter, that is not
trivial to configure and debug. Being able to dump TCAM entries from
userspace can be really helpful to help development of new features
and debug existing ones.

This commit adds a basic debugfs interface for the PPv2 driver, focusing
on TCAM related features.

/mvpp2/ --- f2000000.ethernet
         \- f4000000.ethernet --- parser --- 000 ...
                               |          \- 001
                               |          \- ...
                               |          \- 255 --- ai
                               |                  \- header_data
                               |                  \- lookup_id
                               |                  \- sram
                               |                  \- valid
                               \- eth1 ...
                               \- eth2 --- mac_filter
                                        \- parser_entries
                                        \- vid_filter

There's one directory per PPv2 instance, named after pdev->name to make
sure names are unique. In each of these directories, there's :

- one directory per interface on the controller, each containing :

  - "mac_filter", which lists all filtered addresses for this port
    (based on TCAM, not on the kernel's uc / mc lists)

  - "parser_entries", which lists the indices of all valid TCAM
    entries that have this port in their port map

  - "vid_filter", which lists the vids allowed on this port, based on
    TCAM

- one "parser" directory (the parser is common to all ports), containing :

  - one directory per TCAM entry (256 of them, from 0 to 255), each
    containing :

    - "ai" : Contains the 1 byte Additional Info field from TCAM, and
    - "header_data" : Contains the 8 bytes Header Data extracted from
      the packet

    - "lookup_id" : Contains the 4 bits LU_ID
    - "sram" : contains the raw SRAM data, which is the result of the TCAM
      lookup. This is read-only at the moment.

    - "valid" : Indicates if the entry is valid or not.

All entries are read-only, and everything is output in hex form.

Signed-off-by: Maxime Chevallier
Signed-off-by: David S. Miller
-
Use the appropriate SPDX license identifiers and drop the license text.
This patch is only cosmetic.

Signed-off-by: Antoine Tenart
Signed-off-by: Maxime Chevallier
Signed-off-by: David S. Miller
15 Jul, 2018
14 commits
-
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-15

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Various different arm32 JIT improvements in order to optimize code emission
   and make the JIT code itself more robust, from Russell.

2) Support simultaneous driver and offloaded XDP in order to allow for advanced
   use-cases where some work is offloaded to the NIC and some to the host. Also
   add ability for bpftool to load programs and maps beyond just the cgroup case,
   from Jakub.

3) Add BPF JIT support in nfp for multiplication as well as division. For the
   latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.

4) Add BTF pretty print functionality to bpftool in plain and JSON output
   format, from Okash.

5) Add build and installation of the BPF helper man page to bpftool, from Quentin.

6) Add a TCP BPF callback for listening sockets which is triggered right after
   the socket transitions to TCP_LISTEN state, from Andrey.

7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
   tree and prints all attached programs, from Roman.

8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
   packets, from Jesper.
====================

Signed-off-by: David S. Miller
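
As background for item 3, division by a known divisor can be emulated with a multiplication and shifts once a "reciprocal" has been precomputed. The sketch below shows the classic multiply-and-shift scheme for 32-bit unsigned values. It is an illustration only: whether the nfp JIT uses exactly this form is an assumption, and struct reciprocal, fls32, reciprocal_init and reciprocal_div are names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed constants for dividing by one fixed divisor. */
struct reciprocal {
	uint32_t m;   /* magic multiplier */
	uint8_t sh1;  /* first shift: min(l, 1) */
	uint8_t sh2;  /* second shift: max(l - 1, 0) */
};

/* 1-based position of the highest set bit; 0 when x == 0. */
int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Precompute the multiplier and shifts for divisor d (d must be non-zero). */
struct reciprocal reciprocal_init(uint32_t d)
{
	struct reciprocal r;
	uint64_t m;
	int l;

	assert(d != 0);
	l = fls32(d - 1); /* ceil(log2(d)) */
	m = ((1ULL << 32) * ((1ULL << l) - d)) / d + 1;
	r.m = (uint32_t)m;
	r.sh1 = (uint8_t)(l > 1 ? 1 : l);
	r.sh2 = (uint8_t)(l > 1 ? l - 1 : 0);
	return r;
}

/* Compute a / d using only a multiply, an add, a subtract and shifts. */
uint32_t reciprocal_div(uint32_t a, struct reciprocal r)
{
	uint32_t t = (uint32_t)(((uint64_t)a * r.m) >> 32);

	return (t + ((a - t) >> r.sh1)) >> r.sh2;
}
```

The precomputation is only worthwhile when the same divisor is reused, which is why the technique suits code generators that know the divisor ahead of time.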
-
Andrey Ignatov says:
====================
This patchset adds TCP-BPF callback for listening sockets.

Patch 0001 provides more details and is the main patch in the set.
Patch 0006 adds selftest for the new callback.
Other patches are bug fixes and improvements in TCP-BPF selftest
to make it easier to extend in 0006.
====================

Acked-by: Lawrence Brakmo
Signed-off-by: Daniel Borkmann
-
Cover new TCP-BPF callback in test_tcpbpf: when listen() is called on
socket, set BPF_SOCK_OPS_STATE_CB_FLAG so that BPF_SOCK_OPS_STATE_CB
callback can be called on future state transition, and when such a
transition happens (TCP_LISTEN -> TCP_CLOSE), track it in the map and
verify it in user space later.

Signed-off-by: Andrey Ignatov
Acked-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
-
Reduce the amount of copy/paste for debug info when the result is verified
in the test, and keep that info together with the values being checked so
that they won't get out of sync.

It also improves the debug experience: instead of checking manually what
doesn't match in debug output for all fields, only the unexpected field is
printed.

Signed-off-by: Andrey Ignatov
Acked-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
-
Switch to cgroup_helpers to simplify the code and fix cgroup cleanup:
previously, the cgroup was not cleaned up after the test.

It also removes the SYSTEM macro, which only printed an error but didn't
terminate the test.

Signed-off-by: Andrey Ignatov
Acked-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
-
Lack of const in cgroup helpers signatures forces callers to write ugly
client code. Fix it.

Signed-off-by: Andrey Ignatov
Acked-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
-
Sync BPF_SOCK_OPS_TCP_LISTEN_CB related UAPI changes to tools/.
Signed-off-by: Andrey Ignatov
Acked-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
-
Add a new TCP-BPF callback that is called on listen(2) right after the
socket transitions to TCP_LISTEN state.

It fills the gap for listening sockets in TCP-BPF. For example, a BPF
program can set BPF_SOCK_OPS_STATE_CB_FLAG when a socket becomes listening
and track a later transition from TCP_LISTEN to TCP_CLOSE with the
BPF_SOCK_OPS_STATE_CB callback.

Before, there was no way to do it with TCP-BPF and other options were
much harder to work with. E.g. socket state tracking can be done with
tracepoints (either raw or regular) but they can't be attached to cgroup
and their lifetime has to be managed separately.

Signed-off-by: Andrey Ignatov
Acked-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
-
Ido Schimmel says:
====================
mlxsw: Add VRRP support

When a router that is acting as the default gateway of a host stops
functioning, the host will encounter packet loss until the router starts
functioning again.

To increase the reliability of the default gateway without performing
reconfiguration on the host, a host can use a Virtual Router Redundancy
Protocol (VRRP) Router. This virtual router is composed of several
routers where only one is actually forwarding packets from the host (the
master router) while the other routers act as backup routers. The
election of the master router is determined by the VRRP protocol [1].

Packets addressed to the virtual router are always sent to the virtual
router MAC address (IPv4: 00-00-5E-00-01-XX, IPv6: 00-00-5E-00-02-XX).
Such packets can only be accepted by the master router and must be
discarded by the backup routers.

In Linux, VRRP is usually implemented by configuring a macvlan with the
virtual router MAC on top of the router interface that is connected to
the host / LAN. The macvlan on the master router is assigned the virtual
IP (VIP) that the host uses as its gateway.

In order to support VRRP in mlxsw, we first need to enable macvlan upper
devices on top of mlxsw netdevs and their uppers. This is done by the
first patch, which also takes care of sanitizing macvlan configurations
that are not currently supported by the driver.

The second patch directs packets with the same destination MAC addresses
as the macvlans to the router so that they will undergo an L3 lookup. This
is consistent with the kernel's behavior where the macvlan's Rx handler
will re-inject such packets to the Rx path so that they will be picked
up by the IPvX protocol handlers and undergo an L3 lookup. Note that the
driver prevents the macvlans from being enslaved to other devices, to
ensure the packets will be picked up by the protocol handler and not by
another Rx handler.

The third patch adds packet traps for VRRP control packets for both IPv4
and IPv6. Finally, the last patch optimizes the reception of VRRP MACs
by potentially skipping one L2 lookup for them.
====================

Signed-off-by: David S. Miller
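
The virtual router MAC layout quoted above (00-00-5E-00-01-XX for IPv4, 00-00-5E-00-02-XX for IPv6, with the last byte carrying the VRID) can be sketched as a pair of helpers. This is an illustration, not mlxsw code: enum vrrp_family, vrrp_mac_build and vrrp_mac_parse are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fifth byte of the virtual router MAC selects the address family. */
enum vrrp_family {
	VRRP_IPV4 = 0x01,
	VRRP_IPV6 = 0x02,
};

/* Build the virtual router MAC 00-00-5E-00-{01|02}-{VRID}. */
void vrrp_mac_build(enum vrrp_family fam, uint8_t vrid, uint8_t mac[6])
{
	mac[0] = 0x00;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = 0x00;
	mac[4] = (uint8_t)fam;
	mac[5] = vrid;
}

/* Return true and fill fam/vrid if mac has the virtual router MAC form. */
bool vrrp_mac_parse(const uint8_t mac[6], enum vrrp_family *fam, uint8_t *vrid)
{
	static const uint8_t prefix[4] = { 0x00, 0x00, 0x5e, 0x00 };

	if (memcmp(mac, prefix, sizeof(prefix)) != 0)
		return false;
	if (mac[4] != VRRP_IPV4 && mac[4] != VRRP_IPV6)
		return false;
	*fam = (enum vrrp_family)mac[4];
	*vrid = mac[5];
	return true;
}
```

For example, VRID 0x2a on IPv6 yields 00-00-5E-00-02-2A, and any address whose fifth byte is neither 01 nor 02 is rejected by the parser.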
-
Hosts using a VRRP router send their packets with a destination MAC of
the VRRP router which is of the following form [1]:

IPv4 - 00-00-5E-00-01-{VRID}
IPv6 - 00-00-5E-00-02-{VRID}

Where VRID is the ID of the virtual router. Such packets are directed to
the router block in the ASIC by an FDB entry that was added in the
previous patch.

However, in certain cases it is possible to skip this FDB lookup and
send such packets directly to the router. This is accomplished by adding
these special MAC addresses to the RIF cache. If the cache is hit, the
packet will skip the L2 lookup and ingress the router with the RIF
specified in the cache entry.

1. https://tools.ietf.org/html/rfc5798#section-7.3

Signed-off-by: Ido Schimmel
Reviewed-by: Petr Machata
Signed-off-by: David S. Miller
-
Virtual Router Redundancy Protocol packets are used to communicate the
state of the Master router associated with the virtual router ID (VRID).

These are link-local multicast packets sent with IP protocol 112 that
are trapped in the router block in the ASIC.

Add a trap for these packets and mark the trapped packets to prevent
them from potentially being re-flooded by the bridge driver.

Signed-off-by: Ido Schimmel
Reviewed-by: Petr Machata
Signed-off-by: David S. Miller
-
An IP packet received on a netdev with a macvlan upper whose MAC matches
the packet's destination MAC will be re-injected to the Rx path as if it
was received by the macvlan, and will undergo an L3 lookup.

Reflect this functionality to the ASIC by programming FDB entries that
will direct MACs of macvlan uppers to the router.

In a similar fashion to router interfaces (RIFs) that are programmed
upon the addition of the first IP address on an interface and destroyed
upon the removal of the last IP address, the FDB entries for the macvlan
are added and destroyed based on the addition of the first and removal
of the last IP address on the macvlan.

Signed-off-by: Ido Schimmel
Reviewed-by: Petr Machata
Signed-off-by: David S. Miller
-
In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to
be directed to the router we need to enable macvlan uppers on top of
mlxsw netdevs.

Allow macvlan upper devices on top of mlxsw netdevs and sanitize
configurations that can't work. For example, a macvlan can't be enslaved
to a bridge as without ACLs the device doesn't take the destination MAC
into account when classifying a packet to a bridge instance (i.e., a
FID).

Signed-off-by: Ido Schimmel
Reviewed-by: Petr Machata
Signed-off-by: David S. Miller
-
tcp_rcv_nxt_update() is already executed in tcp_data_queue().
This line is redundant.

See below,
	tcp_queue_rcv
		tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);

	tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant

Signed-off-by: Yafang Shao
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
14 Jul, 2018
3 commits
-
This patch augments the output of bpftool's map dump and map lookup
commands to print data alongside btf info, if the corresponding btf
info is available. The outputs for each of map dump and map lookup
commands are augmented in two ways:

1. when neither -j nor -p is supplied, btf-ful map data is printed
whose aim is human readability. This means no commitments for json- or
backward- compatibility.

2. when either -j or -p is supplied, a new json object named
"formatted" is added for each key-value pair. This object contains the
same data as the key-value pair, but with btf info. "formatted" object
promises json- and backward- compatibility. Below is a sample output.

$ bpftool map dump -p id 8
[{
"key": ["0x0f","0x00","0x00","0x00"
],
"value": ["0x03", "0x00", "0x00", "0x00", ...
],
"formatted": {
"key": 15,
"value": {
"int_field": 3,
...
}
}
}
]

This patch calls btf_dumper introduced in previous patch to accomplish
the above. Indeed, btf-ful info is only displayed if btf data for the
given map is available. Otherwise existing output is displayed as-is.

Signed-off-by: Okash Khawaja
Acked-by: Martin KaFai Lau
Reviewed-by: Jakub Kicinski
Signed-off-by: Daniel Borkmann
-
This consumes functionality exported in the previous patch. It does the
main job of printing with BTF data. This is used in the following patch
to provide a more readable output of a map's dump. It relies on
json_writer to do json printing. Below is sample output where map keys
are ints and values are of type struct A:

typedef int int_type;
enum E {
	E0,
	E1,
};

struct B {
	int x;
	int y;
};

struct A {
	int m;
	unsigned long long n;
	char o;
	int p[8];
	int q[4][8];
	enum E r;
	void *s;
	struct B t;
	const int u;
	int_type v;
	unsigned int w1: 3;
	unsigned int w2: 3;
};

$ sudo bpftool map dump id 14
[{
"key": 0,
"value": {
"m": 1,
"n": 2,
"o": "c",
"p": [15,16,17,18,15,16,17,18
],
"q": [[25,26,27,28,25,26,27,28
],[35,36,37,38,35,36,37,38
],[45,46,47,48,45,46,47,48
],[55,56,57,58,55,56,57,58
]
],
"r": 1,
"s": 0x7ffd80531cf8,
"t": {
"x": 5,
"y": 10
},
"u": 100,
"v": 20,
"w1": 0x7,
"w2": 0x3
}
}
]

This patch uses json's {} and [] to imply struct/union and array. More
explicit information can be added later. For example, a command line
option can be introduced to print whether a key or value is struct
or union, name of a struct etc. This will however come at the expense
of duplicating info when, for example, printing an array of structs.
enums are printed as ints without their names.

Signed-off-by: Okash Khawaja
Acked-by: Martin KaFai Lau
Reviewed-by: Jakub Kicinski
Signed-off-by: Daniel Borkmann
-
This patch introduces btf__resolve_type() function and exports two
existing functions from libbpf. btf__resolve_type follows modifier
types like const and typedef until it hits a type which actually takes
up memory, and then returns it. This function follows a similar pattern
to btf__resolve_size but instead of computing size, it just returns
the type.

These functions will be used in the following patch which parses
information inside array of `struct btf_type *`. btf_name_by_offset is
used for printing variable names.

Signed-off-by: Okash Khawaja
Acked-by: Martin KaFai Lau
Acked-by: Song Liu
Reviewed-by: Jakub Kicinski
Signed-off-by: Daniel Borkmann
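
The "follow modifiers until a concrete type is reached" idea can be sketched over a toy type table. This is not the libbpf implementation: the toy_* names, the table layout, and the depth limit of 32 are assumptions made for illustration.

```c
/* Toy type kinds; the last three are modifiers that only refer onward. */
enum toy_kind {
	TOY_INT,
	TOY_STRUCT,
	TOY_PTR,
	TOY_CONST,
	TOY_VOLATILE,
	TOY_TYPEDEF,
};

struct toy_type {
	enum toy_kind kind;
	int ref; /* index of the referenced type; used by modifiers only */
};

#define TOY_MAX_DEPTH 32 /* assumed bound to stop on reference cycles */

/*
 * Follow const/volatile/typedef links starting at id and return the index
 * of the first type that is not a modifier, or -1 on a bad index or when
 * the chain is deeper than TOY_MAX_DEPTH.
 */
int toy_resolve_type(const struct toy_type *types, int n, int id)
{
	int depth;

	for (depth = 0; depth < TOY_MAX_DEPTH; depth++) {
		if (id < 0 || id >= n)
			return -1;
		switch (types[id].kind) {
		case TOY_CONST:
		case TOY_VOLATILE:
		case TOY_TYPEDEF:
			id = types[id].ref;
			break;
		default:
			return id;
		}
	}
	return -1;
}
```

With a table holding an int at index 0, a const referring to it at index 1, and a typedef of that const at index 2, resolving index 2 yields index 0, while a typedef that refers to itself is rejected by the depth limit.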