22 Mar, 2020

1 commit

  • sockmap performs lockless writes to sk->sk_prot on the following paths:

    tcp_bpf_{recvmsg|sendmsg} / sock_map_unref
      sk_psock_put
        sk_psock_drop
          sk_psock_restore_proto
            WRITE_ONCE(sk->sk_prot, proto)

    To prevent load/store tearing [1], and to make tooling aware of intentional
    shared access [2], we need to annotate other sites that access sk_prot with
    READ_ONCE/WRITE_ONCE macros.

    Change done with Coccinelle with following semantic patch:

    @@
    expression E;
    identifier I;
    struct sock *sk;
    identifier sk_prot =~ "^sk_prot$";
    @@
    (
    E =
    -sk->sk_prot
    +READ_ONCE(sk->sk_prot)
    |
    -sk->sk_prot = E
    +WRITE_ONCE(sk->sk_prot, E)
    |
    -sk->sk_prot
    +READ_ONCE(sk->sk_prot)
    ->I
    )
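
    For illustration, the third pattern turns a plain dereference such as
    (hypothetical call site, not a hunk from the patch):

        sk->sk_prot->unhash(sk);

    into:

        READ_ONCE(sk->sk_prot)->unhash(sk);

    while plain assignments to sk->sk_prot become WRITE_ONCE() stores.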

    Signed-off-by: Jakub Sitnicki
    Signed-off-by: David S. Miller

    Jakub Sitnicki
     

20 Feb, 2020

1 commit

  • Current code doesn't check whether the TCP sequence number starts at
    (or after) the 1st record's start sequence number. It only checks
    that the sequence number is before the 1st record's end sequence
    number. This problem will always be a possibility in the re-transmit
    case: if the record a requested sequence number belongs to has
    already been deleted, tls_get_record will start looking through the
    list and, per the existing check, only verify that the sequence
    number is before the end seq of the 1st record. That is always true,
    so the 1st record is always returned when NULL should in fact be
    returned.
    As part of the fix, walk the records only if the sequence number
    lies within the list, else return NULL.
    One more check is added: the driver looks for the start marker
    record to handle TCP packets which are before the tls offload start
    sequence number, hence return the 1st record if it is the tls start
    marker and the sequence number is before the 1st record's starting
    sequence number.
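
    A rough sketch of the resulting lookup logic (simplified; helper
    names follow net/tls, this is not the exact hunk):

        info = list_first_entry_or_null(&context->records_list,
                                        struct tls_record_info, list);
        if (!info)
                return NULL;

        /* Packets before the offload start sequence are matched by the
         * start marker record.  Otherwise the requested seq must lie
         * within [start of 1st record, end of last record]. */
        if (!tls_record_is_start_marker(info)) {
                last = list_last_entry(&context->records_list,
                                       struct tls_record_info, list);
                if (!between(seq, tls_record_start_seq(info),
                             last->end_seq))
                        return NULL;
        }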

    Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
    Signed-off-by: Rohit Maheshwari
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Rohit Maheshwari
     

20 Dec, 2019

1 commit


07 Dec, 2019

1 commit


10 Nov, 2019

1 commit


07 Nov, 2019

2 commits

  • TLS TX needs to release and re-acquire the socket lock if send buffer
    fills up.

    TLS SW TX path currently depends on only allowing one thread to enter
    the function by the abuse of sk_write_pending. If another writer is
    already waiting for memory no new ones are allowed in.

    This has two problems:
    - writers don't wake other threads up when they leave the kernel;
      meaning that this scheme works for a single extra thread (second
      application thread or delayed work), because memory becoming
      available will send a wake up request, but as Mallesham and
      Pooja report, with a larger number of threads it leads to threads
      being put to sleep indefinitely;
    - the delayed work does not get _scheduled_ but it may _run_ when
      other writers are present, leading to crashes as writers don't
      expect state to change under their feet (same records get pushed
      and freed multiple times); it's hard to reliably bail from the
      work, however, because the mere presence of a writer does not
      guarantee that the writer will push pending records before exiting.

    Ensuring wakeups always happen will make the code basically open
    code a mutex. Just use a mutex.

    The TLS HW TX path does not have any locking (not even the
    sk_write_pending hack), yet it uses a per-socket sg_tx_data
    array to push records.
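
    A minimal sketch of the resulting serialization (tx_lock stands for
    the per-context mutex described above; call sites are simplified):

        struct tls_context *tls_ctx = tls_get_ctx(sk);

        mutex_lock(&tls_ctx->tx_lock);
        lock_sock(sk);
        /* ... existing sendmsg/sendpage body, which may drop and
         * re-acquire the socket lock while waiting for memory ... */
        release_sock(sk);
        mutex_unlock(&tls_ctx->tx_lock);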

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Reported-by: Mallesham Jatharakonda
    Reported-by: Pooja Trivedi
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • sk_write_pending being not zero does not guarantee that partial
    record will be pushed. If the thread waiting for memory times out
    the pending record may get stuck.

    In case of tls_device there is no path where a partial record is
    set and a writer is present in the first place. Partial record is
    set only in tls_push_sg(), and tls_push_sg() will return an
    error immediately. All tls_device callers of tls_push_sg()
    will return (and not wait for memory) if it failed.

    Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

07 Oct, 2019

3 commits


06 Oct, 2019

3 commits

  • Add a statistic for number of RX resyncs sent down to the NIC.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Add a tracepoint to the TLS offload's fast path. This tracepoint
    can be used to track the decrypted and encrypted status of received
    records. Records decrypted by the device should have decrypted set
    to 1; records which have neither decrypted nor encrypted set are
    partially decrypted, require re-encryption and are therefore the
    most expensive to deal with.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Add tracing of device-related interaction to aid performance
    analysis, especially around resync:

    tls:tls_device_offload_set
    tls:tls_device_rx_resync_send
    tls:tls_device_rx_resync_nh_schedule
    tls:tls_device_rx_resync_nh_delay
    tls:tls_device_tx_resync_req
    tls:tls_device_tx_resync_send

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

08 Sep, 2019

4 commits

  • Unlike normal TCP code TLS has to touch the cache lines
    it copies into to fill header info. On memory-heavy workloads
    having non temporal stores and normal accesses targeting
    the same cache line leads to significant overhead.

    Measured 3% overhead running 3600 round robin connections
    with additional memory heavy workload.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • For TLS device offload the tag/message authentication code is
    filled in by the device. The kernel merely reserves space for
    it. Because the device overwrites it, the contents of the tag
    do not matter. Current code tries to save space by reusing the
    header as the tag. This, however, leads to an additional frag
    being created and defeats buffer coalescing (which trickles
    all the way down to the drivers).

    Remove this optimization, and try to allocate the space for
    the tag in the usual way, leaving the memory uninitialized.
    If memory allocation fails, rewind the record pointer so that
    the already copied user data is used as the tag.

    Note that the optimization was actually buggy: the tag
    for TLS 1.2 is 16 bytes, but the header is just 13, so the reuse
    may have looked past the end of the page.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • All modifications to the TLS record list happen under the socket
    lock. Since records form an ordered queue, readers are only
    concerned about elements being removed; additions can happen
    concurrently.

    Use RCU primitives to ensure the correct access types
    (READ_ONCE/WRITE_ONCE).
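
    A minimal sketch of the access pattern this describes (names are
    illustrative, not the exact tls_device code):

        /* Writer, under the socket lock: publish a new record at the
         * tail of the queue. */
        list_add_tail_rcu(&rec->list, &ctx->records_list);

        /* Reader walking the queue while new records may be appended
         * concurrently. */
        rcu_read_lock();
        list_for_each_entry_rcu(rec, &ctx->records_list, list) {
                /* inspect rec */
        }
        rcu_read_unlock();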

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • It's generally more cache friendly to walk arrays in order,
    especially those which are likely not in cache.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

05 Sep, 2019

3 commits

  • If the retransmit record hint falls into the cleanup window we will
    free it by simply walking the list. No need to duplicate the code.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: John Hurley
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • On setsockopt path we need to hold device_offload_lock from
    the moment we check netdev is up until the context is fully
    ready to be added to the tls_device_list.

    No need to hold it around the get_netdev_for_sock().
    Change the code and remove the confusing comment.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: John Hurley
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Reusing parts of the error path for the normal exit will make the
    next commit harder to read; untangle the two.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: John Hurley
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

01 Sep, 2019

1 commit

  • We need to make sure context does not get freed while diag
    code is interrogating it. Free struct tls_context with
    kfree_rcu().

    We add the __rcu annotation directly in icsk, and cast it
    away in the datapath accessor. Presumably all ULPs will
    do a similar thing.
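
    The datapath accessor ends up looking roughly like this (close to,
    but not necessarily verbatim, the net/tls helper):

        static inline struct tls_context *tls_get_ctx(const struct sock *sk)
        {
                const struct inet_connection_sock *icsk = inet_csk(sk);

                /* icsk_ulp_data is now __rcu-annotated for the diag
                 * path; the datapath owns the socket, so the annotation
                 * is cast away here. */
                return (__force void *)icsk->icsk_ulp_data;
        }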

    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

20 Aug, 2019

1 commit


09 Aug, 2019

1 commit

  • sk_validate_xmit_skb() and drivers depend on the sk member of
    struct sk_buff to identify segments requiring encryption.
    Any operation which removes or does not preserve the original TLS
    socket such as skb_orphan() or skb_clone() will cause clear text
    leaks.

    Make the TCP socket underlying an offloaded TLS connection
    mark all skbs as decrypted if TLS TX is in offload mode.
    Then in sk_validate_xmit_skb() catch skbs which have no socket
    (or a socket with no validation) but have the decrypted flag set.
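
    The check in sk_validate_xmit_skb() then looks roughly like this
    (simplified sketch of the pattern, not the exact hunk):

        struct sock *sk = skb->sk;

        if (sk && sk_fullsock(sk) && sk->sk_validate_xmit_skb) {
                skb = sk->sk_validate_xmit_skb(sk, dev, skb);
        #ifdef CONFIG_TLS_DEVICE
        } else if (unlikely(skb->decrypted)) {
                pr_warn_ratelimited("unencrypted skb with no associated socket - dropping\n");
                kfree_skb(skb);
                skb = NULL;
        #endif
        }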

    Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
    sk->sk_validate_xmit_skb are slightly interchangeable right now,
    they all imply TLS offload. The new checks are guarded by
    CONFIG_TLS_DEVICE because that's the option guarding the
    sk_buff->decrypted member.

    Second, smaller issue with orphaning is that it breaks
    the guarantee that packets will be delivered to device
    queues in-order. All TLS offload drivers depend on that
    scheduling property. This means skb_orphan_partial()'s
    trick of preserving partial socket references will cause
    issues in the drivers. We need a full orphan, and as a
    result netem delay/throttling will cause all TLS offload
    skbs to be dropped.

    Reusing the sk_buff->decrypted flag also protects from
    leaking clear text when an incoming, decrypted skb is redirected
    (e.g. by TC).

    See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect
    through ULP") for justification why the internal flag is safe.
    The only location which could leak the flag in is tcp_bpf_sendmsg(),
    which is taken care of by clearing the previously unused bit.

    v2:
    - remove superfluous decrypted mark copy (Willem);
    - remove the stale doc entry (Boris);
    - rely entirely on EOR marking to prevent coalescing (Boris);
    - use an internal sendpages flag instead of marking the socket
    (Boris).
    v3 (Willem):
    - reorganize the can_skb_orphan_partial() condition;
    - fix the flag leak-in through tcp_bpf_sendmsg.

    Signed-off-by: Jakub Kicinski
    Acked-by: Willem de Bruijn
    Reviewed-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

31 Jul, 2019

1 commit


23 Jul, 2019

1 commit


09 Jul, 2019

3 commits

  • Turns out TLS_TX in HW offload mode does not initialize tls_prot_info.
    Since commit 9cd81988cce1 ("net/tls: use version from prot") we actually
    use this field on the datapath. Luckily we always compare it to TLS 1.3,
    and assume 1.2 otherwise. So since zero is not equal to 1.3, everything
    worked fine.

    Fixes: 9cd81988cce1 ("net/tls: use version from prot")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Introduce a return code for the tls_dev_resync callback.

    When the driver TX resync fails, the kernel can retry the resync
    until it succeeds. This prevents drivers from attempting to offload
    TLS packets if the connection is known to be out of sync.

    We don't worry about the RX resync since it will be retried naturally
    as more encrypted records get received.

    Signed-off-by: Dirk van der Merwe
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Dirk van der Merwe
     
  • Two cases of overlapping changes, nothing fancy.

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Jul, 2019

2 commits

  • Commit 86029d10af18 ("tls: zero the crypto information from tls_context
    before freeing") added memzero_explicit() calls to clear the key material
    before freeing struct tls_context, but it missed that tls_device.c has its
    own way of freeing this structure. Replace the missing free.

    Fixes: 86029d10af18 ("tls: zero the crypto information from tls_context before freeing")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Neither drivers nor the tls offload code currently supports TLS
    version 1.3. Check the TLS version when installing connection
    state. TLS 1.3 will just fall back to the kernel crypto for now.
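
    The guard is conceptually the following (a sketch; TLS_1_2_VERSION
    is the uapi constant, exact placement in the offload setup path is
    simplified):

        /* Offload only handles TLS 1.2; anything else falls back to
         * the SW (kernel crypto) implementation. */
        if (crypto_info->version != TLS_1_2_VERSION)
                return -EOPNOTSUPP;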

    Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

12 Jun, 2019

6 commits

  • TLS offload drivers keep track of TCP seq numbers to make sure
    the packets are fed into the HW in order.

    When packets get dropped on the way through the stack, the driver
    will get out of sync and have to use fallback encryption, but unless
    TCP seq number is resynced it will never match the packets correctly
    (or even worse - use incorrect record sequence number after TCP seq
    wraps).

    Existing drivers (mlx5) feed the entire record on every out-of-order
    event, allowing FW/HW to always be in sync.

    This patch adds an alternative, more akin to the RX resync. When the
    driver sees a frame which is past its expected sequence number, the
    stream must have gotten out of order (if the sequence number is
    smaller than expected it's likely a retransmission, which doesn't
    require resync). The driver will ask the stack to perform TX sync
    before it submits the next full record, and fall back to software
    crypto until the stack has performed the sync.
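
    A rough driver-side sketch of that flow (helper and variable names
    are assumptions based on the description, not taken verbatim from a
    driver):

        /* Frame is past the TCP seq the HW expects: the stream got out
         * of order.  Ask the stack to resync before the next full
         * record and encrypt this skb in software in the meantime. */
        if (unlikely(after(ntohl(tcp_hdr(skb)->seq), expected_tcp_seq))) {
                tls_offload_tx_resync_request(skb->sk);
                skb = tls_encrypt_skb(skb);
        }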

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • Currently only RX direction is ever resynced, however, TX may
    also get out of sequence if packets get dropped on the way to
    the driver. Rename the resync callback and add a direction
    parameter.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • TLS offload device may lose sync with the TCP stream if packets
    arrive out of order. Drivers can currently request a resync at
    a specific TCP sequence number. When a record is found starting
    at that sequence number kernel will inform the device of the
    corresponding record number.

    This requires the device to constantly scan the stream for a
    known pattern (constant bytes of the header) after sync is lost.

    This patch adds an alternative approach which is entirely under
    the control of the kernel. Kernel tracks records it had to fully
    decrypt, even though TLS socket is in TLS_HW mode. If multiple
    records did not have any decrypted parts - it's a pretty strong
    indication that the device is out of sync.

    We choose the min number of fully encrypted records to be 2,
    which should hopefully be more than will get retransmitted at
    a time.

    After kernel decides the device is out of sync it schedules a
    resync request. If the TCP socket is empty the resync gets
    performed immediately. If socket is not empty we leave the
    record parser to resync when next record comes.

    Before resync in message parser we peek at the TCP socket and
    don't attempt the sync if the socket already has some of the
    next record queued.

    On resync failure (encrypted data continues to flow in) we
    retry with exponential backoff, up to once every 128 records
    (with a 16k record that's at most once every 2M of data).
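
    A minimal sketch of that retry cadence (field names are illustrative,
    not the actual struct members):

        /* Called for each fully encrypted record seen while out of
         * sync: double the interval between attempts, capped at one
         * attempt every 128 records. */
        if (++ctx->records_since_attempt < ctx->resync_backoff)
                return;
        ctx->records_since_attempt = 0;
        ctx->resync_backoff = min_t(u32, ctx->resync_backoff * 2, 128);
        /* ... issue the resync request towards the device ... */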

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • handle_device_resync() doesn't describe the function very well.
    The function checks if resync should be issued upon parsing of
    a new record.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • TLS offload code casts the record number to a u64. The buffer
    should be aligned to 8 bytes, but it's actually a __be64, and
    the rest of the TLS code treats it as a big int. Make the
    offload callbacks take a byte array; drivers can make the
    choice to do the ugly cast if they want to.

    Prepare for copying the record number onto the stack by
    defining a constant for max size of the byte array.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • We subtract "TLS_HEADER_SIZE - 1" from req_seq, then if they
    match we add the same constant to seq. Just add it to seq,
    and we don't have to touch req_seq.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Dirk van der Merwe
    Signed-off-by: David S. Miller

    Jakub Kicinski
     

08 Jun, 2019

1 commit


05 Jun, 2019

3 commits