01 Sep, 2020

1 commit

  • The functions bq_enqueue(), bq_flush_to_queue(), and bq_xmit_all() in
    {cpu,dev}map.c always return zero. Changing the return type from int
    to void makes the code easier to follow.

    Suggested-by: David Ahern
    Signed-off-by: Björn Töpel
    Signed-off-by: Daniel Borkmann
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20200901083928.6199-1-bjorn.topel@gmail.com

    Björn Töpel
     

28 Aug, 2020

1 commit

    Some properties of the inner map are used at verification time. When
    an inner map is inserted into an outer map at runtime,
    bpf_map_meta_equal() is currently used to ensure those properties of
    the inserted inner map stay the same as at verification time.

    In particular, the current bpf_map_meta_equal() checks max_entries,
    which turns out to be too restrictive for most maps, since they do not
    use max_entries at verification time. It rules out the use case of
    replacing a smaller inner map with a larger one. Some maps do use
    max_entries during verification, though. For example, map_gen_lookup
    in array_map_ops uses max_entries to generate the inline lookup code.

    To accommodate differences between maps, a map_meta_equal callback is
    added to bpf_map_ops. Each map type can decide what to check when one
    of its maps is used as an inner map at runtime.

    Also, some map types cannot be used as an inner map and are currently
    blacklisted in bpf_map_meta_alloc() in map_in_map.c. It would not be
    unusual for new map types to be unaware that such a blacklist exists.
    This patch therefore enforces an explicit opt-in and only allows a map
    to be used as an inner map if it has implemented the map_meta_equal
    ops. It is based on the discussion in [1].

    In this patch, all maps that support being used as an inner map have
    their map_meta_equal pointing to bpf_map_meta_equal. A later patch
    will relax the max_entries check for most maps. bpf_types.h counts 28
    map types. This patch adds 23 ".map_meta_equal" entries using
    coccinelle. The remaining five are:
    BPF_MAP_TYPE_PROG_ARRAY
    BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
    BPF_MAP_TYPE_STRUCT_OPS
    BPF_MAP_TYPE_ARRAY_OF_MAPS
    BPF_MAP_TYPE_HASH_OF_MAPS

    The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
    is moved such that the same error is returned.

    [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/
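
    As an illustration (not part of the patch), here is a minimal
    userspace sketch of the runtime insertion this check guards, using
    libbpf's bpf_map_create() API, which post-dates the patch; names like
    inner_fd are assumptions of the sketch:

    #include <bpf/bpf.h>   /* libbpf syscall wrappers */

    /* inner_fd: fd of an already-created inner map (assumed) */
    LIBBPF_OPTS(bpf_map_create_opts, opts, .inner_map_fd = inner_fd);
    int outer_fd = bpf_map_create(BPF_MAP_TYPE_HASH_OF_MAPS, "outer",
                                  sizeof(__u32), sizeof(__u32), 8, &opts);

    /* The runtime insertion is where the map type's map_meta_equal()
     * compares the new inner map against the creation-time template. */
    __u32 key = 0;
    int err = bpf_map_update_elem(outer_fd, &key, &inner_fd, BPF_ANY);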

    Signed-off-by: Martin KaFai Lau
    Signed-off-by: Daniel Borkmann
    Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com

    Martin KaFai Lau
     

05 Jul, 2020

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-07-04

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 73 non-merge commits during the last 17 day(s) which contain
    a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).

    The main changes are:

    1) bpftool ability to show PIDs of processes having open file descriptors
    for BPF map/program/link/BTF objects, relying on BPF iterator progs
    to extract this info efficiently, from Andrii Nakryiko.

    2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
    seq_files, from Yonghong Song.

    3) Support access to BPF map fields in struct bpf_map from programs
    through BTF struct access, from Andrey Ignatov.

    4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
    via seq_file from BPF iterator progs, from Song Liu.

    5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
    helper, from Dmitry Yakunin.

    6) Optimize BPF sk_storage selection of its caching index, from Martin
    KaFai Lau.

    7) Removal of redundant synchronize_rcu()s from BPF map destruction which
    has been a historic leftover, from Alexei Starovoitov.

    8) Several improvements to test_progs to make it easier to create a shell
    loop that invokes each test individually which is useful for some CIs,
    from Jesper Dangaard Brouer.

    9) Fix bpftool prog dump segfault when compiled without skeleton code on
    older clang versions, from John Fastabend.

    10) Bunch of cleanups and minor improvements, from various others.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Jun, 2020

1 commit

  • Set map_btf_name and map_btf_id for all map types so that map fields can
    be accessed by bpf programs.
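
    For context, a sketch in the style of the kernel selftests of what
    this enables: a BPF iterator program reading struct bpf_map fields
    directly (assumes vmlinux.h and libbpf's helpers; illustration only):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    SEC("iter/bpf_map")
    int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
    {
        struct bpf_map *map = ctx->map;

        if (!map)
            return 0;
        /* direct field access relies on the map type's btf id being set */
        BPF_SEQ_PRINTF(ctx->meta->seq, "id %u max_entries %u\n",
                       map->id, map->max_entries);
        return 0;
    }

    char _license[] SEC("license") = "GPL";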

    Signed-off-by: Andrey Ignatov
    Signed-off-by: Daniel Borkmann
    Acked-by: John Fastabend
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/a825f808f22af52b018dbe82f1c7d29dab5fc978.1592600985.git.rdna@fb.com

    Andrey Ignatov
     

18 Jun, 2020

1 commit

  • Syzkaller discovered that creating a hash of type devmap_hash with a large
    number of entries can hit the memory allocator limit for allocating
    contiguous memory regions. There's really no reason to use kmalloc_array()
    directly in the devmap code, so just switch it to the existing
    bpf_map_area_alloc() function that is used elsewhere.
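
    A rough before/after sketch of the allocation in question (field and
    helper names as in devmap.c; exact arguments may differ):

    /* before: demands physically contiguous memory, which can fail
     * for maps with many entries */
    dtab->dev_index_head = kmalloc_array(dtab->n_buckets,
                                         sizeof(struct hlist_head),
                                         GFP_KERNEL);

    /* after: bpf_map_area_alloc() falls back to vmalloc for large sizes */
    dtab->dev_index_head = bpf_map_area_alloc(
            (u64) dtab->n_buckets * sizeof(struct hlist_head),
            bpf_map_attr_numa_node(attr));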

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Xiumei Mu
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20200616142829.114173-1-toke@redhat.com

    Toke Høiland-Jørgensen
     

10 Jun, 2020

2 commits

  • V2:
    - Defer changing BPF-syscall to start at file-descriptor 1
    - Use {} to zero initialise struct.

    The recent commit fbee97feed9b ("bpf: Add support to attach bpf program to a
    devmap entry") introduced the ability to attach (and run) a separate XDP
    bpf_prog for each devmap entry. A bpf_prog is added via a file descriptor.
    As zero is a valid FD, not using the feature requires using the value
    minus-1. The UAPI is extended by tail-extending struct bpf_devmap_val and
    using map->value_size to determine the feature set.

    This will break older userspace applications not using the bpf_prog
    feature. Consider an old userspace app compiled against a newer kernel
    uapi/bpf.h: it will not know that it needs to initialise the member
    bpf_prog.fd to minus-1, so users would be forced to update their source
    code to get programs running on newer kernels.

    This patch removes the minus-1 checks and has zero mean the feature
    isn't used.

    Followup patches, either for the kernel or libbpf, should handle and
    avoid returning file-descriptor zero in the first place.
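
    After this change, a zero-initialised value keeps working for apps
    that don't use the feature; a minimal userspace sketch (map_fd and
    key are assumed):

    /* {} zero-initialises the struct, so bpf_prog.fd is 0, which now
     * means "no program attached" rather than being a valid fd */
    struct bpf_devmap_val val = {};

    val.ifindex = ifindex;                       /* egress device */
    bpf_map_update_elem(map_fd, &key, &val, 0);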

    Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/159170950687.2102545.7235914718298050113.stgit@firesoul

    Jesper Dangaard Brouer
     
  • This is a new context that does not handle metadata at the moment, so
    mark data_meta invalid.

    Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
    Signed-off-by: David Ahern
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200608151723.9539-1-dsahern@kernel.org

    David Ahern
     

02 Jun, 2020

4 commits

    In order to use the standard 'xdp' prefix, rename the convert_to_xdp_frame
    utility routine to xdp_convert_buff_to_frame and replace all occurrences.

    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: Alexei Starovoitov
    Acked-by: Jesper Dangaard Brouer
    Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org

    Lorenzo Bianconi
     
  • Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
    moment only the device is added. Other fields (queue_index)
    can be added as use cases arise.

    From a UAPI perspective, add egress_ifindex to the xdp context for
    bpf programs to see the Tx device.

    Update the verifier to only allow accesses to egress_ifindex by
    XDP programs with BPF_XDP_DEVMAP expected attach type.
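
    Illustration only: a devmap program reading the new field (the libbpf
    section name used here is an assumption of the sketch):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp_devmap")  /* loaded with expected_attach_type BPF_XDP_DEVMAP */
    int egress_filter(struct xdp_md *ctx)
    {
        /* egress_ifindex is only readable with this attach type */
        if (ctx->egress_ifindex == 42)  /* hypothetical ifindex */
            return XDP_DROP;
        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";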

    Signed-off-by: David Ahern
    Signed-off-by: Alexei Starovoitov
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20200529220716.75383-4-dsahern@kernel.org
    Signed-off-by: Alexei Starovoitov

    David Ahern
     
  • Add BPF_XDP_DEVMAP attach type for use with programs associated with a
    DEVMAP entry.

    Allow DEVMAPs to associate a program with a device entry by adding
    a bpf_prog.fd to 'struct bpf_devmap_val'. Values read show the program
    id, so the fd and id are a union. bpf programs can get access to the
    struct via vmlinux.h.

    The program associated with the fd must have type XDP with expected
    attach type BPF_XDP_DEVMAP. When a program is associated with a device
    index, the program is run on an XDP_REDIRECT and before the buffer is
    added to the per-cpu queue. At this point rxq data is still valid; the
    next patch adds tx device information allowing the program to see both
    ingress and egress device indices.

    XDP generic is skb based and XDP programs do not work with skb's. Block
    this use case by walking the maps used by a program that is to be
    attached via xdpgeneric and failing if any of them are DEVMAP /
    DEVMAP_HASH. Similarly, block attaching BPF_XDP_DEVMAP programs
    directly to devices.
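
    A userspace sketch of associating a program with a devmap entry,
    based on the union described above (the fd is written, the id is what
    reads report; fds and ifindex are assumed):

    struct bpf_devmap_val val = {
        .ifindex     = egress_ifindex,  /* egress device */
        .bpf_prog.fd = devmap_prog_fd,  /* XDP prog, BPF_XDP_DEVMAP */
    };

    bpf_map_update_elem(map_fd, &key, &val, 0);
    /* a subsequent lookup reports val.bpf_prog.id, not the fd */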

    Signed-off-by: David Ahern
    Signed-off-by: Alexei Starovoitov
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20200529220716.75383-3-dsahern@kernel.org
    Signed-off-by: Alexei Starovoitov

    David Ahern
     
  • Add 'struct bpf_devmap_val' to formalize the expected values that can
    be passed in for a DEVMAP. Update devmap code to use the struct.

    Signed-off-by: David Ahern
    Signed-off-by: Alexei Starovoitov
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20200529220716.75383-2-dsahern@kernel.org
    Signed-off-by: Alexei Starovoitov

    David Ahern
     

23 Apr, 2020

1 commit

  • Export the DEV_MAP_BULK_SIZE macro to the header file so that drivers
    can directly use it as the maximum number of xdp_frames received in the
    .ndo_xdp_xmit() callback.
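
    A sketch of how a driver might rely on the exported bound (driver
    names hypothetical):

    #include <linux/bpf.h>        /* DEV_MAP_BULK_SIZE */
    #include <linux/netdevice.h>

    static int mydrv_xdp_xmit(struct net_device *dev, int n,
                              struct xdp_frame **frames, u32 flags)
    {
        /* n is bounded by DEV_MAP_BULK_SIZE, so a fixed-size on-stack
         * descriptor array is safe */
        void *descs[DEV_MAP_BULK_SIZE];

        /* ... fill descs[0..n-1] and kick the hardware ... */
        return n;  /* number of frames accepted */
    }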

    Signed-off-by: Ioana Ciornei
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Ioana Ciornei
     

27 Jan, 2020

2 commits

    Now that we depend on call_rcu() and synchronize_rcu() to also wait
    for preempt-disabled regions to complete, the RCU read critical
    section in __dev_map_flush() is no longer required, except in a few
    special cases in drivers that need it for other reasons.

    These originally ensured the map reference was safe while a map was
    also being freed, and additionally that bpf program updates via
    ndo_bpf did not happen while flush updates were in flight. But under
    the new rules flush can only be called from preempt-disabled NAPI
    context. The synchronize_rcu() from the map free path and the
    call_rcu() from the delete path ensure the reference there is safe.
    So let's remove the rcu_read_lock and rcu_read_unlock pair to avoid
    any confusion around how this is being protected.

    If the rcu_read_lock were required, it would mean errors in the above
    logic, and the original patch would also be wrong.

    Now that we have done the above, we put the rcu_read_lock in the
    driver code where it is needed, in a driver-dependent way. I think
    this helps readability of the code, so we know where and why we are
    taking read locks. Most drivers will not need rcu_read_locks here,
    and further, XDP drivers already have rcu_read_locks in their code
    paths for reading xdp programs on the RX side, so this makes it
    symmetric: we don't end up with half of the rcu critical sections
    defined in the driver and the other half in devmap.

    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Jesper Dangaard Brouer
    Link: https://lore.kernel.org/bpf/1580084042-11598-4-git-send-email-john.fastabend@gmail.com

    John Fastabend
     
    Now that we rely on synchronize_rcu and call_rcu waiting to
    exit preempt-disable regions (NAPI), let's update the comments
    to reflect this.

    Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Acked-by: Björn Töpel
    Acked-by: Song Liu
    Link: https://lore.kernel.org/bpf/1580084042-11598-2-git-send-email-john.fastabend@gmail.com

    John Fastabend
     

24 Jan, 2020

1 commit

  • head is traversed using hlist_for_each_entry_rcu outside an RCU
    read-side critical section but under the protection of dtab->index_lock.

    Hence, add corresponding lockdep expression to silence false-positive
    lockdep warnings, and harden RCU lists.
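
    The shape of the fix (member names as in devmap.c):

    /* tell RCU that holding dtab->index_lock also makes the traversal
     * safe, silencing the false-positive lockdep splat */
    hlist_for_each_entry_rcu(dev, head, index_hlist,
                             lockdep_is_held(&dtab->index_lock)) {
        if (dev->idx == idx)
            return dev;
    }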

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Signed-off-by: Amol Grover
    Signed-off-by: Daniel Borkmann
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20200123120437.26506-1-frextrite@gmail.com

    Amol Grover
     

17 Jan, 2020

3 commits

    Now that we don't have a reference to a devmap when flushing the device
    bulk queue, let's change the devmap_xmit tracepoint to remove the
    map_id and map_index fields entirely. Rearrange the fields so 'drops' and
    'sent' stay in the same position in the tracepoint struct, to make it
    possible for the xdp_monitor utility to read both the old and the new
    format.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk

    Jesper Dangaard Brouer
     
  • Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
    we can re-use the bulking for the non-map version of the bpf_redirect()
    helper. This is a simple matter of having xdp_do_redirect_slow() queue the
    frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

    Unfortunately we can't make the bpf_redirect() helper return an error if
    the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
    have a reference to the network namespace of the ingress device at the time
    the helper is called. So we have to leave it as-is and keep the device
    lookup in xdp_do_redirect_slow().

    Since this leaves less reason to keep the non-map redirect code in a
    separate function, we get rid of the xdp_do_redirect_slow() function
    entirely. This does lose us the tracepoint disambiguation, but fortunately
    the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
    entry structures. This means both can contain a map index, so we can just
    amend the tracepoint definitions so we always emit the xdp_redirect(_err)
    tracepoints, but with the map ID only populated if a map is present. This
    means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
    the definitions around in case someone is still listening for them.

    With this change, the performance of the xdp_redirect sample program goes
    from 5Mpps to 8.4Mpps (a 68% increase).

    Since the flush functions are no longer map-specific, rename the flush()
    functions to drop _map from their names. One of the renamed functions is
    the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
    keep from having to update all drivers, use a #define to keep the old name
    working, and only update the virtual drivers in this patch.
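
    Per the description, the old driver-facing name survives as an alias,
    roughly:

    /* sketch: new name plus a compat define for unconverted drivers */
    void xdp_do_flush(void);
    #define xdp_do_flush_map xdp_do_flush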

    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk

    Toke Høiland-Jørgensen
     
  • Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
    instances"), changed devmap flushing to be a global operation instead of a
    per-map operation. However, the queue structure used for bulking was still
    allocated as part of the containing map.

    This patch moves the devmap bulk queue into struct net_device. The
    motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
    which will be changed in a subsequent commit. To avoid other fields of
    struct net_device moving to different cache lines, we also move a couple of
    other members around.

    We defer the actual allocation of the bulk queue structure until the
    NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
    ndo_xdp_xmit support before allocating the structure, which is not possible
    at the time struct net_device is allocated. However, we keep the freeing in
    free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.

    Because of this change, we lose the reference back to the map that
    originated the redirect, so change the tracepoint to always return 0 as the
    map ID and index. Otherwise no functional change is intended with this
    patch.

    After this patch, the relevant part of struct net_device looks like this,
    according to pahole:

    /* --- cacheline 14 boundary (896 bytes) --- */
    struct netdev_queue * _tx __attribute__((__aligned__(64))); /* 896 8 */
    unsigned int num_tx_queues; /* 904 4 */
    unsigned int real_num_tx_queues; /* 908 4 */
    struct Qdisc * qdisc; /* 912 8 */
    unsigned int tx_queue_len; /* 920 4 */
    spinlock_t tx_global_lock; /* 924 4 */
    struct xdp_dev_bulk_queue * xdp_bulkq; /* 928 8 */
    struct xps_dev_maps * xps_cpus_map; /* 936 8 */
    struct xps_dev_maps * xps_rxqs_map; /* 944 8 */
    struct mini_Qdisc * miniq_egress; /* 952 8 */
    /* --- cacheline 15 boundary (960 bytes) --- */
    struct hlist_head qdisc_hash[16]; /* 960 128 */
    /* --- cacheline 17 boundary (1088 bytes) --- */
    struct timer_list watchdog_timer; /* 1088 40 */

    /* XXX last struct has 4 bytes of padding */

    int watchdog_timeo; /* 1128 4 */

    /* XXX 4 bytes hole, try to pack */

    struct list_head todo_list; /* 1136 16 */
    /* --- cacheline 18 boundary (1152 bytes) --- */

    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Björn Töpel
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk

    Toke Høiland-Jørgensen
     

20 Dec, 2019

2 commits

    The devmap flush list is used to track entries that need to be
    flushed via the xdp_do_flush_map() function. This list used to be
    per-map, but there is really no reason for that. Instead, make the
    flush list global for all devmaps, which simplifies __dev_map_flush()
    and dev_map_init_map().
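
    After this change, the flush path looks roughly like this (close to
    the actual devmap.c, but abbreviated):

    static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);

    void __dev_map_flush(void)
    {
        struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
        struct xdp_bulk_queue *bq, *tmp;

        rcu_read_lock();
        list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
            bq_xmit_all(bq, XDP_XMIT_FLUSH);
        rcu_read_unlock();
    }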

    Signed-off-by: Björn Töpel
    Signed-off-by: Alexei Starovoitov
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20191219061006.21980-6-bjorn.topel@gmail.com

    Björn Töpel
     
    After the RCU flavor consolidation [1], call_rcu() and
    synchronize_rcu() wait for preempt-disable regions (NAPI) in addition
    to the read-side critical sections. As a result, the cleanup
    code in devmap can be simplified:

    * There is no longer a need to flush in __dev_map_entry_free, since we
    know that this has been done when the call_rcu() callback is
    triggered.

    * When freeing the map, there is no need to explicitly wait for a
    flush. It's guaranteed to be done after the synchronize_rcu() call
    in dev_map_free(). The rcu_barrier() is still needed, so that the
    map is not freed prior to the elements.

    [1] https://lwn.net/Articles/777036/

    Signed-off-by: Björn Töpel
    Signed-off-by: Alexei Starovoitov
    Acked-by: Toke Høiland-Jørgensen
    Link: https://lore.kernel.org/bpf/20191219061006.21980-2-bjorn.topel@gmail.com

    Björn Töpel
     

25 Nov, 2019

1 commit

  • Tetsuo pointed out that it was not only the device unregister hook that was
    broken for devmap_hash types, it was also cleanup on map free. So better
    fix this as well.

    While we're at it, there's no reason to allocate the netdev_map array for
    DEVMAP_HASH, so skip that and adjust the cost accordingly.

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Tetsuo Handa
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/20191121133612.430414-1-toke@redhat.com

    Toke Høiland-Jørgensen
     

22 Oct, 2019

1 commit

  • It seems I forgot to add handling of devmap_hash type maps to the device
    unregister hook for devmaps. This omission causes devices to not be
    properly released, which causes hangs.

    Fix this by adding the missing handler.

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Tetsuo Handa
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20191019111931.2981954-1-toke@redhat.com

    Toke Høiland-Jørgensen
     

19 Oct, 2019

1 commit

  • Tetsuo pointed out that without an explicit cast, the cost calculation for
    devmap_hash type maps could overflow on 32-bit builds. This adds the
    missing cast.
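
    The one-line nature of the fix, sketched (variable names as in
    devmap.c):

    /* force 64-bit arithmetic; without the cast the multiplication is
     * done in 32 bits on 32-bit builds and can wrap around */
    cost += (u64) dtab->n_buckets * sizeof(struct hlist_head);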

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-by: Tetsuo Handa
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Alexei Starovoitov
    Acked-by: Yonghong Song
    Link: https://lore.kernel.org/bpf/20191017105702.2807093-1-toke@redhat.com

    Toke Høiland-Jørgensen
     

16 Sep, 2019

1 commit

  • syzbot found a crash in dev_map_hash_update_elem(), when replacing an
    element with a new one. Jesper correctly identified the cause of the crash
    as a race condition between the initial lookup in the map (which is done
    before taking the lock), and the removal of the old element.

    Rather than just add a second lookup into the hashmap after taking the
    lock, fix this by reworking the function logic to take the lock before the
    initial lookup.
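
    The reworked ordering, roughly (lock and helper names as in
    devmap.c; surrounding error handling elided):

    spin_lock_irqsave(&dtab->index_lock, flags);

    /* the lookup now happens under the lock, so the old element cannot
     * be removed between the lookup and the replacement below */
    old_dev = __dev_map_hash_lookup_elem(map, idx);
    if (old_dev)
        hlist_del_rcu(&old_dev->index_hlist);
    hlist_add_head_rcu(&dev->index_hlist,
                       dev_map_index_hash(dtab, idx));

    spin_unlock_irqrestore(&dtab->index_lock, flags);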

    Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
    Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com
    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Toke Høiland-Jørgensen
     

30 Jul, 2019

2 commits

    A common pattern when using xdp_redirect_map() is to create a device map
    where the lookup key is simply the ifindex. Because device maps are arrays,
    this leaves holes in the map, and the map has to be sized to fit the
    largest ifindex, regardless of how many devices are actually needed in
    the map.

    This patch adds a second type of device map where the key is looked up
    using a hashmap, instead of being used as an array index. This allows maps
    to be densely packed, so they can be smaller.
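
    Userspace sketch of the new map type, keyed directly by ifindex
    (bpf_map_create() is the modern libbpf entry point and post-dates
    this patch):

    int map_fd = bpf_map_create(BPF_MAP_TYPE_DEVMAP_HASH, "tx_devs",
                                sizeof(__u32), sizeof(__u32),
                                64 /* max_entries */, NULL);
    __u32 key = ifindex, val = ifindex;  /* keys need not be dense */

    bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);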

    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Yonghong Song
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Alexei Starovoitov

    Toke Høiland-Jørgensen
     
  • The subsequent patch to add a new devmap sub-type can re-use much of the
    initialisation and allocation code, so refactor it into separate functions.

    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Yonghong Song
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Alexei Starovoitov

    Toke Høiland-Jørgensen
     

29 Jun, 2019

2 commits

  • We don't currently allow lookups into a devmap from eBPF, because the map
    lookup returns a pointer directly to the dev->ifindex, which shouldn't be
    modifiable from eBPF.

    However, being able to do lookups in devmaps is useful to know (e.g.)
    whether forwarding to a specific interface is enabled. Currently, programs
    work around this by keeping a shadow map of another type which indicates
    whether a map index is valid.

    Since we now have a flag to make maps read-only from the eBPF side, we can
    simply lift the lookup restriction if we make sure this flag is always set.
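
    What the lifted restriction allows, sketched as a modern BTF-style
    program (the map-definition style post-dates this patch):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __uint(max_entries, 64);
        __type(key, __u32);
        __type(value, __u32);
    } tx_devs SEC(".maps");

    SEC("xdp")
    int xdp_fwd(struct xdp_md *ctx)
    {
        __u32 key = 0;  /* hypothetical egress slot */

        /* the lookup is allowed now; the value is read-only to us */
        if (!bpf_map_lookup_elem(&tx_devs, &key))
            return XDP_PASS;  /* forwarding not enabled for this slot */
        return bpf_redirect_map(&tx_devs, key, 0);
    }

    char _license[] SEC("license") = "GPL";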

    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Jonathan Lemon
    Acked-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Toke Høiland-Jørgensen
     
  • The socket map uses a linked list instead of a bitmap to keep track of
    which entries to flush. Do the same for devmap and cpumap, as this means we
    don't have to care about the map index when enqueueing things into the
    map (and so we can cache the map lookup).

    Signed-off-by: Toke Høiland-Jørgensen
    Acked-by: Jonathan Lemon
    Acked-by: Andrii Nakryiko
    Signed-off-by: Daniel Borkmann

    Toke Høiland-Jørgensen
     

20 Jun, 2019

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2019-06-19

    The following pull-request contains BPF updates for your *net-next* tree.

    The main changes are:

    1) new SO_REUSEPORT_DETACH_BPF setsockopt, from Martin.

    2) BTF based map definition, from Andrii.

    3) support bpf_map_lookup_elem for xskmap, from Jonathan.

    4) bounded loops and scalar precision logic in the verifier, from Alexei.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Jun, 2019

2 commits

  • Honestly all the conflicts were simple overlapping changes,
    nothing really interesting to report.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:
    "Lots of bug fixes here:

    1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer.

    2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John
    Crispin.

    3) Use after free in psock backlog workqueue, from John Fastabend.

    4) Fix source port matching in fdb peer flow rule of mlx5, from Raed
    Salem.

    5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet.

    6) Network header needs to be set for packet redirect in nfp, from
    John Hurley.

    7) Fix udp zerocopy refcnt, from Willem de Bruijn.

    8) Don't assume linear buffers in vxlan and geneve error handlers,
    from Stefano Brivio.

    9) Fix TOS matching in mlxsw, from Jiri Pirko.

    10) More SCTP cookie memory leak fixes, from Neil Horman.

    11) Fix VLAN filtering in rtl8366, from Linus Walleij.

    12) Various TCP SACK payload size and fragmentation memory limit fixes
    from Eric Dumazet.

    13) Use after free in pneigh_get_next(), also from Eric Dumazet.

    14) LAPB control block leak fix from Jeremy Sowden"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
    lapb: fixed leak of control-blocks.
    tipc: purge deferredq list for each grp member in tipc_group_delete
    ax25: fix inconsistent lock state in ax25_destroy_timer
    neigh: fix use-after-free read in pneigh_get_next
    tcp: fix compile error if !CONFIG_SYSCTL
    hv_sock: Suppress bogus "may be used uninitialized" warnings
    be2net: Fix number of Rx queues used for flow hashing
    net: handle 802.1P vlan 0 packets properly
    tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
    tcp: add tcp_min_snd_mss sysctl
    tcp: tcp_fragment() should apply sane memory limits
    tcp: limit payload size of sacked skbs
    Revert "net: phylink: set the autoneg state in phylink_phy_change"
    bpf: fix nested bpf tracepoints with per-cpu data
    bpf: Fix out of bounds memory access in bpf_sk_storage
    vsock/virtio: set SOCK_DONE on peer shutdown
    net: dsa: rtl8366: Fix up VLAN filtering
    net: phylink: set the autoneg state in phylink_phy_change
    net: add high_order_alloc_disable sysctl/static key
    tcp: add tcp_tx_skb_cache sysctl
    ...

    Linus Torvalds
     

15 Jun, 2019

3 commits

  • .ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net
    uses RCU to detect that it has set up the resources for tx. The assumption
    accidentally broke when introducing bulk queue in devmap.
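
    The invariant being restored, sketched (whether the read-side section
    sits exactly here or around the whole flush is a detail of the actual
    patch):

    rcu_read_lock();  /* .ndo_xdp_xmit() implementations may rely on RCU */
    sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
    rcu_read_unlock();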

    Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking")
    Reported-by: David Ahern
    Signed-off-by: Toshiaki Makita
    Signed-off-by: Daniel Borkmann

    Toshiaki Makita
     
  • dev_map_free() forgot to free bulk queue when freeing its entries.

    Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking")
    Signed-off-by: Toshiaki Makita
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Toshiaki Makita
     
  • dev_map_free() waits for flush_needed bitmap to be empty in order to
    ensure all flush operations have completed before freeing its entries.
    However the corresponding clear_bit() was called before using the
    entries, so the entries could be used after free.

    All access to the entries needs to be done before clearing the bit.
    It seems commit a5e2da6e9787 ("bpf: netdev is never null in
    __dev_map_flush") accidentally changed the clear_bit() and memory access
    order.

    Note that the problem happens only in __dev_map_flush(), not in
    dev_map_flush_old(). dev_map_flush_old() is called only after nulling
    out the corresponding netdev_map entry, so dev_map_free() never frees
    the entry thus no such race happens there.
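
    The required ordering in __dev_map_flush(), sketched (extra arguments
    to bq_xmit_all() elided):

    /* all accesses to the entry must come first ... */
    bq_xmit_all(dev, bq);
    /* ... and only then may the flush bit be cleared, since
     * dev_map_free() uses it to decide when freeing is safe */
    __clear_bit(bit, bitmap);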

    Fixes: a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush")
    Signed-off-by: Toshiaki Makita
    Signed-off-by: Daniel Borkmann

    Toshiaki Makita
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 64 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

04 Jun, 2019

1 commit

    The variable err is assigned the value -EINVAL, which is never
    read before it is re-assigned a new value later on. The assignment is
    redundant and can be removed.

    Addresses-Coverity: ("Unused value")
    Signed-off-by: Colin Ian King
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Colin Ian King
     

01 Jun, 2019

3 commits

    Most bpf map types do similar checks and bytes-to-pages
    conversion during memory allocation and charging.

    Let's unify these checks by moving them into bpf_map_charge_init().

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     
  • In order to unify the existing memlock charging code with the
    memcg-based memory accounting, which will be added later, let's
    rework the current scheme.

    Currently the following design is used:
    1) .alloc() callback optionally checks if the allocation will likely
    succeed using bpf_map_precharge_memlock()
    2) .alloc() performs actual allocations
    3) .alloc() callback calculates map cost and sets map.memory.pages
    4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
    and performs actual charging; in case of failure the map is
    destroyed

    1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
    performs uncharge and releases the user
    2) .map_free() callback releases the memory

    The scheme can be simplified and made more robust:
    1) .alloc() calculates map cost and calls bpf_map_charge_init()
    2) bpf_map_charge_init() sets map.memory.user and performs actual
    charge
    3) .alloc() performs actual allocations

    1) .map_free() callback releases the memory
    2) bpf_map_charge_finish() performs uncharge and releases the user

    The new scheme also allows reusing the bpf_map_charge_init()/finish()
    functions for memcg-based accounting. Because charges are performed
    before actual allocations, and uncharges after freeing the memory,
    no bogus memory pressure can be created.

    In cases when the map structure is not available (e.g. it's not
    created yet, or is already destroyed), an on-stack bpf_map_memory
    structure is used. The charge can be transferred with the
    bpf_map_charge_move() function.
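
    The new allocation-time flow from the list above, as a rough sketch
    (per-type cost calculation and some error handling elided; numa_node
    is assumed):

    struct bpf_map_memory mem;
    u64 cost = sizeof(*dtab);  /* plus per-type costs, elided */
    int err;

    /* 1) charge before any allocation */
    err = bpf_map_charge_init(&mem, cost);
    if (err)
        return ERR_PTR(err);

    /* 2) perform the actual allocations; on failure, undo the charge */
    dtab = bpf_map_area_alloc(sizeof(*dtab), numa_node);
    if (!dtab) {
        bpf_map_charge_finish(&mem);
        return ERR_PTR(-ENOMEM);
    }

    /* 3) on success, transfer the charge into the map */
    bpf_map_charge_move(&dtab->map.memory, &mem);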

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     
  • Group "user" and "pages" fields of bpf_map into the bpf_map_memory
    structure. Later it can be extended with "memcg" and other related
    information.

    The main reason for such a change (besides cosmetics) is to pass the
    bpf_map_memory structure to charging functions before the actual
    allocation of the bpf_map.
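
    The grouped structure, as described (a small sketch):

    struct bpf_map_memory {
        u32 pages;                 /* charged size in pages */
        struct user_struct *user;  /* user being charged */
    };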

    Signed-off-by: Roman Gushchin
    Acked-by: Song Liu
    Signed-off-by: Alexei Starovoitov

    Roman Gushchin
     

14 May, 2019

1 commit

    synchronize_rcu() is fine when the rcu callbacks only need
    to free memory (kfree_rcu(), or rcu callbacks that simply call kfree()).

    __dev_map_entry_free() is a bit more complex, so we need to make
    sure that queued __dev_map_entry_free() callbacks have completed.

    syzbot report:

    BUG: KASAN: use-after-free in dev_map_flush_old kernel/bpf/devmap.c:365
    [inline]
    BUG: KASAN: use-after-free in __dev_map_entry_free+0x2a8/0x300
    kernel/bpf/devmap.c:379
    Read of size 8 at addr ffff8801b8da38c8 by task ksoftirqd/1/18

    CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.17.0+ #39
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
    __dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
    __rcu_reclaim kernel/rcu/rcu.h:178 [inline]
    rcu_do_batch kernel/rcu/tree.c:2558 [inline]
    invoke_rcu_callbacks kernel/rcu/tree.c:2818 [inline]
    __rcu_process_callbacks kernel/rcu/tree.c:2785 [inline]
    rcu_process_callbacks+0xe9d/0x1760 kernel/rcu/tree.c:2802
    __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
    run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
    smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
    kthread+0x345/0x410 kernel/kthread.c:240
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

    Allocated by task 6675:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
    kmalloc include/linux/slab.h:513 [inline]
    kzalloc include/linux/slab.h:706 [inline]
    dev_map_alloc+0x208/0x7f0 kernel/bpf/devmap.c:102
    find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
    map_create+0x393/0x1010 kernel/bpf/syscall.c:453
    __do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
    __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
    __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 26:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kfree+0xd9/0x260 mm/slab.c:3813
    dev_map_free+0x4fa/0x670 kernel/bpf/devmap.c:191
    bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
    process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
    worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
    kthread+0x345/0x410 kernel/kthread.c:240
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

    The buggy address belongs to the object at ffff8801b8da37c0
    which belongs to the cache kmalloc-512 of size 512
    The buggy address is located 264 bytes inside of
    512-byte region [ffff8801b8da37c0, ffff8801b8da39c0)
    The buggy address belongs to the page:
    page:ffffea0006e368c0 count:1 mapcount:0 mapping:ffff8801da800940
    index:0xffff8801b8da3540
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0007217b88 ffffea0006e30cc8 ffff8801da800940
    raw: ffff8801b8da3540 ffff8801b8da3040 0000000100000004 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801b8da3780: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
    ffff8801b8da3800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    > ffff8801b8da3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8801b8da3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801b8da3980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
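
    Given the description above, the essence of the fix in dev_map_free()
    (sketch):

    /* synchronize_rcu() only waits for readers; rcu_barrier() also
     * waits for already-queued __dev_map_entry_free() callbacks */
    rcu_barrier();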

    Fixes: 546ac1ffb70d ("bpf: add devmap, a map for storing net device references")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+457d3e2ffbcf31aee5c0@syzkaller.appspotmail.com
    Acked-by: Toke Høiland-Jørgensen
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Eric Dumazet