14 Mar, 2020
13 commits
-
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 86 non-merge commits during the last 12 day(s) which contain
a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).

The main changes are:

1) Add modify_return attach type which allows to attach to a function via
   BPF trampoline and is run after the fentry and before the fexit programs
   and can pass a return code to the original caller, from KP Singh.

2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher
   objects to be visible in /proc/kallsyms so they can be annotated in
   stack traces, from Jiri Olsa.

3) Extend BPF sockmap to allow for UDP next to existing TCP support in order
   to enable this for BPF based socket dispatch, from Lorenz Bauer.

4) Introduce a new bpftool 'prog profile' command which attaches to existing
   BPF programs via fentry and fexit hooks and reads out hardware counters
   during that period, from Song Liu. Example usage:

   bpftool prog profile id 337 duration 3 cycles instructions llc_misses

        4228 run_cnt
     3403698 cycles                                              (84.08%)
     3525294 instructions   #  1.04 insn per cycle               (84.05%)
          13 llc_misses     #  3.69 LLC misses per million isns  (83.50%)

5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition
   of a new bpf_link abstraction to keep in particular BPF tracing programs
   attached even when the application owning them exits, from Andrii Nakryiko.

6) New bpf_get_ns_current_pid_tgid() helper for tracing to perform PID filtering
   and which returns the PID as seen by the init namespace, from Carlos Neira.

7) Refactor of RISC-V JIT code to move out common pieces and addition of a
   new RV32G BPF JIT compiler, from Luke Nelson.

8) Add gso_size context member to __sk_buff in order to be able to know whether
   a given skb is GSO or not, from Willem de Bruijn.

9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output
   implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================

Signed-off-by: David S. Miller
-
Sparse reports a warning at __bpf_prog_enter() and __bpf_prog_exit()

warning: context imbalance in __bpf_prog_enter() - wrong count at exit
warning: context imbalance in __bpf_prog_exit() - unexpected unlock

The root cause is the missing annotation at __bpf_prog_enter()
and __bpf_prog_exit()

Add the missing __acquires(RCU) annotation
Add the missing __releases(RCU) annotation

Signed-off-by: Jules Irenge
Signed-off-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200311010908.42366-2-jbi.octave@gmail.com
-
Now that we have all the objects (bpf_prog, bpf_trampoline,
bpf_dispatcher) linked in bpf_tree, there's no need to have
separate bpf_image tree for images.

Reverting the bpf_image tree together with struct bpf_image,
because it's no longer needed.

Also removing bpf_image_alloc function and adding the original
bpf_jit_alloc_exec_page interface instead.

The kernel_text_address function can now rely only on is_bpf_text_address,
because it checks the bpf_tree that contains all the objects.

Keeping bpf_image_ksym_add and bpf_image_ksym_del because they are
useful wrappers with perf's ksymbol interface calls.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200312195610.346362-13-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Adding dispatchers to kallsyms. It's displayed as

  bpf_dispatcher_<NAME>

where NAME is the name of dispatcher.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200312195610.346362-12-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Adding trampolines to kallsyms. It's displayed as

  bpf_trampoline_<ID> [bpf]

where ID is the BTF id of the trampoline function.

Adding bpf_image_ksym_add/del functions that set up
the start/end values and call KSYMBOL perf events
handlers.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200312195610.346362-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Separating /proc/kallsyms add/del code and adding bpf_ksym_add/del
functions for that.

Moving bpf_prog_ksym_node_add/del functions to __bpf_ksym_add/del
and changing their argument to 'struct bpf_ksym' object. This way
we can call them for other bpf object types like trampoline and
dispatcher.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200312195610.346362-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Adding 'prog' bool flag to 'struct bpf_ksym' to mark that
this object belongs to bpf_prog object.

This change allows having bpf_prog objects together with
other types (trampolines and dispatchers) in the single
bpf_tree. It's used when searching for bpf_prog exception
tables by the bpf_prog_ksym_find function, where we need
to get the bpf_prog pointer.

From now we can safely add bpf_ksym support for trampoline
or dispatcher objects, because we can differentiate them
from bpf_prog objects.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200312195610.346362-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
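To illustrate the 'prog' flag described in the commit message above, here is a minimal userspace sketch. It assumes a sorted array and a binary search in place of the kernel's bpf_tree, and every name in it (struct ksym, ksym_find, prog_ksym_find, demo_syms) is hypothetical rather than kernel API. It only shows how one shared lookup can serve all symbol types while a program-only lookup filters on the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct bpf_ksym. */
struct ksym {
	unsigned long start;	/* first address covered by the symbol */
	unsigned long end;	/* one past the last covered address (assumed) */
	const char *name;
	bool prog;		/* true only when the symbol belongs to a program */
};

/* Find the symbol covering addr in an array sorted by start address. */
const struct ksym *ksym_find(const struct ksym *syms, size_t n,
			     unsigned long addr)
{
	size_t lo = 0, hi = n;

	assert(syms != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < syms[mid].start)
			hi = mid;
		else if (addr >= syms[mid].end)
			lo = mid + 1;
		else
			return &syms[mid];
	}
	return NULL;
}

/* Program-only lookup: trampolines and dispatchers are filtered out. */
const struct ksym *prog_ksym_find(const struct ksym *syms, size_t n,
				  unsigned long addr)
{
	const struct ksym *k = ksym_find(syms, n, addr);

	return (k && k->prog) ? k : NULL;
}

/* Illustrative data only; addresses and names are made up. */
const struct ksym demo_syms[] = {
	{ 0x1000, 0x1100, "bpf_prog_demo",       true  },
	{ 0x2000, 0x2040, "bpf_trampoline_demo", false },
};
```

With this data, an address inside the trampoline range is found by ksym_find() but rejected by prog_ksym_find(), which is the distinction the flag exists to make.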
Adding bpf_ksym_find function that is used by bpf address
lookup functions:
  __bpf_address_lookup
  is_bpf_text_address

while keeping bpf_prog_kallsyms_find to be used only for lookup
of bpf_prog objects (will happen in following changes).

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200312195610.346362-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Moving ksym_tnode list node to 'struct bpf_ksym' object,
so the symbol itself can be chained and used in other
objects like bpf_trampoline and bpf_dispatcher.

We need bpf_ksym object to be linked both in bpf_kallsyms
via lnode for /proc/kallsyms and in bpf_tree via tnode for
bpf address lookup functions like __bpf_address_lookup or
bpf_prog_kallsyms_find.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200312195610.346362-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Adding lnode list node to 'struct bpf_ksym' object,
so the struct bpf_ksym itself can be chained and used
in other objects like bpf_trampoline and bpf_dispatcher.

Changing iterator to bpf_ksym in bpf_get_kallsym function.

The ksym->start is holding the prog->bpf_func value,
so it's ok to use it as value in bpf_get_kallsym.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Acked-by: Song Liu
Link: https://lore.kernel.org/bpf/20200312195610.346362-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Adding name to 'struct bpf_ksym' object to carry the name
of the symbol for bpf_prog, bpf_trampoline, bpf_dispatcher
objects.

The current benefit is that name is now generated only when
the symbol is added to the list, so we don't need to generate
it every time it's accessed.

The future benefit is that we will have all the bpf objects
symbols represented by struct bpf_ksym.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Acked-by: Song Liu
Link: https://lore.kernel.org/bpf/20200312195610.346362-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Adding 'struct bpf_ksym' object that will carry the
kallsym information for bpf symbol. Adding the start
and end address to begin with. It will be used by
bpf_prog, bpf_trampoline, bpf_dispatcher objects.

The symbol_start/symbol_end values were originally used
to sort bpf_prog objects. For the address displayed in
/proc/kallsyms we are using prog->bpf_func value.

I'm using the bpf_func value for program symbol start
instead of the symbol_start, because it makes no difference
for sorting bpf_prog objects and we can use it directly as
an address to display it in /proc/kallsyms.

Signed-off-by: Jiri Olsa
Signed-off-by: Alexei Starovoitov
Acked-by: Song Liu
Link: https://lore.kernel.org/bpf/20200312195610.346362-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov
-
Instead of requiring users to do three steps for cleaning up bpf_link, its
anon_inode file, and unused fd, abstract that away into bpf_link_cleanup()
helper. bpf_link_defunct() is removed, as it shouldn't be needed as an
individual operation anymore.

v1->v2:
- keep bpf_link_cleanup() static for now (Daniel).

Signed-off-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
Acked-by: Martin KaFai Lau
Link: https://lore.kernel.org/bpf/20200313002128.2028680-1-andriin@fb.com
Signed-off-by: Alexei Starovoitov
13 Mar, 2020
4 commits
-
Minor overlapping changes, nothing serious.
Signed-off-by: David S. Miller
-
Introduce new helper that reuses existing xdp perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct xdp_buff *' as a tracepoint argument.

Signed-off-by: Eelco Chaudron
Signed-off-by: Alexei Starovoitov
Acked-by: John Fastabend
Acked-by: Toke Høiland-Jørgensen
Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
-
New bpf helper bpf_get_ns_current_pid_tgid.
This helper returns the pid and tgid of the current task when its
namespace matches the provided dev_t and inode number.
This allows us to instrument a process inside a container.

Signed-off-by: Carlos Neira
Signed-off-by: Alexei Starovoitov
Acked-by: Yonghong Song
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
-
Pull networking fixes from David Miller:
"It looks like a decent sized set of fixes, but a lot of these are one
liner off-by-one and similar type changes:

1) Fix netlink header pointer to calculate bad attribute offset
   reported to user. From Pablo Neira Ayuso.

2) Don't double clear PHY interrupts when ->did_interrupt is set,
   from Heiner Kallweit.

3) Add missing validation of various (devlink, nl802154, fib, etc.)
   attributes, from Jakub Kicinski.

4) Missing *pos increments in various netfilter seq_next ops, from
   Vasily Averin.

5) Missing break in of_mdiobus_register() loop, from Dajun Jin.

6) Don't double bump tx_dropped in veth driver, from Jiang Lidong.

7) Work around FMAN erratum A050385, from Madalin Bucur.

8) Make sure ARP header is pulled early enough in bonding driver,
   from Eric Dumazet.

9) Do a cond_resched() during multicast processing of ipvlan and
   macvlan, from Mahesh Bandewar.

10) Don't attach cgroups to unrelated sockets when in interrupt
    context, from Shakeel Butt.

11) Fix tpacket ring state management when encountering unknown GSO
    types. From Willem de Bruijn.

12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend()
    only in the suspend context. From Heiner Kallweit"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
net: systemport: fix index check to avoid an array out of bounds access
tc-testing: add ETS scheduler to tdc build configuration
net: phy: fix MDIO bus PM PHY resuming
net: hns3: clear port base VLAN when unload PF
net: hns3: fix RMW issue for VLAN filter switch
net: hns3: fix VF VLAN table entries inconsistent issue
net: hns3: fix "tc qdisc del" failed issue
taprio: Fix sending packets without dequeueing them
net: mvmdio: avoid error message for optional IRQ
net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register
net: memcg: fix lockdep splat in inet_csk_accept()
s390/qeth: implement smarter resizing of the RX buffer pool
s390/qeth: refactor buffer pool code
s390/qeth: use page pointers to manage RX buffer pool
seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number
net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed
net/packet: tpacket_rcv: do not increment ring index on drop
sxgbe: Fix off by one in samsung driver strncpy size arg
net: caif: Add lockdep expression to RCU traversal primitive
MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer
...
12 Mar, 2020
2 commits
-
Pull thread fix from Christian Brauner:
"This contains a single fix for a regression which was introduced when
we introduced the ability to select a specific pid at process creation
time.

When this feature is requested, the error value will be set to -EPERM
after exiting the pid allocation loop. This caused EPERM to be
returned when e.g. the init process/child subreaper of the pid
namespace has already died where we used to return ENOMEM before.

The first patch here simply fixes the regression by unconditionally
setting the return value back to ENOMEM again once we've successfully
allocated the requested pid number. This should be easy to backport to
v5.5.

The second patch adds a comment explaining that we must keep returning
ENOMEM since we've been doing it for a long time and have explicitly
documented this behavior for userspace. This seemed worthwhile because
we now have at least two separate examples where people tried to change
the return value to something other than ENOMEM (The first version of
the regression fix did that too and the commit message links to an
earlier patch that tried to do the same.).

I have a simple regression test to make sure we catch this regression
in the future but since that introduces a whole new selftest subdir
and test files I'll keep this for v5.7"

* tag 'for-linus-2020-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
pid: make ENOMEM return value more obvious
pid: Fix error return value in some cases
-
Pull ftrace fix from Steven Rostedt:
"Have ftrace lookup_rec() return a consistent record otherwise it can
break live patching"

* tag 'trace-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Return the first found result in lookup_rec()
11 Mar, 2020
5 commits
-
It appears that ip ranges can overlap. In that case lookup_rec()
returns whatever result it got last, even if it found nothing in the last
searched page.

This breaks an obscure livepatch late module patching usecase:
- load livepatch
- load the patched module
- unload livepatch
- try to load livepatch again

To fix this, return from lookup_rec() as soon as it has found the record
containing the searched-for ip. This is how it worked prior to the
introduction of lookup_rec().

Link: http://lkml.kernel.org/r/20200306174317.21699-1-asavkov@redhat.com
Cc: stable@vger.kernel.org
Fixes: 7e16f581a817 ("ftrace: Separate out functionality from ftrace_location_range()")
Signed-off-by: Artem Savkov
Signed-off-by: Steven Rostedt (VMware)
-
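The "return the first found result" behavior described in the commit message above can be sketched in userspace. This is a simplified model under stated assumptions, not the kernel's lookup_rec(): pages are a linked list of sorted address arrays, a linear scan replaces the kernel's binary search, and struct page, lookup_first and the demo data are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one page of records: sorted addresses. */
struct page {
	const unsigned long *ips;
	size_t count;
	const struct page *next;
};

/* Return the first record whose address lies in [start, end], or NULL. */
const unsigned long *lookup_first(const struct page *pg,
				  unsigned long start, unsigned long end)
{
	assert(start <= end);
	for (; pg; pg = pg->next) {
		size_t i;

		if (pg->count == 0)
			continue;
		/* Skip pages whose address range cannot contain the request. */
		if (end < pg->ips[0] || start > pg->ips[pg->count - 1])
			continue;
		for (i = 0; i < pg->count; i++)
			if (pg->ips[i] >= start && pg->ips[i] <= end)
				return &pg->ips[i];	/* stop at the first hit */
	}
	return NULL;
}

/* Two pages with overlapping address ranges (made-up values). */
const unsigned long ips_a[] = { 0x100, 0x300 };
const unsigned long ips_b[] = { 0x200, 0x400 };
const struct page page_b = { ips_b, 2, NULL };
const struct page page_a = { ips_a, 2, &page_b };
```

Looking up 0x300 finds the record in the first page. A loop that kept going would also search the second page, whose range covers 0x300 but which holds no such record, and would end with an empty result, which is the failure mode the commit describes.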
Add bpf_link_new_file() API for cases when we need to ensure anon_inode is
successfully created before we proceed with expensive BPF program attachment
procedure, which will require equally (if not more so) expensive and
potentially failing compensation detachment procedure just because anon_inode
creation failed. This API allows to simplify code by ensuring first that
anon_inode is created and after BPF program is attached proceed with
fd_install() that can't fail.

After anon_inode file is created, link can't be just kfree()'d anymore,
because its destruction will be performed by deferred file_operations->release
call. For this, bpf_link API required specifying two separate operations:
release() and dealloc(), former performing detachment only, while the latter
frees memory used by bpf_link itself. dealloc() needs to be specified, because
struct bpf_link is frequently embedded into link type-specific container
struct (e.g., struct bpf_raw_tp_link), so bpf_link itself doesn't know how to
properly free the memory. In case when anon_inode file was successfully
created, but subsequent BPF attachment failed, bpf_link needs to be marked as
"defunct", so that file's release() callback will perform only memory
deallocation, but no detachment.

Convert raw tracepoint and tracing attachment to new API and eliminate
detachment from error handling path.

Signed-off-by: Andrii Nakryiko
Signed-off-by: Daniel Borkmann
Acked-by: John Fastabend
Link: https://lore.kernel.org/bpf/20200309231051.1270337-1-andriin@fb.com
-
We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context, especially for cgroup v1.

mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assume that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.

Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.

WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accounted by the memcg.

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
dump_stack+0x57/0x75
mem_cgroup_sk_alloc+0xe9/0xf0
sk_clone_lock+0x2a7/0x420
inet_csk_clone_lock+0x1b/0x110
tcp_create_openreq_child+0x23/0x3b0
tcp_v6_syn_recv_sock+0x88/0x730
tcp_check_req+0x429/0x560
tcp_v6_rcv+0x72d/0xa40
ip6_protocol_deliver_rcu+0xc9/0x400
ip6_input+0x44/0xd0
? ip6_protocol_deliver_rcu+0x400/0x400
ip6_rcv_finish+0x71/0x80
ipv6_rcv+0x5b/0xe0
? ip6_sublist_rcv+0x2e0/0x2e0
process_backlog+0x108/0x1e0
net_rx_action+0x26b/0x460
__do_softirq+0x104/0x2a6
do_softirq_own_stack+0x2a/0x40
do_softirq.part.19+0x40/0x50
__local_bh_enable_ip+0x51/0x60
ip6_finish_output2+0x23d/0x520
? ip6table_mangle_hook+0x55/0x160
__ip6_finish_output+0xa1/0x100
ip6_finish_output+0x30/0xd0
ip6_output+0x73/0x120
? __ip6_finish_output+0x100/0x100
ip6_xmit+0x2e3/0x600
? ipv6_anycast_cleanup+0x50/0x50
? inet6_csk_route_socket+0x136/0x1e0
? skb_free_head+0x1e/0x30
inet6_csk_xmit+0x95/0xf0
__tcp_transmit_skb+0x5b4/0xb20
__tcp_send_ack.part.60+0xa3/0x110
tcp_send_ack+0x1d/0x20
tcp_rcv_state_process+0xe64/0xe80
? tcp_v6_connect+0x5d1/0x5f0
tcp_v6_do_rcv+0x1b1/0x3f0
? tcp_v6_do_rcv+0x1b1/0x3f0
__release_sock+0x7f/0xd0
release_sock+0x30/0xa0
__inet_stream_connect+0x1c3/0x3b0
? prepare_to_wait+0xb0/0xb0
inet_stream_connect+0x3b/0x60
__sys_connect+0x101/0x120
? __sys_getsockopt+0x11b/0x140
__x64_sys_connect+0x1a/0x20
do_syscall_64+0x51/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Signed-off-by: David S. Miller
-
Pull cgroup fixes from Tejun Heo:
- cgroup.procs listing related fixes.

  It didn't interlock properly with exiting tasks leaving a short
  window where a cgroup has empty cgroup.procs but still can't be
  removed and misbehaved on short reads.

- psi_show() crash fix on 32bit ino archs

- Empty release_agent handling fix

* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup1: don't call release_agent when it is ""
cgroup: fix psi_show() crash on 32bit ino archs
cgroup: Iterate tasks that did not finish do_exit()
cgroup: cgroup_procs_next should increase position index
cgroup-v1: cgroup_pidlist_next should update position index -
Pull workqueue fixes from Tejun Heo:
"Workqueue has been incorrectly round-robining per-cpu work items.
Hillf's patch fixes that.

The other patch documents memory-ordering properties of workqueue
operations"

* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: don't use wq_select_unbound_cpu() for bound works
workqueue: Document (some) memory-ordering properties of {queue,schedule}_work()
10 Mar, 2020
2 commits
-
wq_select_unbound_cpu() is designed for unbound workqueues only, but
it's wrongly called when using a bound workqueue too.

Fixing this ensures work queued to a bound workqueue with
cpu=WORK_CPU_UNBOUND always runs on the local CPU.

Before, that would happen only if wq_unbound_cpumask happened to include
it (likely almost always the case), or was empty, or we got lucky with
forced round-robin placement. So restricting
/sys/devices/virtual/workqueue/cpumask to a small subset of a machine's
CPUs would cause some bound work items to run unexpectedly there.

Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Hillf Danton
[dj: massage changelog]
Signed-off-by: Daniel Jordan
Cc: Tejun Heo
Cc: Lai Jiangshan
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo
-
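The CPU selection rule described in the commit message above can be written down as a small decision function. This is a sketch under stated assumptions: a bitmask stands in for wq_unbound_cpumask, "first allowed CPU" stands in for the kernel's round-robin placement, and pick_cpu, select_unbound_cpu, NR_CPUS_DEMO and CPU_UNSPECIFIED are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_DEMO	8
#define CPU_UNSPECIFIED	NR_CPUS_DEMO	/* stand-in for WORK_CPU_UNBOUND */

/* Hypothetical unbound picker: local CPU if allowed, else first allowed. */
int select_unbound_cpu(int local_cpu, unsigned int allowed_mask)
{
	int cpu;

	if (allowed_mask & (1u << local_cpu))
		return local_cpu;
	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		if (allowed_mask & (1u << cpu))
			return cpu;
	return local_cpu;	/* empty mask: fall back to the local CPU */
}

/* Pick the CPU for a work item queued with or without an explicit CPU. */
int pick_cpu(bool wq_unbound, int req_cpu, int local_cpu,
	     unsigned int allowed_mask)
{
	assert(local_cpu >= 0 && local_cpu < NR_CPUS_DEMO);
	if (req_cpu != CPU_UNSPECIFIED)
		return req_cpu;
	/* Only unbound workqueues consult the unbound cpumask. */
	if (wq_unbound)
		return select_unbound_cpu(local_cpu, allowed_mask);
	return local_cpu;	/* bound workqueue: always the local CPU */
}
```

With the mask restricted to CPUs 0 and 1 and the caller running on CPU 5, a bound workqueue still picks CPU 5, while an unbound one is redirected into the mask.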
The alloc_pid() codepath used to be simpler. With the introduction of the
ability to choose specific pids in 49cb2fc42ce4 ("fork: extend clone3() to
support setting a PID") it got more complex. It hasn't been super obvious
that ENOMEM is returned when the pid namespace init process/child subreaper
of the pid namespace has died, as can be seen from multiple attempts to
improve this, see e.g. [1] and most recently [2].
We regressed returning ENOMEM in [3] and [2] restored it. Let's add a
comment on top explaining that this is historic and documented behavior and
cannot easily be changed.

[1]: 35f71bc0a09a ("fork: report pid reservation failure properly")
[2]: b26ebfe12f34 ("pid: Fix error return value in some cases")
[3]: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID")
Signed-off-by: Christian Brauner
08 Mar, 2020
2 commits
-
Recent changes to alloc_pid() allow the pid number to be specified on
the command line. If set_tid_size is set, then the code scanning the
levels will hard-set retval to -EPERM, overriding its previous -ENOMEM
value.

After the code scanning the levels, there are error returns that do not
set retval, assuming it is still set to -ENOMEM.

So set retval back to -ENOMEM after scanning the levels.
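The control flow described above can be reduced to a small model. This is a flattened userspace sketch, not the kernel's alloc_pid(): the function name and its boolean parameters are hypothetical stand-ins for the real permission and namespace checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical, flattened model of the allocation flow.
 * levels:       number of namespace levels to scan
 * set_tid:      the caller requested specific PID numbers
 * may_set_tid:  that request is permitted
 * reaper_alive: the namespace's init process still exists
 */
int alloc_pid_model(int levels, bool set_tid, bool may_set_tid,
		    bool reaper_alive)
{
	int retval = -ENOMEM;
	int i;

	assert(levels > 0);
	for (i = 0; i < levels; i++) {
		if (set_tid) {
			retval = -EPERM;	/* default error in this branch */
			if (!may_set_tid)
				return retval;
		}
		/* ... a PID number would be reserved for this level ... */
	}

	/* The fix: later error paths expect -ENOMEM again. */
	retval = -ENOMEM;

	if (!reaper_alive)
		return retval;	/* historic, documented ENOMEM */
	return 0;
}
```

Without the reset after the loop, the set_tid case with a dead reaper would leak -EPERM out of the later error path.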
Fixes: 49cb2fc42ce4 ("fork: extend clone3() to support setting a PID")
Signed-off-by: Corey Minyard
Acked-by: Christian Brauner
Cc: Andrei Vagin
Cc: Dmitry Safonov
Cc: Oleg Nesterov
Cc: Adrian Reber
Cc: # 5.5
Link: https://lore.kernel.org/r/20200306172314.12232-1-minyard@acm.org
[christian.brauner@ubuntu.com: fixup commit message]
Signed-off-by: Christian Brauner
-
Pull block fixes from Jens Axboe:
"Here are a few fixes that should go into this release. This contains:

- Revert of a bad bcache patch from this merge window

- Removed unused function (Daniel)

- Fixup for the blktrace fix from Jan from this release (Cengiz)

- Fix of deeper level bfqq overwrite in BFQ (Carlo)"

* tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
blktrace: fix dereference after null check
Revert "bcache: ignore pending signals when creating gc and allocator thread"
block: Remove used kblockd_schedule_work_on()
07 Mar, 2020
1 commit
-
Pull thread fixes from Christian Brauner:
"Here are a few hopefully uncontroversial fixes:

- Use RCU_INIT_POINTER() when initializing rcu protected members in
  task_struct to fix sparse warnings.

- Add pidfd_fdinfo_test binary to .gitignore file"

* tag 'for-linus-2020-03-07' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
selftests: pidfd: Add pidfd_fdinfo_test in .gitignore
exit: Fix Sparse errors and warnings
fork: Use RCU_INIT_POINTER() instead of rcu_access_pointer()
06 Mar, 2020
3 commits
-
test_run.o is not built when CONFIG_NET is not set and
bpf_prog_test_run_tracing being referenced in bpf_trace.o causes the
linker error:

ld: kernel/trace/bpf_trace.o:(.rodata+0x38): undefined reference to
`bpf_prog_test_run_tracing'

Add a __weak function in bpf_trace.c to handle this.
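A weak fallback of the kind mentioned above can be sketched in userspace C. This assumes GCC or Clang, where `__attribute__((weak))` is available; the function names are hypothetical, and the userspace ENOTSUP error code is used here only as a placeholder for "not built in".

```c
#include <assert.h>
#include <errno.h>

/*
 * Weak fallback: used only when no other object file provides a strong
 * definition of the same symbol. The function name is hypothetical.
 */
__attribute__((weak)) int demo_test_run(int input)
{
	(void)input;
	return -ENOTSUP;	/* feature not built in */
}

/* The caller links whether or not a strong version exists. */
int run_or_report(int input)
{
	assert(input >= 0);
	return demo_test_run(input);
}
```

If another translation unit defined a non-weak demo_test_run(), the linker would pick that one and the fallback would be ignored, so the reference always resolves.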
Fixes: da00d2f117a0 ("bpf: Add test ops for BPF_PROG_TYPE_TRACING")
Signed-off-by: KP Singh
Reported-by: Randy Dunlap
Acked-by: Randy Dunlap
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200305220127.29109-1-kpsingh@chromium.org
-
While well intentioned, checking CAP_MAC_ADMIN for attaching
BPF_MODIFY_RETURN tracing programs to "security_" functions is not
necessary as tracing BPF programs already require CAP_SYS_ADMIN.

Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Signed-off-by: KP Singh
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200305204955.31123-1-kpsingh@chromium.org
-
There was a recent change in blktrace.c that added a RCU protection to
`q->blk_trace` in order to fix a use-after-free issue during access.

However the change missed an edge case that can lead to dereferencing of
`bt` pointer even when it's NULL:

Coverity static analyzer marked this as a FORWARD_NULL issue with CID
1460458.

```
/kernel/trace/blktrace.c: 1904 in sysfs_blk_trace_attr_store()
1898 ret = 0;
1899 if (bt == NULL)
1900 ret = blk_trace_setup_queue(q, bdev);
1901
1902 if (ret == 0) {
1903 if (attr == &dev_attr_act_mask)
>>> CID 1460458: Null pointer dereferences (FORWARD_NULL)
>>> Dereferencing null pointer "bt".
1904 bt->act_mask = value;
1905 else if (attr == &dev_attr_pid)
1906 bt->pid = value;
1907 else if (attr == &dev_attr_start_lba)
1908 bt->start_lba = value;
1909 else if (attr == &dev_attr_end_lba)
```

Added a reassignment with RCU annotation to fix the issue.
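The shape of that fix (reload the pointer after the setup step instead of trusting the stale local copy) can be shown with a simplified model. This sketch assumes a plain pointer load where the kernel uses an RCU accessor, and struct trace, struct queue, setup_trace and store_mask are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct trace { unsigned int act_mask; };
struct queue { struct trace *trace; struct trace storage; };

/* Hypothetical setup step: installs a trace object on the queue. */
int setup_trace(struct queue *q)
{
	q->trace = &q->storage;
	return 0;
}

/* Store a mask, creating the trace object on demand. */
int store_mask(struct queue *q, unsigned int value)
{
	struct trace *bt;
	int ret = 0;

	assert(q != NULL);
	bt = q->trace;
	if (bt == NULL) {
		ret = setup_trace(q);
		bt = q->trace;	/* reload: the old local copy is still NULL */
	}
	if (ret == 0 && bt != NULL)
		bt->act_mask = value;
	return ret;
}
```

Dropping the reload line reproduces the reported pattern: setup succeeds, ret is 0, and the store goes through a local pointer that was read before the object existed.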
Fixes: c780e86dd48 ("blktrace: Protect q->blk_trace with RCU")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei
Reviewed-by: Bob Liu
Reviewed-by: Steven Rostedt (VMware)
Signed-off-by: Cengiz Can
Signed-off-by: Jens Axboe
05 Mar, 2020
6 commits
-
The current fexit and fentry tests rely on a different program to
exercise the functions they attach to. Instead of doing this, implement
the test operations for tracing which will also be used for
BPF_MODIFY_RETURN in a subsequent patch.

Also, clean up the fexit test to use the generated skeleton.

Signed-off-by: KP Singh
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Acked-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200304191853.1529-7-kpsingh@chromium.org
-
- Allow BPF_MODIFY_RETURN attachment only to functions that are:

    * Whitelisted for error injection by checking
      within_error_injection_list. Similar discussions happened for the
      bpf_override_return helper.

    * security hooks, this is expected to be cleaned up with the LSM
      changes after the KRSI patches introduce the LSM_HOOK macro:

        https://lore.kernel.org/bpf/20200220175250.10795-1-kpsingh@chromium.org/

- The attachment is currently limited to functions that return an int.
  This can be extended later to other types (e.g. PTR).

Signed-off-by: KP Singh
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Acked-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200304191853.1529-5-kpsingh@chromium.org
-
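The attachment rules listed in the commit message above amount to a short predicate. The following is a sketch only: struct target and check_attach_modify_return are hypothetical, a boolean stands in for the error-injection list lookup, and security hooks are recognised purely by the "security_" name prefix mentioned elsewhere in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of a candidate attach target. */
struct target {
	const char *name;
	bool returns_int;	/* only int-returning functions qualify */
	bool error_injectable;	/* on the error-injection whitelist */
};

/* Return 0 if a modify-return program may attach, -1 otherwise. */
int check_attach_modify_return(const struct target *t)
{
	assert(t != NULL && t->name != NULL);
	if (!t->returns_int)
		return -1;
	if (t->error_injectable)
		return 0;
	/* Security hooks are recognised by their name prefix. */
	if (strncmp(t->name, "security_", strlen("security_")) == 0)
		return 0;
	return -1;
}
```

Checking the return type first keeps the two allow-list branches simple: either one is sufficient, but neither can override the int-only restriction.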
When multiple programs are attached, each program receives the return
value from the previous program on the stack and the last program
provides the return value to the attached function.

The fmod_ret bpf programs are run after the fentry programs and before
the fexit programs. The original function is only called if all the
fmod_ret programs return 0 to avoid any unintended side-effects. The
success value, i.e. 0 is not currently configurable but can be made so
where user-space can specify it at load time.

For example:

int func_to_be_attached(int a, int b)
{
	if (ret != 0)
		goto do_fexit;

original_function:
}
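The ordering sketched in the example above can be simulated in plain C. This is a userspace model under stated assumptions, not trampoline code: function pointers stand in for attached programs, the previous return value is passed as an explicit argument rather than on the stack, and all names (fmod_ret_fn, call_with_fmod_ret, allow, deny) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fmod_ret_fn)(int a, int b, int prev_ret);

int original_calls;	/* counts how often the original function ran */

int original_function(int a, int b)
{
	original_calls++;
	return a + b;
}

/* Run the chain, then the original only if every program returned 0. */
int call_with_fmod_ret(const fmod_ret_fn *progs, size_t n, int a, int b)
{
	int ret = 0;
	size_t i;

	assert(progs != NULL || n == 0);
	/* fentry programs would run here */
	for (i = 0; i < n; i++) {
		ret = progs[i](a, b, ret);	/* sees the previous return value */
		if (ret != 0)
			goto do_fexit;
	}
	ret = original_function(a, b);
do_fexit:
	/* fexit programs would run here */
	return ret;
}

/* Two toy programs: one passes the value through, one forces an error. */
int allow(int a, int b, int prev) { (void)a; (void)b; return prev; }
int deny(int a, int b, int prev) { (void)a; (void)b; (void)prev; return -1; }
```

A chain of two allow programs lets the original run and return its own result, while a chain containing deny returns -1 to the caller and never reaches the original function, so its side effects are skipped.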
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Acked-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200304191853.1529-4-kpsingh@chromium.org
-
As we need to introduce a third type of attachment for trampolines, the
flattened signature of arch_prepare_bpf_trampoline gets even more
complicated.

Refactor the prog and count argument to arch_prepare_bpf_trampoline to
use bpf_tramp_progs to simplify the addition and accounting for new
attachment types.

Signed-off-by: KP Singh
Signed-off-by: Alexei Starovoitov
Acked-by: Andrii Nakryiko
Acked-by: Daniel Borkmann
Link: https://lore.kernel.org/bpf/20200304191853.1529-2-kpsingh@chromium.org
-
Older (and maybe current) versions of systemd set release_agent to "" when
shutting down, but do not set notify_on_release to 0.

Since 64e90a8acb85 ("Introduce STATIC_USERMODEHELPER to mediate
call_usermodehelper()"), we filter out such calls when the user mode helper
path is "". However, when used in conjunction with an actual (i.e. non "")
STATIC_USERMODEHELPER, the path is never "", so the real usermode helper
will be called with argv[0] == "".

Let's avoid this by not invoking the release_agent when it is "".
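The guard described above is small enough to show as a standalone predicate. This is a sketch, not the cgroup code: should_run_release_agent is a hypothetical name, and the notify_on_release parameter is included only to show that the empty-path check is independent of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a release agent should be invoked for the given path. */
bool should_run_release_agent(const char *agent_path, bool notify_on_release)
{
	if (!notify_on_release)
		return false;
	/* An unset or empty path means there is nothing to execute. */
	if (agent_path == NULL || agent_path[0] == '\0')
		return false;
	return true;
}
```

Checking the first byte is enough to detect "" and avoids handing an empty argv[0] to whatever helper would otherwise be started.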
Signed-off-by: Tycho Andersen
Signed-off-by: Tejun Heo
-
Similar to the commit d7495343228f ("cgroup: fix incorrect
WARN_ON_ONCE() in cgroup_setup_root()"), cgroup_id(root_cgrp) does not
equal 1 on 32bit ino archs, which triggers all sorts of issues with
psi_show() on s390x. For example,

BUG: KASAN: slab-out-of-bounds in collect_percpu_times+0x2d0/
Read of size 4 at addr 000000001e0ce000 by task read_all/3667
collect_percpu_times+0x2d0/0x798
psi_show+0x7c/0x2a8
seq_read+0x2ac/0x830
vfs_read+0x92/0x150
ksys_read+0xe2/0x188
system_call+0xd8/0x2b4

Fix it by using cgroup_ino().

Fixes: 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID")
Signed-off-by: Qian Cai
Acked-by: Johannes Weiner
Signed-off-by: Tejun Heo
Cc: stable@vger.kernel.org # v5.5
03 Mar, 2020
1 commit
-
Introduce bpf_link abstraction, representing an attachment of BPF program to
a BPF hook point (e.g., tracepoint, perf event, etc). bpf_link encapsulates
ownership of attached BPF program, reference counting of a link itself, when
referenced from multiple anonymous inodes, as well as ensures that release
callback will be called from a process context, so that users can safely take
mutex locks and sleep.

Additionally, with a new abstraction it's now possible to generalize pinning
of a link object in BPF FS, allowing to explicitly prevent BPF program
detachment on process exit by pinning it in a BPF FS and letting another
independent process open it to keep working with it.

Convert two existing bpf_link-like objects (raw tracepoint and tracing BPF
program attachments) into utilizing bpf_link framework, making them pinnable
in BPF FS. More FD-based bpf_links will be added in follow up patches.

Signed-off-by: Andrii Nakryiko
Signed-off-by: Alexei Starovoitov
Link: https://lore.kernel.org/bpf/20200303043159.323675-2-andriin@fb.com
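The lifetime rule described in the commit message above (the attachment lives as long as any reference to the link does) can be modelled with a plain reference count. This is a minimal sketch with hypothetical names; a pin in BPF FS is modelled as nothing more than one extra reference, and the release callback is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a link owns the attachment and is reference counted. */
struct link {
	int refcnt;		/* one reference per open file or pin */
	bool attached;
};

void link_init(struct link *l)
{
	l->refcnt = 1;
	l->attached = true;
}

void link_get(struct link *l)
{
	assert(l->refcnt > 0);
	l->refcnt++;
}

/* Drop one reference; detach only when the last one goes away. */
void link_put(struct link *l)
{
	assert(l->refcnt > 0);
	if (--l->refcnt == 0)
		l->attached = false;	/* the release callback would run here */
}
```

Taking a second reference before the creating process drops its own is what keeps the program attached after that process exits.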
02 Mar, 2020
1 commit
-
Pull scheduler fix from Ingo Molnar:
"Fix a scheduler statistics bug"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix statistics for find_idlest_group()