10 Oct, 2018
1 commit
-
commit b799207e1e1816b09e7a5920fbb2d5fcf6edd681 upstream.
When I wrote commit 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification"), I
assumed that, in order to emulate 64-bit arithmetic with 32-bit logic, it
is sufficient to just truncate the output to 32 bits; and so I just moved
the register size coercion that used to be at the start of the function to
the end of the function.That assumption is true for almost every op, but not for 32-bit right
shifts, because those can propagate information towards the least
significant bit. Fix it by always truncating inputs for 32-bit ops to 32
bits.Also get rid of the coerce_reg_to_size() after the ALU op, since that has
no effect.Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification")
Acked-by: Daniel Borkmann
Signed-off-by: Jann Horn
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman
04 Oct, 2018
1 commit
-
[ Upstream commit 9b2e0388bec8ec5427403e23faff3b58dd1c3200 ]
When sockmap code is using the stream parser it also handles the write
space events in order to handle the case where (a) verdict redirects
skb to another socket and (b) the sockmap then sends the skb but due
to memory constraints (or other EAGAIN errors) needs to do a retry.But the initial code missed a third case where the
skb_send_sock_locked() triggers an sk_wait_event(). A typically case
would be when sndbuf size is exceeded. If this happens because we
do not pass the write_space event to the lower layers we never wake
up the event and it will wait for sndtimeo. Which as noted in ktls
fix may be rather large and look like a hang to the user.To reproduce the best test is to reduce the sndbuf size and send
1B data chunks to stress the memory handling. To fix this pass the
event from the upper layer to the lower layer.Signed-off-by: John Fastabend
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
24 Aug, 2018
1 commit
-
[ Upstream commit ed2b82c03dc187018307c7c6bf9299705f3db383 ]
Decrement the number of elements in the map in case the allocation
of a new node fails.Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Signed-off-by: Mauricio Vasquez B
Acked-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
03 Aug, 2018
1 commit
-
[ Upstream commit ab7f5bf0928be2f148d000a6eaa6c0a36e74750e ]
Comments in the verifier refer to free_bpf_prog_info() which
seems to have never existed in tree. Replace it with
free_used_maps().Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
26 Apr, 2018
1 commit
-
[ Upstream commit 3d9e952697de89b53227f06d4241f275eb99cfc4 ]
When a program is attached to a map we increment the program refcnt
to ensure that the program is not removed while it is potentially
being referenced from sockmap side. However, if this same program
also references the map (this is a reasonably common pattern in
my programs) then the verifier will also increment the maps refcnt
from the verifier. This is to ensure the map doesn't get garbage
collected while the program has a reference to it.So we are left in a state where the map holds the refcnt on the
program stopping it from being removed and releasing the map refcnt.
And vice versa the program holds a refcnt on the map stopping it
from releasing the refcnt on the prog.All this is fine as long as users detach the program while the
map fd is still around. But, if the user omits this detach command
we are left with a dangling map we can no longer release.To resolve this when the map fd is released decrement the program
references and remove any reference from the map to the program.
This fixes the issue with possibly dangling map and creates a
user side API constraint. That is, the map fd must be held open
for programs to be attached to a map.Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
29 Mar, 2018
1 commit
-
commit 0fa4fe85f4724fff89b09741c437cbee9cf8b008 upstream.
The current check statement in BPF syscall will do a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path will trigger unnecessary security hooks on capability checking
and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
access. This can be resolved by simply switch the order of the statement
and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
allowed.Signed-off-by: Chenbo Feng
Acked-by: Lorenzo Colitti
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman
11 Mar, 2018
5 commits
-
[ upstream commit ca36960211eb228bcbc7aaebfa0d027368a94c60 ]
The requirements around atomic_add() / atomic64_add() resp. their
JIT implementations differ across architectures. E.g. while x86_64
seems just fine with BPF's xadd on unaligned memory, on arm64 it
triggers via interpreter but also JIT the following crash:[ 830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
[...]
[ 830.916161] Internal error: Oops: 96000021 [#1] SMP
[ 830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
[ 830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
[ 830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 831.003793] pc : __ll_sc_atomic_add+0x4/0x18
[ 831.008055] lr : ___bpf_prog_run+0x1198/0x1588
[ 831.012485] sp : ffff00001ccabc20
[ 831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
[ 831.021087] x27: 0000000000000001 x26: 0000000000000000
[ 831.026387] x25: 000000c168d9db98 x24: 0000000000000000
[ 831.031686] x23: ffff000008203878 x22: ffff000009488000
[ 831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
[ 831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
[ 831.047585] x17: 0000000000000000 x16: 0000000000000000
[ 831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
[ 831.058184] x13: 0000000000000000 x12: 0000000000000000
[ 831.063484] x11: 0000000000000001 x10: 0000000000000000
[ 831.068783] x9 : 0000000000000000 x8 : 0000000000000000
[ 831.074083] x7 : 0000000000000000 x6 : 000580d428000000
[ 831.079383] x5 : 0000000000000018 x4 : 0000000000000000
[ 831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
[ 831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
[ 831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
[ 831.102577] Call trace:
[ 831.105012] __ll_sc_atomic_add+0x4/0x18
[ 831.108923] __bpf_prog_run32+0x4c/0x70
[ 831.112748] bpf_test_run+0x78/0xf8
[ 831.116224] bpf_prog_test_run_xdp+0xb4/0x120
[ 831.120567] SyS_bpf+0x77c/0x1110
[ 831.123873] el0_svc_naked+0x30/0x34
[ 831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)Reason for this is because memory is required to be aligned. In
case of BPF, we always enforce alignment in terms of stack access,
but not when accessing map values or packet data when the underlying
arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.xadd on packet data that is local to us anyway is just wrong, so
forbid this case entirely. The only place where xadd makes sense in
fact are map values; xadd on stack is wrong as well, but it's been
around for much longer. Specifically enforce strict alignment in case
of xadd, so that we handle this case generically and avoid such crashes
in the first place.Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Signed-off-by: Greg Kroah-Hartman -
[ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]
syszbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()It takes time to allocate a huge percpu map, but even more time to free
it.Since we run in process context, use cond_resched() to yield cpu if
needed.Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
[ upstream commit 6c5f61023c5b0edb0c8a64c902fe97c6453b1852 ]
Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function")
fixed a memory leak and removed unnecessary locks in map_free callback function.
Unfortrunately, it introduced a lockdep warning. When lockdep checking is turned on,
running tools/testing/selftests/bpf/test_lpm_map will have:[ 98.294321] =============================
[ 98.294807] WARNING: suspicious RCU usage
[ 98.295359] 4.16.0-rc2+ #193 Not tainted
[ 98.295907] -----------------------------
[ 98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious rcu_dereference_check() usage!
[ 98.297657]
[ 98.297657] other info that might help us debug this:
[ 98.297657]
[ 98.298663]
[ 98.298663] rcu_scheduler_active = 2, debug_locks = 1
[ 98.299536] 2 locks held by kworker/2:1/54:
[ 98.300152] #0: ((wq_completion)"events"){+.+.}, at: [] process_one_work+0x157/0x5c0
[ 98.301381] #1: ((work_completion)(&map->work)){+.+.}, at: [] process_one_work+0x157/0x5c0Since actual trie tree removal happens only after no other
accesses to the tree are possible, replacing
rcu_dereference_protected(*slot, lockdep_is_held(&trie->lock))
with
rcu_dereference_protected(*slot, 1)
fixed the issue.Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function")
Reported-by: Eric Dumazet
Suggested-by: Eric Dumazet
Signed-off-by: Yonghong Song
Reviewed-by: Eric Dumazet
Acked-by: David S. Miller
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
[ upstream commit 9a3efb6b661f71d5675369ace9257833f0e78ef3 ]
There is a memory leak happening in lpm_trie map_free callback
function trie_free. The trie structure itself does not get freed.Also, trie_free function did not do synchronize_rcu before freeing
various data structures. This is incorrect as some rcu_read_lock
region(s) for lookup, update, delete or get_next_key may not complete yet.
The fix is to add synchronize_rcu in the beginning of trie_free.
The useless spin_lock is removed from this function as well.Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Mathieu Malaterre
Reported-by: Alexei Starovoitov
Tested-by: Mathieu Malaterre
Signed-off-by: Yonghong Song
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
[ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]
syzkaller recently triggered OOM during percpu map allocation;
while there is work in progress by Dennis Zhou to add __GFP_NORETRY
semantics for percpu allocator under pressure, there seems also a
missing bpf_map_precharge_memlock() check in array map allocation.Given today the actual bpf_map_charge_memlock() happens after the
find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock()
is there to bail out early before we go and do the map setup work
when we find that we hit the limits anyway. Therefore add this for
array map as well.Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann
Cc: Dennis Zhou
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman
03 Mar, 2018
1 commit
-
[ Upstream commit 5731a879d03bdaa00265f8ebc32dfd0e65d25276 ]
Add psock NULL check to handle a racing sock event that can get the
sk_callback_lock before this case but after xchg happens causing the
refcnt to hit zero and sock user data (psock) to be null and queued
for garbage collection.Also add a comment in the code because this is a bit subtle and
not obvious in my opinion.Signed-off-by: John Fastabend
Signed-off-by: Daniel Borkmann
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
25 Feb, 2018
1 commit
-
commit 6f16101e6a8b4324c36e58a29d9e0dbb287cdedb upstream.
syzkaller generated a BPF proglet and triggered a warning with
the following:0: (b7) r0 = 0
1: (d5) if r0 s max_val scenario it means that reg_set_min_max()
and reg_set_min_max_inv() (which both refine existing bounds)
demonstrated that such branch cannot be taken at runtime.In above scenario for the case where it will be taken, the
existing [0, 0] bounds are kept intact. Meaning, the rejection
is not due to a verifier internal error, and therefore the
WARN() is not necessary either.We could just reject such cases in adjust_{ptr,scalar}_min_max_vals()
when either known scalars have smin_val != smax_val or
umin_val != umax_val or any scalar reg with bounds
smin_val > smax_val or umin_val > umax_val. However, there
may be a small risk of breakage of buggy programs, so handle
this more gracefully and in adjust_{ptr,scalar}_min_max_vals()
just taint the dst reg as unknown scalar when we see ops with
such kind of src reg.Reported-by: syzbot+6d362cadd45dc0a12ba4@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Signed-off-by: Greg Kroah-Hartman
22 Feb, 2018
1 commit
-
commit 4950276672fce5c241857540f8561c440663673d upstream.
Patch series "kmemcheck: kill kmemcheck", v2.
As discussed at LSF/MM, kill kmemcheck.
KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow). KASan is already upstream.We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.This patch (of 4):
Remove kmemcheck annotations, and calls to kmemcheck from the kernel.
[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin
Cc: Alexander Potapenko
Cc: Eric W. Biederman
Cc: Michal Hocko
Cc: Pekka Enberg
Cc: Steven Rostedt
Cc: Tim Hansen
Cc: Vegard Nossum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
31 Jan, 2018
4 commits
-
[ upstream commit f37a8cb84cce18762e8f86a70bd6a49a66ab964c ]
Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Signed-off-by: Greg Kroah-Hartman -
[ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]
due to some JITs doing if (src_reg == 0) check in 64-bit mode
for div/mod operations mask upper 32-bits of src register
before doing the checkFixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
[ upstream commit c366287ebd698ef5e3de300d90cd62ee9ee7373e ]
Divides by zero are not nice, lets avoid them if possible.
Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Signed-off-by: Alexei Starovoitov
Signed-off-by: Greg Kroah-Hartman -
[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_hardenv2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-nextConsidered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman
17 Jan, 2018
3 commits
-
commit 7891a87efc7116590eaba57acc3c422487802c6f upstream.
The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:0: (18) r0 = 0x0
2: (7b) *(u64 *)(r10 -16) = r0
3: (cc) (u32) r0 s>>= (u32) r0
4: (95) exitAlthough a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Signed-off-by: Greg Kroah-Hartman -
commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.
syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc98
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.This fixes all the issues triggered by syzkaller's reproducers.
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Cc: Jiri Slaby
Cc: Eric Dumazet
Signed-off-by: Greg Kroah-Hartman -
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.If prog_array map is created by unpriv user replace
bpf_tail_call(ctx, map, index);
with
if (index >= max_entries) {
index &= map->index_mask;
bpf_tail_call(ctx, map, index);
}
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.Signed-off-by: Alexei Starovoitov
Acked-by: John Fastabend
Signed-off-by: Daniel Borkmann
Cc: Jiri Slaby
Signed-off-by: Greg Kroah-Hartman
25 Dec, 2017
9 commits
-
From: Alexei Starovoitov
[ Upstream commit bb7f0f989ca7de1153bd128a40a71709e339fa03 ]
There were various issues related to the limited size of integers used in
the verifier:
- `off + size` overflow in __check_map_access()
- `off + reg->off` overflow in check_mem_access()
- `off + reg->var_off.value` overflow or 32-bit truncation of
`reg->var_off.value` in check_mem_access()
- 32-bit truncation in check_stack_boundary()Make sure that any integer math cannot overflow by not allowing
pointer math with large values.Also reduce the scope of "scalar op scalar" tracking.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 179d1c5602997fef5a940c6ddcf31212cbfebd14 ]
This could be made safe by passing through a reference to env and checking
for env->allow_ptr_leaks, but it would only work one way and is probably
not worth the hassle - not doing it will not directly lead to program
rejection.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit a5ec6ae161d72f01411169a938fa5f8baea16e8f ]
Force strict alignment checks for stack pointers because the tracking of
stack spills relies on it; unaligned stack accesses can lead to corruption
of spilled registers, which is exploitable.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
Prevent indirect stack accesses at non-constant addresses, which would
permit reading and corrupting spilled pointers.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 468f6eafa6c44cb2c5d8aad35e12f06c240a812a ]
32-bit ALU ops operate on 32-bit values and have 32-bit outputs.
Adjust the verifier accordingly.Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 0c17d1d2c61936401f4702e1846e2c19b200f958 ]
Properly handle register truncation to a smaller size.
The old code first mirrors the clearing of the high 32 bits in the bitwise
tristate representation, which is correct. But then, it computes the new
arithmetic bounds as the intersection between the old arithmetic bounds and
the bounds resulting from the bitwise tristate representation. Therefore,
when coerce_reg_to_32() is called on a number with bounds
[0xffff'fff8, 0x1'0000'0007], the verifier computes
[0xffff'fff8, 0xffff'ffff] as bounds of the truncated number.
This is incorrect: The truncated number could also be in the range [0, 7],
and no meaningful arithmetic bounds can be computed in that case apart from
the obvious [0, 0xffff'ffff].Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.Debian assigned CVE-2017-16996 for this issue.
v2:
- flip the mask during arithmetic bounds calculation (Ben Hutchings)
v3:
- add CVE number (Ben Hutchings)Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values")
Signed-off-by: Jann Horn
Acked-by: Edward Cree
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Jann Horn
[ Upstream commit 95a762e2c8c942780948091f8f2a4f32fce1ac6f ]
Distinguish between
BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
only perform sign extension in the first case.Starting with v4.14, this is exploitable by unprivileged users as long as
the unprivileged_bpf_disabled sysctl isn't set.Debian assigned CVE-2017-16995 for this issue.
v3:
- add CVE number (Ben Hutchings)Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Jann Horn
Acked-by: Edward Cree
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Edward Cree
[ Upstream commit 4374f256ce8182019353c0c639bb8d0695b4c941 ]
Incorrect signed bounds were being computed.
If the old upper signed bound was positive and the old lower signed bound was
negative, this could cause the new upper signed bound to be too low,
leading to security issues.Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values")
Reported-by: Jann Horn
Signed-off-by: Edward Cree
Acked-by: Alexei Starovoitov
[jannh@google.com: changed description to reflect bug impact]
Signed-off-by: Jann Horn
Signed-off-by: Alexei Starovoitov
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman -
From: Alexei Starovoitov
[ Upstream commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 ]
when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Signed-off-by: Daniel Borkmann
Signed-off-by: Greg Kroah-Hartman
14 Dec, 2017
1 commit
-
[ Upstream commit 89ad2fa3f043a1e8daae193bcb5fe34d5f8caf28 ]
pcpu_freelist_pop() needs the same lockdep awareness than
pcpu_freelist_populate() to avoid a false positive.[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
(&htab->buckets[i].lock){......}, at: [] __htab_percpu_map_update_elem+0x1cb/0x300and this task is already holding:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [] __dev_queue_xmit+0
x868/0x1240
which would create a new lock dependency:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}but this new dependency connects a SOFTIRQ-irq-safe lock:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
... which became SOFTIRQ-irq-safe at:
[] __lock_acquire+0x42b/0x1f10
[] lock_acquire+0xbc/0x1b0
[] _raw_spin_lock+0x38/0x50
[] __dev_queue_xmit+0x868/0x1240
[] dev_queue_xmit+0x10/0x20
[] ip_finish_output2+0x439/0x590
[] ip_finish_output+0x150/0x2f0
[] ip_output+0x7d/0x260
[] ip_local_out+0x5e/0xe0
[] ip_queue_xmit+0x205/0x620
[] tcp_transmit_skb+0x5a8/0xcb0
[] tcp_write_xmit+0x242/0x1070
[] __tcp_push_pending_frames+0x3c/0xf0
[] tcp_rcv_established+0x312/0x700
[] tcp_v4_do_rcv+0x11c/0x200
[] tcp_v4_rcv+0xaa2/0xc30
[] ip_local_deliver_finish+0xa7/0x240
[] ip_local_deliver+0x66/0x200
[] ip_rcv_finish+0xdd/0x560
[] ip_rcv+0x295/0x510
[] __netif_receive_skb_core+0x988/0x1020
[] __netif_receive_skb+0x21/0x70
[] process_backlog+0x6f/0x230
[] net_rx_action+0x229/0x420
[] __do_softirq+0xd8/0x43d
[] do_softirq_own_stack+0x1c/0x30
[] do_softirq+0x55/0x60
[] __local_bh_enable_ip+0xa8/0xb0
[] cpu_startup_entry+0x1c7/0x500
[] start_secondary+0x113/0x140to a SOFTIRQ-irq-unsafe lock:
(&head->lock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
... [] __lock_acquire+0x82f/0x1f10
[] lock_acquire+0xbc/0x1b0
[] _raw_spin_lock+0x38/0x50
[] pcpu_freelist_pop+0x7a/0xb0
[] htab_map_alloc+0x50c/0x5f0
[] SyS_bpf+0x265/0x1200
[] entry_SYSCALL_64_fastpath+0x12/0x17other info that might help us debug this:
Chain exists of:
dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lockPossible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&head->lock);
local_irq_disable();
lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
lock(&htab->buckets[i].lock);
lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);*** DEADLOCK ***
Fixes: e19494edab82 ("bpf: introduce percpu_freelist")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
03 Nov, 2017
1 commit
-
…el/git/gregkh/driver-core
Pull initial SPDX identifiers from Greg KH:
"License cleanup: add SPDX license identifiers to some filesMany source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the
'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
binding shorthand, which can be used instead of the full boiler plate
text.This patch is based on work done by Thomas Gleixner and Kate Stewart
and Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset
of the use cases:- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to
license had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied
to a file was done in a spreadsheet of side by side results from of
the output of two independent scanners (ScanCode & Windriver)
producing SPDX tag:value files created by Philippe Ombredanne.
Philippe prepared the base worksheet, and did an initial spot review
of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537
files assessed. Kate Stewart did a file by file comparison of the
scanner results in the spreadsheet to determine which SPDX license
identifier(s) to be applied to the file. She confirmed any
determination that was not immediately clear with lawyers working with
the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained
>5 lines of source- File already had some variant of a license header in it (even if <5
lines).All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
was:SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that
became the concluded license(s).- when there was disagreement between the two scanners (one detected
a license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply
(and which scanner probably needed to revisit its heuristics).- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases,
confirmation by lawyers working with the Linux Foundation.Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.
The Windriver scanner is based on an older version of FOSSology in
part, so they are related.Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot
checks in about 15000 files.In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect
the correct identifier.Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial
patch version early this week with:- a full scancode scan run, collecting the matched texts, detected
license ids and scores- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct- reviewing anything where there was no detection but the patch
license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
applied SPDX license was correctThis produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
License cleanup: add SPDX license identifier to uapi header files with a license
License cleanup: add SPDX license identifier to uapi header files with no license
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
02 Nov, 2017
1 commit
-
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.By default all files without license information are under the default
license of the kernel, which is GPL version 2.Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman
01 Nov, 2017
1 commit
-
Now that SK_REDIRECT is no longer a valid return code. Remove it
from the UAPI completely. Then do a namespace remapping internal
to sockmap so SK_REDIRECT is no longer externally visible.Patchs primary change is to do a namechange from SK_REDIRECT to
__SK_REDIRECTReported-by: Alexei Starovoitov
Signed-off-by: John Fastabend
Signed-off-by: David S. Miller
29 Oct, 2017
2 commits
-
Recent additions to support multiple programs in cgroups impose
a strict requirement, "all yes is yes, any no is no". To enforce
this the infrastructure requires the 'no' return code, SK_DROP in
this case, to be 0.To apply these rules to SK_SKB program types the sk_actions return
codes need to be adjusted.This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
SK_ABORTED to remove any chance that the API may allow aborted
program flows to be passed up the stack. This would be incorrect
behavior and allow programs to break existing policies.Signed-off-by: John Fastabend
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller -
SK_SKB program types use bpf_compute_data to store the end of the
packet data. However, bpf_compute_data assumes the cb is stored in the
qdisc layer format. But, for SK_SKB this is the wrong layer of the
stack for this type.It happens to work (sort of!) because in most cases nothing happens
to be overwritten today. This is very fragile and error prone.
Fortunately, we have another hole in tcp_skb_cb we can use so lets
put the data_end value there.Note, SK_SKB program types do not use data_meta, they are failed by
sk_skb_is_valid_access().Signed-off-by: John Fastabend
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
22 Oct, 2017
3 commits
-
Alexander had a test program with direct packet access, where
the access test was in the form of data + X > data_end. In an
unrelated change to the program LLVM decided to swap the branches
and emitted code for the test in form of data + X
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Acked-by: John Fastabend
Signed-off-by: David S. Miller -
During review I noticed that the current logic for direct packet
access marking in check_cond_jmp_op() has an off by one for the
upper right range border when marking in find_good_pkt_pointers()
with BPF_JLT and BPF_JLE. It's not really harmful given access
up to pkt_end is always safe, but we should nevertheless correct
the range marking before it becomes ABI. If pkt_data' denotes a
pkt_data derived pointer (pkt_data + X), then for pkt_data' < pkt_end
in the true branch as well as for pkt_end < pkt_end the verifier simulation cannot
deduce that a byte load of pkt_data' - 1 would succeed in this
branch.Fixes: b4e432f1000a ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Acked-by: John Fastabend
Signed-off-by: David S. Miller -
An integer overflow is possible in dev_map_bitmap_size() when
calculating the BITS_TO_LONG logic which becomes, after macro
replacement,(((n) + (d) - 1)/ (d))
where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid
overflow cast to u64 before arithmetic.Reported-by: Richard Weinberger
Acked-by: Daniel Borkmann
Signed-off-by: John Fastabend
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
20 Oct, 2017
1 commit
-
Devmap is used with XDP which requires CAP_NET_ADMIN so lets also
make CAP_NET_ADMIN required to use the map.Signed-off-by: John Fastabend
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller