30 Jul, 2016
1 commit
-
Pull security subsystem updates from James Morris:
"Highlights:- TPM core and driver updates/fixes
- IPv6 security labeling (CALIPSO)
- Lots of Apparmor fixes
- Seccomp: remove 2-phase API, close hole where ptrace can change
syscall #"* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
tpm: Factor out common startup code
tpm: use devm_add_action_or_reset
tpm2_i2c_nuvoton: add irq validity check
tpm: read burstcount from TPM_STS in one 32-bit transaction
tpm: fix byte-order for the value read by tpm2_get_tpm_pt
tpm_tis_core: convert max timeouts from msec to jiffies
apparmor: fix arg_size computation for when setprocattr is null terminated
apparmor: fix oops, validate buffer size in apparmor_setprocattr()
apparmor: do not expose kernel stack
apparmor: fix module parameters can be changed after policy is locked
apparmor: fix oops in profile_unpack() when policy_db is not present
apparmor: don't check for vmalloc_addr if kvzalloc() failed
apparmor: add missing id bounds check on dfa verification
apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
apparmor: use list_next_entry instead of list_entry_next
apparmor: fix refcount race when finding a child profile
apparmor: fix ref count leak when profile sha1 hash is read
apparmor: check that xindex is in trans_table bounds
...
14 Jul, 2016
1 commit
-
Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller
10 Jul, 2016
1 commit
-
In the prep work I did before enabling BH while handling socket backlog,
I missed two points in DCCP :1) dccp_v4_ctl_send_reset() uses bh_lock_sock(), assuming BH were
blocked. It is not anymore always true.2) dccp_v4_route_skb() was using __IP_INC_STATS() instead of
IP_INC_STATS()A similar fix was done for TCP, in commit 47dcc20a39d0
("ipv4: tcp: ip_send_unicast_reply() is not BH safe")Fixes: 7309f8821fd6 ("dccp: do not assume DCCP code is non preemptible")
Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog")
Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Signed-off-by: David S. Miller
07 Jul, 2016
1 commit
28 Jun, 2016
1 commit
-
If set, these will take precedence over the parent's options during
both sending and child creation. If they're not set, the parent's
options (if any) will be used.This is to allow the security_inet_conn_request() hook to modify the
IPv6 options in just the same way that it already may do for IPv4.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore
03 May, 2016
1 commit
-
DCCP uses the generic backlog code, and this will soon
be changed to not disable BH when protocol is called back.Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Signed-off-by: David S. Miller
28 Apr, 2016
6 commits
-
There is nothing related to BH in SNMP counters anymore,
since linux-3.0.Rename helpers to use __ prefix instead of _BH prefix,
for contexts where preemption is disabled.This more closely matches convention used to update
percpu variables.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename NET_INC_STATS_BH() to __NET_INC_STATS()
and NET_ADD_STATS_BH() to __NET_ADD_STATS()Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename IP_INC_STATS_BH() to __IP_INC_STATS(), to
better express this is used in non preemptible context.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Rename DCCP_INC_STATS_BH() to __DCCP_INC_STATS()
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
08 Apr, 2016
1 commit
-
The socket is either locked if we hold the slock spin_lock for
lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned
!= 0). Check for this and at the same time improve that the current
thread/cpu is really holding the lock.Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
05 Apr, 2016
1 commit
-
When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.By letting listeners use SOCK_RCU_FREE infrastructure,
we can relax TCP_LISTEN lookup rules and avoid touching sk_refcntNote that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
only listeners are impacted by this change.Peak performance under SYNFLOOD is increased by ~33% :
On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps
Most consuming functions are now skb_set_owner_w() and sock_wfree()
contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Mar, 2016
1 commit
-
Now SYN_RECV request sockets are installed in ehash table, an ICMP
handler can find a request socket while another cpu handles an incoming
packet transforming this SYN_RECV request socket into an ESTABLISHED
socket.We need to remove the now obsolete WARN_ON(req->sk), since req->sk
is set when a new child is created and added into listener accept queue.If this race happens, the ICMP will do nothing special.
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet
Reported-by: Ben Lazarus
Reported-by: Neal Cardwell
Signed-off-by: David S. Miller
23 Feb, 2016
1 commit
-
Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.cAll three conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller
19 Feb, 2016
1 commit
-
Ilya reported following lockdep splat:
kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0: (rcu_read_lock){......}, at: []
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1: (rcu_read_lock){......}, at: []
ip_local_deliver_finish+0x3f/0x380
kernel: #2: (slock-AF_INET){+.-...}, at: []
sk_clone_lock+0x19b/0x440
kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[] inet_csk_reqsk_queue_add+0x28/0xa0To properly fix this issue, inet_csk_reqsk_queue_add() needs
to return to its callers if the child as been queued
into accept queue.We also need to make sure listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on accept queue.Reported-by: Ilya Dryomov
Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Feb, 2016
2 commits
-
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups. Doing so with a BPF filter will require access to the
skb in question. This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations where it will be used in a following patch.Signed-off-by: Craig Gallek
Signed-off-by: David S. Miller -
In order to support fast lookups for TCP sockets with SO_REUSEPORT,
the function that adds sockets to the listening hash set needs
to be able to check receive address equality. Since this equality
check is different for IPv4 and IPv6, we will need two different
socket hashing functions.This patch adds inet6_hash identical to the existing inet_hash function
and updates the appropriate references. A following patch will
differentiate the two by passing different comparison functions to
__inet_hash.Additionally, in order to use the IPv6 address equality function from
inet6_hashtables (which is compiled as a built-in object when IPv6 is
enabled) it also needs to be in a built-in object file as well. This
moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.Signed-off-by: Craig Gallek
Signed-off-by: David S. Miller
04 Dec, 2015
2 commits
-
Conflicts:
drivers/net/ethernet/renesas/ravb_main.c
kernel/bpf/syscall.c
net/ipv4/ipmr.cAll three conflicts were cases of overlapping changes.
Signed-off-by: David S. Miller
-
While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.__ip6_dst_store() and ip6_dst_store() share same implementation.
sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.This is because ip6_dst_store() can be called from process context,
without any lock held.As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.Signed-off-by: Eric Dumazet
Reported-by: Dmitry Vyukov
Signed-off-by: David S. Miller
03 Dec, 2015
1 commit
-
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov
Signed-off-by: Eric Dumazet
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
02 Dec, 2015
1 commit
-
This patch is a cleanup to make following patch easier to
review.Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()To ease backports, we rename both constants.
Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Dec, 2015
1 commit
-
The memory barrier in the helper wq_has_sleeper is needed by just
about every user of waitqueue_active. This patch generalises it
by making it take a wait_queue_head_t directly. The existing
helper is renamed to skwq_has_sleeper.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
03 Nov, 2015
1 commit
-
IPv6 request sockets store a pointer to skb containing the SYN packet
to be able to transfer it to full blown socket when 3WHS is done
(ireq->pktopts -> np->pktoptions)As explained in commit 5e0724d027f0 ("tcp/dccp: fix hashdance race for
passive sessions"), we must transfer the skb only if we won the
hashdance race, if multiple cpus receive the 'ack' packet completing
3WHS at the same time.Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
02 Nov, 2015
1 commit
-
This patch changes the use of struct timespec in
dccp_probe to use struct timespec64 instead. timespec uses a 32-bit
seconds field which will overflow in the year 2038 and beyond. timespec64
uses a 64-bit seconds field. Note that the correctness of the code isn't
changed, since the original code only uses the timestamps to compute a
small elapsed interval. This patch is part of a larger attempt to remove
instances of 32-bit timekeeping structures (timespec, timeval, time_t)
from the kernel so it is easier to identify where the real 2038 issues
are.Signed-off-by: Tina Ruchandani
Signed-off-by: David S. Miller
23 Oct, 2015
1 commit
-
Multiple cpus can process duplicates of incoming ACK messages
matching a SYN_RECV request socket. This is a rare event under
normal operations, but definitely can happen.Only one must win the race, otherwise corruption would occur.
To fix this without adding new atomic ops, we use logic in
inet_ehash_nolisten() to detect the request was present in the same
ehash bucket where we try to insert the new child.If request socket was not found, we have to undo the child creation.
This actually removes a spin_lock()/spin_unlock() pair in
reqsk_queue_unlink() for the fast path.Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
16 Oct, 2015
2 commits
-
Let's reduce the confusion about inet_csk_reqsk_queue_drop() :
In many cases we also need to release reference on request socket,
so add a helper to do this, reducing code size and complexity.Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
This reverts commit c69736696cf3742b37d850289dc0d7ead177bb14.
At the time of above commit, tcp_req_err() and dccp_req_err()
were dead code, as SYN_RECV request sockets were not yet in ehash table.Real bug was fixed later in a different commit.
We need to revert to not leak a refcount on request socket.
inet_csk_reqsk_queue_drop_and_put() will be added
in following commit to make clean inet_csk_reqsk_queue_drop()
does not release the reference owned by caller.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
14 Oct, 2015
1 commit
-
When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets
become stale, meaning 3WHS can not complete.But current behavior is wrong :
incoming packets finding such stale sockets are dropped.We need instead to cleanup the request socket and perform another
lookup :
- Incoming ACK will give a RST answer,
- SYN rtx might find another listener if available.
- We expedite cleanup of request sockets and old listener socket.Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Oct, 2015
1 commit
-
This patch makes dccp_bad_service_code return bool due to these
particular functions only using either one or zero as their return
value.dccp_list_has_service is also been made return bool in this patchset.
No functional change.
Signed-off-by: Yaowei Bai
Signed-off-by: David S. Miller
05 Oct, 2015
1 commit
-
inet_reqsk_alloc() is used to allocate a temporary request
in order to generate a SYNACK with a cookie. Then later,
syncookie validation also uses a temporary request.These paths already took a reference on listener refcount,
we can avoid a couple of atomic operations.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
03 Oct, 2015
1 commit
-
In this patch, we insert request sockets into TCP/DCCP
regular ehash table (where ESTABLISHED and TIMEWAIT sockets
are) instead of using the per listener hash table.ACK packets find SYN_RECV pseudo sockets without having
to find and lock the listener.In nominal conditions, this halves pressure on listener lock.
Note that this will allow for SO_REUSEPORT refinements,
so that we can select a listener using cpu/numa affinities instead
of the prior 'consistent hash', since only SYN packets will
apply this selection logic.We will shrink listen_sock in the following patch to ease
code review.Signed-off-by: Eric Dumazet
Cc: Ying Cai
Cc: Willem de Bruijn
Signed-off-by: David S. Miller
30 Sep, 2015
4 commits
-
We'll soon no longer hold listener socket lock, these
functions do not modify the socket in any way.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
socket no longer needs to be read/write
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Before changing dccp_v6_request_recv_sock() sock argument
to const, we need to get rid of security_sk_classify_flow(),
and it seems doable by reusing inet6_csk_route_req() helper.We need to add a proto parameter to inet6_csk_route_req(),
not assume it is TCP.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
None of these functions need to change the socket, make it
const.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
27 Sep, 2015
1 commit
-
Conflicts:
net/ipv4/arp.cThe net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.Signed-off-by: David S. Miller
26 Sep, 2015
2 commits
-
This is done to make sure we do not change listener socket
while sending SYNACK packets while socket lock is not held.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Like tcp_make_synack() the only time we might change the socket is
when calling sock_wmalloc(), which is using atomic operation to
update sk->sk_wmem_allocAlso use MAX_DCCP_HEADER as both IPv4/IPv6 use this value for max_header.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller