31 Jul, 2016
1 commit
-
Using list_move() instead of list_del() + list_add().
Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller
30 Jul, 2016
1 commit
-
Pull security subsystem updates from James Morris:
"Highlights:- TPM core and driver updates/fixes
- IPv6 security labeling (CALIPSO)
- Lots of Apparmor fixes
- Seccomp: remove 2-phase API, close hole where ptrace can change
syscall #"* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
tpm: Factor out common startup code
tpm: use devm_add_action_or_reset
tpm2_i2c_nuvoton: add irq validity check
tpm: read burstcount from TPM_STS in one 32-bit transaction
tpm: fix byte-order for the value read by tpm2_get_tpm_pt
tpm_tis_core: convert max timeouts from msec to jiffies
apparmor: fix arg_size computation for when setprocattr is null terminated
apparmor: fix oops, validate buffer size in apparmor_setprocattr()
apparmor: do not expose kernel stack
apparmor: fix module parameters can be changed after policy is locked
apparmor: fix oops in profile_unpack() when policy_db is not present
apparmor: don't check for vmalloc_addr if kvzalloc() failed
apparmor: add missing id bounds check on dfa verification
apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
apparmor: use list_next_entry instead of list_entry_next
apparmor: fix refcount race when finding a child profile
apparmor: fix ref count leak when profile sha1 hash is read
apparmor: check that xindex is in trans_table bounds
...
27 Jul, 2016
1 commit
-
Currently lastuse is updated on entry creation and cache hit, but it should
also be updated on entry change. Since both on add and update the ttl array
is updated we can simply update the lastuse in ipmr_update_thresholds.Signed-off-by: Nikolay Aleksandrov
CC: Roopa Prabhu
CC: Donald Sharp
CC: David S. Miller
Signed-off-by: David S. Miller
26 Jul, 2016
2 commits
-
After a612769774a3 ("udp: prevent bugcheck if filter truncates packet
too much"), there followed various other fixes for similar cases such
as f4979fcea7fd ("rose: limit sk_filter trim to payload").Latter introduced a new helper sk_filter_trim_cap(), where we can pass
the trim limit directly to the socket filter handling. Make use of it
here as well with sizeof(struct udphdr) as lower cap limit and drop the
extra skb->len test in UDP's input path.Signed-off-by: Daniel Borkmann
Cc: Willem de Bruijn
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller -
Default kernel behavior is to delete IPv6 addresses on link
down, which entails deletion of the multicast and the
subnet-router anycast addresses. These deletions do not
happen with sysctl setting to keep global IPv6 addresses on
link down, so every link down/up causes an increment of the
anycast and multicast refcounts. These bogus refcounts may
stop these addrs from being removed on subsequent calls to
delete them. The solution is to leave the groups for the
multicast and subnet anycast on link down for the callflow
when global IPv6 addresses are kept.Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Mike Manning
Acked-by: David Ahern
Signed-off-by: David S. Miller
25 Jul, 2016
1 commit
-
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-nextThe following patchset contains Netfilter/IPVS updates for net-next,
they are:1) Count pre-established connections as active in "least connection"
schedulers such that pre-established connections to avoid overloading
backend servers on peak demands, from Michal Kubecek via Simon Horman.2) Address a race condition when resizing the conntrack table by caching
the bucket size when fulling iterating over the hashtable in these
three possible scenarios: 1) dump via /proc/net/nf_conntrack,
2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
From Liping Zhang.3) Revisit early_drop() path to perform lockless traversal on conntrack
eviction under stress, use del_timer() as synchronization point to
avoid two CPUs evicting the same entry, from Florian Westphal.4) Move NAT hlist_head to nf_conn object, this simplifies the existing
NAT extension and it doesn't increase size since recent patches to
align nf_conn, from Florian.5) Use rhashtable for the by-source NAT hashtable, also from Florian.
6) Don't allow --physdev-is-out from OUTPUT chain, just like
--physdev-out is not either, from Hangbin Liu.7) Automagically set on nf_conntrack counters if the user tries to
match ct bytes/packets from nftables, from Liping Zhang.8) Remove possible_net_t fields in nf_tables set objects since we just
simply pass the net pointer to the backend set type implementations.9) Fix possible off-by-one in h323, from Toby DiPasquale.
10) early_drop() may be called from ctnetlink patch, so we must hold
rcu read size lock from them too, this amends Florian's patch #3
coming in this batch, from Liping Zhang.11) Use binary search to validate jump offset in x_tables, this
addresses the O(n!) validation that was introduced recently
resolve security issues with unpriviledge namespaces, from Florian.12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.
13) Three updates for nft_log: Fix log prefix leak in error path. Bail
out on loglevel larger than debug in nft_log and set on the new
NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.14) Allow to filter rule dumps in nf_tables based on table and chain
names.15) Simplify connlabel to always use 128 bits to store labels and
get rid of unused function in xt_connlabel, from Florian.16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
helper, by Gao Feng.17) Put back x_tables module reference in nft_compat on error, from
Liping Zhang.18) Add a reference count to the x_tables extensions cache in
nft_compat, so we can remove them when unused and avoid a crash
if the extensions are rmmod, again from Zhang.
====================Signed-off-by: David S. Miller
24 Jul, 2016
1 commit
-
Just several instances of overlapping changes.
Signed-off-by: David S. Miller
19 Jul, 2016
1 commit
-
The dummy ruleset I used to test the original validation change was broken,
most rules were unreachable and were not tested by mark_source_chains().In some cases rulesets that used to load in a few seconds now require
several minutes.sample ruleset that shows the behaviour:
echo "*filter"
for i in $(seq 0 100000);do
printf ":chain_%06x - [0:0]\n" $i
done
for i in $(seq 0 100000);do
printf -- "-A INPUT -j chain_%06x\n" $i
printf -- "-A INPUT -j chain_%06x\n" $i
printf -- "-A INPUT -j chain_%06x\n" $i
done
echo COMMIT[ pipe result into iptables-restore ]
This ruleset will be about 74mbyte in size, with ~500k searches
though all 500k[1] rule entries. iptables-restore will take forever
(gave up after 10 minutes)Instead of always searching the entire blob for a match, fill an
array with the start offsets of every single ipt_entry struct,
then do a binary search to check if the jump target is present or not.After this change ruleset restore times get again close to what one
gets when reverting 36472341017529e (~3 seconds on my workstation).[1] every user-defined rule gets an implicit RETURN, so we get
300k jumps + 100k userchains + 100k returns -> 500k rule entriesFixes: 36472341017529e ("netfilter: x_tables: validate targets of jumps")
Reported-by: Jeff Wu
Tested-by: Jeff Wu
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
17 Jul, 2016
1 commit
-
In preparation for hardware offloading of ipmr/ip6mr we need an
interface that allows to check (and later update) the age of entries.
Relying on stats alone can show activity but not actual age of the entry,
furthermore when there're tens of thousands of entries a lot of the
hardware implementations only support "hit" bits which are cleared on
read to denote that the entry was active and shouldn't be aged out,
these can then be naturally translated into age timestamp and will be
compatible with the software forwarding age. Using a lastuse entry doesn't
affect performance because the members in that cache line are written to
along with the age.
Since all new users are encouraged to use ipmr via netlink, this is
exported via the RTA_EXPIRES attribute.
Also do a minor local variable declaration style adjustment - arrange them
longest to shortest.Signed-off-by: Nikolay Aleksandrov
CC: Roopa Prabhu
CC: Shrijeet Mukherjee
CC: Satish Ashok
CC: Donald Sharp
CC: David S. Miller
CC: Alexey Kuznetsov
CC: James Morris
CC: Hideaki YOSHIFUJI
CC: Patrick McHardy
Signed-off-by: David S. Miller
12 Jul, 2016
1 commit
-
If socket filter truncates an udp packet below the length of UDP header
in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
kernel is configured that way) can be easily enforced by an unprivileged
user which was reported as CVE-2016-6162. For a reproducer, see
http://seclists.org/oss-sec/2016/q3/8Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Marco Grassi
Signed-off-by: Michal Kubecek
Acked-by: Eric Dumazet
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller
10 Jul, 2016
2 commits
-
All inet6_netconf_notify_devconf() callers are in process context,
so we can use GFP_KERNEL allocations if we take care of not holding
a rwlock while not needed in ip6mr (we hold RTNL there)Fixes: d67b8c616b48 ("netconf: advertise mc_forwarding status")
Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status")
Signed-off-by: Eric Dumazet
Cc: Nicolas Dichtel
Acked-by: Nicolas Dichtel
Signed-off-by: David S. Miller -
Extend the SIT driver to support MPLS over IPv4. This implementation
extends existing support for IPv6 over IPv4 and IPv4 over IPv4.Signed-off-by: Simon Horman
Reviewed-by: Dinan Gunawardena
Signed-off-by: David S. Miller
07 Jul, 2016
3 commits
-
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/usb/r8152.cAll three conflicts were overlapping changes.
Signed-off-by: David S. Miller
-
Pablo Neira Ayuso says:
====================
Netfilter updates for net-nextThe following patchset contains Netfilter updates for net-next,
they are:1) Don't use userspace datatypes in bridge netfilter code, from
Tobin Harding.2) Iterate only once over the expectation table when removing the
helper module, instead of once per-netns, from Florian Westphal.3) Extra sanitization in xt_hook_ops_alloc() to return error in case
we ever pass zero hooks, xt_hook_ops_alloc():4) Handle NFPROTO_INET from the logging core infrastructure, from
Liping Zhang.5) Autoload loggers when TRACE target is used from rules, this doesn't
change the behaviour in case the user already selected nfnetlink_log
as preferred way to print tracing logs, also from Liping Zhang.6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
by cache lines, increases the size of entries in 11% per entry.
From Florian Westphal.7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.
8) Remove useless defensive check in nf_logger_find_get() from Shivani
Bhardwaj.9) Remove zone extension as place it in the conntrack object, this is
always include in the hashing and we expect more intensive use of
zones since containers are in place. Also from Florian Westphal.10) Owner match now works from any namespace, from Eric Bierdeman.
11) Make sure we only reply with TCP reset to TCP traffic from
nf_reject_ipv4, patch from Liping Zhang.12) Introduce --nflog-size to indicate amount of network packet bytes
that are copied to userspace via log message, from Vishwanath Pai.
This obsoletes --nflog-range that has never worked, it was designed
to achieve this but it has never worked.13) Introduce generic macros for nf_tables object generation masks.
14) Use generation mask in table, chain and set objects in nf_tables.
This allows fixes interferences with ongoing preparation phase of
the commit protocol and object listings going on at the same time.
This update is introduced in three patches, one per object.15) Check if the object is active in the next generation for element
deactivation in the rbtree implementation, given that deactivation
happens from the commit phase path we have to observe the future
status of the object.16) Support for deletion of just added elements in the hash set type.
17) Allow to resize hashtable from /proc entry, not only from the
obscure /sys entry that maps to the module parameter, from Florian
Westphal.18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
anymore since we tear down the ruleset whenever the netdevice
goes away.19) Support for matching inverted set lookups, from Arturo Borrero.
20) Simplify the iptables_mangle_hook() by removing a superfluous
extra branch.21) Introduce ether_addr_equal_masked() and use it from the netfilter
codebase, from Joe Perches.22) Remove references to "Use netfilter MARK value as routing key"
from the Netfilter Kconfig description given that this toggle
doesn't exists already for 10 years, from Moritz Sichert.23) Introduce generic NF_INVF() and use it from the xtables codebase,
from Joe Perches.24) Setting logger to NONE via /proc was not working unless explicit
nul-termination was included in the string. This fixes seems to
leave the former behaviour there, so we don't break backward.
====================Signed-off-by: David S. Miller
06 Jul, 2016
1 commit
-
It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().
However, after fixing a deadlock bug in
commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.It is worth to note that rt6i_pcpu is protected by table->tb6_lock.
kmemleak somehow did not report it. We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).Signed-off-by: Martin KaFai Lau
Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
Reported-by: Petr Novopashenniy
Tested-by: Petr Novopashenniy
Acked-by: Hannes Frederic Sowa
Cc: Hannes Frederic Sowa
Cc: Petr Novopashenniy
Signed-off-by: David S. Miller
03 Jul, 2016
1 commit
-
netfilter uses multiple FWINV #defines with identical form that hide a
specific structure variable and dereference it with a invflags member.$ git grep "#define FWINV"
include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg))
net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg))
net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg)))
net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg)))
net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg)))
net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg)))Consolidate these macros into a single NF_INVF macro.
Miscellanea:
o Neaten the alignment around these uses
o A few lines are > 80 columns for intelligibilitySigned-off-by: Joe Perches
Signed-off-by: Pablo Neira Ayuso
01 Jul, 2016
2 commits
-
No need for a special case to handle NF_INET_POST_ROUTING, this is
basically the same handling as for prerouting, input, forward.Signed-off-by: Pablo Neira Ayuso
-
Some arches have virtually mapped kernel stacks, or will soon have.
tcp_md5_hash_header() uses an automatic variable to copy tcp header
before mangling th->check and calling crypto function, which might
be problematic on such arches.David says that using percpu storage is also problematic on non SMP
builds.Just use kmalloc() to allocate scratch areas.
Signed-off-by: Eric Dumazet
Reported-by: Andy Lutomirski
Signed-off-by: David S. Miller
30 Jun, 2016
1 commit
-
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().Signed-off-by: David S. Miller
28 Jun, 2016
13 commits
-
This fixes wrong-interface signaling on 32-bit platforms for entries
created when jiffies > 2^31 + MFC_ASSERT_THRESH.Signed-off-by: Tom Goff
Signed-off-by: David S. Miller -
This works in exactly the same way as the CIPSO label cache.
The idea is to allow the lsm to cache the result of a secattr
lookup so that it doesn't need to perform the lookup for
every skbuff.It introduces two sysctl controls:
calipso_cache_enable - enables/disables the cache.
calipso_cache_bucket_size - sets the size of a cache bucket.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
Lengths, checksum and the DOI are checked. Checking of the
level and categories are left for the socket layer.CRC validation is performed in the calipso module to avoid
unconditionally linking crc_ccitt() into ipv6.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
In some cases, the lsm needs to add the label to the skbuff directly.
A NF_INET_LOCAL_OUT IPv6 hook is added to selinux to match the IPv4
behaviour. This allows selinux to label the skbuffs that it requires.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
Request sockets need to have a label that takes into account the
incoming connection as well as their parent's label. This is used
for the outgoing SYN-ACK and for their child full-socket.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
If set, these will take precedence over the parent's options during
both sending and child creation. If they're not set, the parent's
options (if any) will be used.This is to allow the security_inet_conn_request() hook to modify the
IPv6 options in just the same way that it already may do for IPv4.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
CALIPSO is a hop-by-hop IPv6 option. A lot of this patch is based on
the equivalent CISPO code. The main difference is due to manipulating
the options in the hop-by-hop header.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
The functionality is equivalent to ipv6_renew_options() except
that the newopt pointer is in kernel, not user, memoryThe kernel memory implementation will be used by the CALIPSO network
labelling engine, which needs to be able to set IPv6 hop-by-hop
options.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
Remove a specified DOI through the NLBL_CALIPSO_C_REMOVE command.
It requires the attribute:
NLBL_CALIPSO_A_DOI.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
Enumerate the DOI list through the NLBL_CALIPSO_C_LISTALL command.
It takes no attributes.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore -
Query a specified DOI through the NLBL_CALIPSO_C_LIST command.
It requires the attribute:
NLBL_CALIPSO_A_DOI.The reply will contain:
NLBL_CALIPSO_A_MTYPESigned-off-by: Huw Davies
Signed-off-by: Paul Moore -
CALIPSO is a packet labelling protocol for IPv6 which is very similar
to CIPSO. It is specified in RFC 5570. Much of the code is based on
the current CIPSO code.This adds support for adding passthrough-type CALIPSO DOIs through the
NLBL_CALIPSO_C_ADD command. It requires attributes:NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS.
NLBL_CALIPSO_A_DOI.In passthrough mode the CALIPSO engine will map MLS secattr levels
and categories directly to the packet label.At this stage, the major difference between this and the CIPSO
code is that IPv6 may be compiled as a module. To allow for
this the CALIPSO functions are registered at module init time.Signed-off-by: Huw Davies
Signed-off-by: Paul Moore
27 Jun, 2016
1 commit
-
with the commit 8c14586fc320 ("net: ipv6: Use passed in table for
nexthop lookups"), net hop lookup is first performed on route creation
in the passed-in table.
However device match is not enforced in table lookup, so the found
route can be later discarded due to egress device mismatch and no
global lookup will be performed.
This cause the following to fail:ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link set dummy1 up
ip link set dummy2 up
ip route add 2001:db8:8086::/48 dev dummy1 metric 20
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20
ip route add 2001:db8:8086::/48 dev dummy2 metric 21
ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21
RTNETLINK answers: No route to hostThis change fixes the issue enforcing device lookup in
ip6_nh_lookup_table()v1->v2: updated commit message title
Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups")
Reported-and-tested-by: Beniamino Galvani
Signed-off-by: Paolo Abeni
Acked-by: David Ahern
Signed-off-by: David S. Miller
19 Jun, 2016
4 commits
-
When receiving an ICMPv4 message containing extensions as
defined in RFC 4884, and translating it to ICMPv6 at SIT
or GRE tunnel, we need some extra manipulation in order
to properly forward the extensions.This patch only takes care of Time Exceeded messages as they
are the ones that typically carry information from various
routers in a fabric during a traceroute session.It also avoids complex skb logic if the data_len is not
a multiple of 8.RFC states :
The "original datagram" field MUST contain at least 128 octets.
If the original datagram did not contain 128 octets, the
"original datagram" field MUST be zero padded to 128 octets.In practice routers use 128 bytes of original datagram, not more.
Initial translation was added in commit ca15a078bd90
("sit: generate icmpv6 error when receiving icmpv4 error")Signed-off-by: Eric Dumazet
Cc: Oussama Ghorbel
Signed-off-by: David S. Miller -
For better traceroute/mtr support for SIT and GRE tunnels,
we translate IPV4 ICMP ICMP_TIME_EXCEEDED to ICMPV6_TIME_EXCEEDWe also have to translate the IPv4 source IP address of ICMP
message to IPv6 v4mapped.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
We want to use this helper from GRE as well, so this is
the time to move it in net/ipv6/icmp.cAlso add a @nhs parameter, since SIT and GRE have different
values for the header(s) to skip.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
SIT or GRE tunnels might want to translate an IPV4 address
into a v4mapped one when translating ICMP to ICMPv6.This patch adds the parameter to icmp6_send() but
does not change icmpv6_send() signature.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Jun, 2016
2 commits
-
IPv6 version of 3f2fb9a834cb ("net: l3mdev: address selection should only
consider devices in L3 domain") and the follow up commit, a17b693cdd876
("net: l3mdev: prefer VRF master for source address selection").That is, if outbound device is given then the address preference order
is an address from that device, an address from the master device if it
is enslaved, and then an address from a device in the same L3 domain.Signed-off-by: David Ahern
Signed-off-by: David S. Miller -
IPv6 source address selection needs to consider the real egress route.
Similar to IPv4 implement a get_saddr6 method which is called if
source address has not been set. The get_saddr6 method does a full
lookup which means pulling a route from the VRF FIB table and properly
considering linklocal/multicast destination addresses. Lookup failures
(eg., unreachable) then cause the source address selection to fail
which gets propagated back to the caller.Signed-off-by: David Ahern
Signed-off-by: David S. Miller