05 Oct, 2013
1 commit
-
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter updates for your net-next tree,
mostly ipset improvements and enhancements, they are:

* Don't call ip_nest_end needlessly in the error path, suggested
  by Pablo Neira Ayuso, from Jozsef Kadlecsik.

* Fixed sparse warnings about shadowed variable and missing rcu annotation
  and fix of "may be used uninitialized" warnings, also from Jozsef.

* Renamed simple macro names to avoid namespace issues, reported by David
  Laight, again from Jozsef.

* Use fixed-size type for timeout in the extension part, and cosmetic
  ordering of matches and targets separately in xt_set.c, from Jozsef.

* Support packet fragments for IPv4 protos without ports from Anders K.
  Pedersen. For example this allows a hash:ip,port ipset containing the
  entry 192.168.0.1,gre:0 to match all packet fragments for PPTP VPN
  tunnels to/from the host. Without this patch only the first packet
  fragment (with fragment offset 0) was matched.

* Introduced a new operation to get both setname and family, from Jozsef.
  ip[6]tables set match and SET target need to know the family of the set
  in order to reject adding rules which refer to a set with a non-matching
  family. Currently such rules are silently accepted and then ignored
  instead of generating an error message to the user.

* Reworked extensions support in ipset types from Jozsef. The approach of
  defining structures with all variations is not manageable as the
  number of extensions grows. Therefore a blob for the extensions is
  introduced, somewhat similar to conntrack. The support of extensions
  which need a per data destroy function is added as well.

* When an element timed out in a list:set type of set, the garbage
  collector skipped the checking of the next element. So the purging
  was delayed to the next run of the gc, fixed by Jozsef.

* A small Kconfig fix: NETFILTER_NETLINK cannot be selected and
  ipset requires it.

* hash:net,net type from Oliver Smith. The type provides the ability to
  store pairs of subnets in a set.

* Comment for ipset entries from Oliver Smith. This makes it possible to
  annotate entries in a set with comments, for example:

    ipset n foo hash:net,net comment
    ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B"

* Fix of hash types resizing with comment extension from Jozsef.

* Fix of new extensions for list:set type when an element is added
  into a slot from where another element was pushed away, from Jozsef.

* Introduction of a common function for the listing of the element
  extensions from Jozsef.

* Net namespace support for ipset from Vitaly Lavrov.

* hash:net,port,net type from Oliver Smith, which makes it possible
  to store the triples of two subnets and a protocol, port pair in
  a set.

* Get xt_TCPMSS working with net namespace, by Gao feng.

* Use the proper net namespace to allocate skbs, also by Gao feng.

* A couple of cleanups for the conntrack SIP helper, by Holger
  Eitzenberger.

* Extend cttimeout to allow setting default conntrack timeouts via
  nfnetlink, so we can get rid of all our sysctl/proc interfaces in
  the future for timeout tuning, from me.
====================

Signed-off-by: David S. Miller
02 Oct, 2013
1 commit
-
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for your net
tree, they are:

* Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
  Patrick McHardy.

* Fix possible weight overflow in lblc and lblcr schedulers due to
  32-bit arithmetic, from Simon Kirby.

* Fix possible memory access race in the lblc and lblcr schedulers,
  introduced when they were converted to use RCU, two patches from
  Julian Anastasov.

* Fix hard dependency on CPU 0 when reading per-cpu stats in the
  rate estimator, from Julian Anastasov.

* Fix race that may lead to object use after release, when invoking
  ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
  Anastasov.
====================

Signed-off-by: David S. Miller
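
To illustrate the weight overflow item in the list above: a scheduler that
compares two servers by cross-multiplying load and weight can get the wrong
answer when the product is computed in 32 bits. The following is a minimal
sketch of that general problem, not the IPVS code. The function names and the
exact form of the comparison are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: server A is "less loaded" than server B when
 * load_a / weight_a < load_b / weight_b. Cross-multiplying avoids the
 * division: load_a * weight_b < load_b * weight_a.
 */

/* 32-bit products wrap around for large loads or weights. */
static int less_loaded_32(uint32_t load_a, uint32_t weight_a,
                          uint32_t load_b, uint32_t weight_b)
{
    return load_a * weight_b < load_b * weight_a;
}

/* Widening one operand before multiplying keeps the full product. */
static int less_loaded_64(uint32_t load_a, uint32_t weight_a,
                          uint32_t load_b, uint32_t weight_b)
{
    return (uint64_t)load_a * weight_b < (uint64_t)load_b * weight_a;
}
```

With load_a = 65536 and weight_b = 65536 the 32-bit product wraps to 0, so the
32-bit variant wrongly reports the heavily loaded server as the better choice.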
01 Oct, 2013
29 commits
-
Default timeouts are currently set via the proc/sysctl interface, the
typical pattern is a file name like:

    /proc/sys/net/netfilter/nf_conntrack_PROTOCOL_timeout_STATE

This results in one entry per default protocol state timeout.
This patch simplifies this by allowing default protocol timeouts
to be set via the cttimeout netlink interface.

This should allow us to get rid of the existing proc/sysctl code
in the midterm.

Signed-off-by: Pablo Neira Ayuso
-
There are currently seven different NAT hooks used in both
nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in
nf_conntrack_sip, then set from the nf_nat_sip NAT helper.

And because each of them is exported there is quite some overhead
introduced due to this.

By introducing nf_nat_sip_hooks I am able to reduce both text/data
somewhat. For nf_conntrack_sip e.g. I get

        text    data     bss     dec
old    15243    5256      32   20531
new    15010    5192      32   20234

Signed-off-by: Holger Eitzenberger
Signed-off-by: Pablo Neira Ayuso
-
Use proper net struct to allocate skb, otherwise
netlink mmap will be of no effect.

Signed-off-by: Gao feng
Signed-off-by: Pablo Neira Ayuso
-
Use proper net struct to allocate skb, otherwise netlink mmap
will have no effect.

Signed-off-by: Gao feng
Signed-off-by: Pablo Neira Ayuso
-
This adds a new set that provides similar functionality to ip,port,net
but permits arbitrary size subnets for both the first and last
parameter.

Signed-off-by: Oliver Smith
Signed-off-by: Jozsef Kadlecsik
-
This patch adds netns support for ipset.
Major changes were made in ip_set_core.c and ip_set.h.
Global variables are moved to per net namespace.
Initialization and destruction code for the per net namespace ipset
subsystem was added.
A "struct net *" parameter was added to the prototypes of the public
ip_set_* functions.

The remaining changes follow from the changed prototypes of the public
ip_set_* functions.

The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347

Signed-off-by: Vitaly Lavrov
Signed-off-by: Jozsef Kadlecsik
-
Signed-off-by: Jozsef Kadlecsik
-
The new extensions require zero initialization for the new element
to be added into a slot from where another element was pushed away.

Signed-off-by: Jozsef Kadlecsik
-
The destroy function must take into account that resizing doesn't
create new extensions so those cannot be destroyed at resize.

Signed-off-by: Jozsef Kadlecsik
-
This provides kernel support for creating ipsets with comment support.

This does incur a penalty when flushing or destroying an ipset, since all
entries are walked in order to free the allocated strings. This penalty
is of course less expensive than the operation of listing an ipset to
userspace, so for general-purpose usage the overall impact is expected
to be little to none.

Signed-off-by: Oliver Smith
Signed-off-by: Jozsef Kadlecsik
-
This provides kernel support for creating list ipsets with the comment
annotation extension.

Signed-off-by: Oliver Smith
Signed-off-by: Jozsef Kadlecsik
-
This provides kernel support for creating bitmap ipsets with comment
support.

As is the case for hashes, this incurs a penalty when flushing or
destroying the entire ipset as the entries must first be walked in order
to free the comment strings. This penalty is of course far less than the
cost of listing an ipset to userspace. Any set created without support
for comments will be flushed/destroyed as before.

Signed-off-by: Oliver Smith
Signed-off-by: Jozsef Kadlecsik
-
This adds the core support for having comments on ipset entries.

The comments are stored as standard null-terminated strings in
dynamically allocated memory after being passed to the kernel. As a
result of this, code has been added to the generic destroy function to
iterate over all extensions and call each extension's destroy task if the
set has that extension activated, and if such a task is defined.

Signed-off-by: Oliver Smith
Signed-off-by: Jozsef Kadlecsik
-
This adds a new set that provides the ability to configure pairs of
subnets. A small amount of additional handling code has been added to
the generic hash header file - this code is conditionally activated by a
preprocessor definition.

Signed-off-by: Oliver Smith
Signed-off-by: Jozsef Kadlecsik
-
Signed-off-by: Jozsef Kadlecsik
-
When an element timed out, the next one was skipped by the garbage
collector, fixed.

Signed-off-by: Jozsef Kadlecsik
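
The class of bug described in this commit can be sketched with an array-backed
list, where removing a slot shifts the following elements down by one. This is
a simplified model under assumed names (slot_list, list_del_at, list_gc), not
the list:set implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array-backed list: each slot holds an expiry time. */
struct slot_list {
    unsigned long expires[8];
    size_t len;
};

/* Remove slot i by shifting the tail of the array left by one. */
static void list_del_at(struct slot_list *l, size_t i)
{
    assert(i < l->len);
    for (size_t j = i; j + 1 < l->len; j++)
        l->expires[j] = l->expires[j + 1];
    l->len--;
}

/*
 * Purge every expired slot and return the number of removed elements.
 * The index advances only when nothing was removed: a removal moves the
 * next element into slot i, so advancing unconditionally would skip it
 * and leave it for the next gc run.
 */
static size_t list_gc(struct slot_list *l, unsigned long now)
{
    size_t removed = 0;
    size_t i = 0;

    while (i < l->len) {
        if (l->expires[i] <= now) {
            list_del_at(l, i);
            removed++;
        } else {
            i++;
        }
    }
    return removed;
}
```

Two adjacent expired elements are the interesting case: with an unconditional
i++ the second one would survive the pass.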
-
Signed-off-by: Jozsef Kadlecsik
-
Get rid of the structure based extensions and introduce a blob for
the extensions. Thus we can support more extension types easily.

Signed-off-by: Jozsef Kadlecsik
-
Default timeout and extension offsets are moved to struct set, because
all set types support all extensions and this makes it possible to
generalize extension support.

Signed-off-by: Jozsef Kadlecsik
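
The idea of keeping per-set extension offsets into a blob, instead of one
structure per combination of extensions, can be sketched as follows. This is
an illustration only: the extension ids, sizes, alignment rule and function
names below are assumptions and do not reproduce the ipset definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension ids, for illustration only. */
enum ext_id { EXT_TIMEOUT, EXT_COUNTER, EXT_MAX };

/* Size of each extension; each one is aligned to its own size here. */
static const size_t ext_size[EXT_MAX] = {
    [EXT_TIMEOUT] = sizeof(uint32_t),
    [EXT_COUNTER] = sizeof(uint64_t),
};

/* Per-set layout: where each enabled extension lives inside an element. */
struct ext_layout {
    int enabled[EXT_MAX];
    size_t offset[EXT_MAX];
    size_t elem_size; /* base data plus all enabled extensions */
};

/* Round n up to a multiple of a, where a is a power of two. */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Compute the offsets once per set; bit `id` of flags enables an extension. */
static void layout_init(struct ext_layout *l, size_t base_size,
                        unsigned int flags)
{
    size_t pos = base_size;

    for (int id = 0; id < EXT_MAX; id++) {
        l->enabled[id] = (flags >> id) & 1u;
        l->offset[id] = 0;
        if (!l->enabled[id])
            continue;
        pos = align_up(pos, ext_size[id]);
        l->offset[id] = pos;
        pos += ext_size[id];
    }
    l->elem_size = align_up(pos, sizeof(uint64_t));
}

/* Locate an enabled extension inside one element of the blob. */
static void *ext_ptr(const struct ext_layout *l, void *elem, enum ext_id id)
{
    assert(l->enabled[id]);
    return (unsigned char *)elem + l->offset[id];
}
```

Adding a new extension type in this model means adding one id and one size
entry, rather than a new structure for every combination.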
-
Signed-off-by: Jozsef Kadlecsik
-
In order to support hash:net,net, hash:net,port,net etc. types,
arrays are introduced for the book-keeping of existing cidr sizes
and network numbers in a set.

Signed-off-by: Jozsef Kadlecsik
-
ip[6]tables set match and SET target need to know the family of the set
in order to reject adding rules which refer to a set with a non-matching
family. Currently such rules are silently accepted and then ignored
instead of generating a clear error message to the user, which is not
helpful.

Signed-off-by: Jozsef Kadlecsik
-
Signed-off-by: Jozsef Kadlecsik
-
Enable ipset port set types to match IPv4 packet fragments for
protocols that don't have ports (or where the port information isn't
supported by ipset).

For example this allows a hash:ip,port ipset containing the entry
192.168.0.1,gre:0 to match all packet fragments for PPTP VPN tunnels
to/from the host. Without this patch only the first packet fragment
(with fragment offset 0) was matched, while subsequent fragments weren't.

This is not possible for IPv6, where the protocol is in the fragmented
part of the packet unlike IPv4, where the protocol is in the IP header.

IPPROTO_ICMPV6 is deliberately not included, because it isn't relevant
for IPv4.

Signed-off-by: Anders K. Pedersen
Signed-off-by: Jozsef Kadlecsik
-
Reported-by: Pablo Neira Ayuso
Signed-off-by: Jozsef Kadlecsik
-
Reported-by: David Laight
Signed-off-by: Jozsef Kadlecsik
-
Reported-by: Pablo Neira Ayuso
Signed-off-by: Jozsef Kadlecsik
-
net/netfilter/ipset/ip_set_hash_ipportnet.c:275:20:
warning: symbol 'cidr' shadows an earlier one

Signed-off-by: Jozsef Kadlecsik
-
Suggested-by: Pablo Neira Ayuso
Signed-off-by: Jozsef Kadlecsik
30 Sep, 2013
1 commit
-
TCP packets hitting the SYN proxy through the SYNPROXY target are not
validated by TCP conntrack. When th->doff is below 5, an underflow happens
when calculating the options length, causing skb_header_pointer() to
return NULL and triggering the BUG_ON().

Handle this case gracefully by checking for NULL instead of using BUG_ON().

Reported-by: Martin Topholm
Tested-by: Martin Topholm
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso
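
The underflow described in this commit can be reproduced in a small userspace
model. The sketch below is not the synproxy code: header_pointer() is a
hypothetical stand-in for skb_header_pointer() that works on a flat buffer,
and tcp_options() is an assumed name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_HDR_MIN 20u /* TCP header size without options, in bytes */

/*
 * Stand-in for skb_header_pointer(): returns NULL when the requested
 * range does not fit inside the packet. Hypothetical helper.
 */
static const uint8_t *header_pointer(const uint8_t *pkt, size_t pkt_len,
                                     size_t offset, size_t len)
{
    if (offset > pkt_len || len > pkt_len - offset)
        return NULL;
    return pkt + offset;
}

/*
 * Return a pointer to the TCP options, or NULL for a malformed header.
 * doff is the data offset in 32-bit words. For doff < 5 the unsigned
 * subtraction wraps to a huge length, the lookup fails, and the NULL is
 * handled here instead of being treated as an impossible condition.
 */
static const uint8_t *tcp_options(const uint8_t *pkt, size_t pkt_len,
                                  unsigned int doff, size_t *optlen)
{
    size_t len = (size_t)doff * 4 - TCP_HDR_MIN; /* wraps when doff < 5 */
    const uint8_t *opts = header_pointer(pkt, pkt_len, TCP_HDR_MIN, len);

    if (opts == NULL)
        return NULL;
    *optlen = len;
    return opts;
}
```

A caller would drop or ignore the packet when tcp_options() returns NULL.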
27 Sep, 2013
3 commits
-
Otherwise the pmtu will be incorrect.

Signed-off-by: Gao feng
Signed-off-by: Pablo Neira Ayuso
-
This patch refactors the code to skip tcpmss_reverse_mtu if no
clamp-mss-to-pmtu is specified.

Signed-off-by: Gao feng
Signed-off-by: Pablo Neira Ayuso
-
Currently set_expected_rtp_rtcp() in the SIP helper uses
rcu_dereference() two times to access two different NAT hook
functions. However, only the first one is protected by the RCU
reader lock, the second one isn't. Fix it by extending the RCU
protected area.

This is more a cosmetic thing since we rely on all netfilter hooks
being rcu_read_lock()ed by nf_hook_slow() in many places anyway,
as Patrick McHardy clarified.

Signed-off-by: Holger Eitzenberger
Signed-off-by: Pablo Neira Ayuso
20 Sep, 2013
1 commit
-
If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.

For example, if IPsec (tunnel mode) has to encrypt large skbs
that have the local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identifier.
If one of these IP fragments got lost or reordered, the peer
could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to packet loss
or data corruption.

Signed-off-by: Ansis Atteka
Signed-off-by: David S. Miller
19 Sep, 2013
4 commits
-
When reading percpu stats we need to properly reset
the sum when CPU 0 is not present in the possible mask.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
-
commit c5549571f975ab ("ipvs: convert lblcr scheduler to rcu")
allows RCU readers to use dest after calling ip_vs_dest_put().
In the corner case it can race with ip_vs_dest_trash_expire()
which can release the dest while it is being returned to the
RCU readers as scheduling result.

To fix the problem do not allow e->dest to be replaced and
defer the ip_vs_dest_put() call by using an RCU callback. Now
e->dest does not need to be an RCU pointer.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
-
commit c2a4ffb70eef39 ("ipvs: convert lblc scheduler to rcu")
allows RCU readers to use dest after calling ip_vs_dest_put().
In the corner case it can race with ip_vs_dest_trash_expire()
which can release the dest while it is being returned to the
RCU readers as scheduling result.

To fix the problem do not allow en->dest to be replaced and
defer the ip_vs_dest_put() call by using an RCU callback. Now
en->dest does not need to be an RCU pointer.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
-
commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
IP_VS_DEST_STATE_REMOVING flag and RCU callback named
ip_vs_dest_wait_readers() to keep dests and services after
removal for at least a RCU grace period. But we have the
following corner cases:

- we can not reuse the same dest if its service is removed
  while IP_VS_DEST_STATE_REMOVING is still set because another dest
  removal in the first grace period can not extend this period.
  It can happen when ipvsadm -C && ipvsadm -R is used.

- dest->svc can be replaced but ip_vs_in_stats() and
  ip_vs_out_stats() have no explicit read memory barriers
  when accessing dest->svc. It can happen that dest->svc
  was just freed (replaced) while we use it to update
  the stats.

We solve the problems as follows:

- IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed
  idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start
  will remember when, for the first time after deletion, we noticed
  dest->refcnt=0. Later, the connections can grab a reference
  while in RCU grace period but if refcnt becomes 0 we can
  safely free the dest and its svc.

- dest->svc becomes an RCU pointer. As a result, we add explicit
  RCU locking in ip_vs_in_stats() and ip_vs_out_stats().

- __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it
  now can free the service immediately or after a RCU grace
  period. dest->svc is not set to NULL anymore.

As a result, unlinked dests and their services are
always freed after the IP_VS_DEST_TRASH_PERIOD period, unused
services are freed after a RCU grace period.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
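
The fixed idle period described above can be modelled roughly as follows.
This is a simplified sketch and not the IPVS implementation: the struct
layout, the TRASH_PERIOD value and the function name trash_may_free() are
assumptions, and all locking and RCU handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TRASH_PERIOD 120UL /* hypothetical idle period, in arbitrary ticks */

/* Hypothetical, reduced view of a trashed destination. */
struct dest {
    int refcnt;
    unsigned long idle_start; /* 0 means "not yet seen unused" */
};

/*
 * Decide whether a trashed dest may be freed at time `now`.
 * The first time the dest is seen with refcnt == 0 only the time is
 * recorded. It becomes freeable once it is unreferenced and at least
 * TRASH_PERIOD has elapsed since that first observation.
 */
static bool trash_may_free(struct dest *d, unsigned long now)
{
    if (d->refcnt > 0)
        return false; /* a connection still holds a reference */

    if (d->idle_start == 0) {
        d->idle_start = now ? now : 1; /* keep 0 reserved as "unset" */
        return false;
    }

    /* Unsigned subtraction stays correct if the tick counter wraps. */
    return now - d->idle_start >= TRASH_PERIOD;
}
```

In this model a reference taken during the idle period only delays the free
until the reference is dropped again. It does not restart the period.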