05 Oct, 2013

1 commit

  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains Netfilter updates for your net-next tree,
    mostly ipset improvements and enhancements. They are:

    * Don't call ip_nest_end needlessly in the error path, suggested by
    Pablo Neira Ayuso, from Jozsef Kadlecsik.

    * Fixed sparse warnings about a shadowed variable and a missing RCU
    annotation, and fixed "may be used uninitialized" warnings, also from
    Jozsef.

    * Renamed simple macro names to avoid namespace issues, reported by David
    Laight, again from Jozsef.

    * Use a fixed-size type for timeout in the extension part, and cosmetic
    ordering of matches and targets separately in xt_set.c, from Jozsef.

    * Support packet fragments for IPv4 protos without ports from Anders K.
    Pedersen. For example this allows a hash:ip,port ipset containing the
    entry 192.168.0.1,gre:0 to match all packet fragments for PPTP VPN
    tunnels to/from the host. Without this patch only the first packet
    fragment (with fragment offset 0) was matched.

    * Introduced a new operation to get both setname and family, from Jozsef.
    ip[6]tables set match and SET target need to know the family of the set
    in order to reject adding rules which refer to a set with a non-matching
    family. Currently such rules are silently accepted and then ignored
    instead of generating an error message to the user.

    * Reworked extensions support in ipset types from Jozsef. The approach of
    defining structures with all variations is not manageable as the
    number of extensions grows. Therefore a blob for the extensions is
    introduced, somewhat similar to conntrack. Support for extensions
    which need a per-data destroy function is added as well.

    * When an element timed out in a list:set type of set, the garbage
    collector skipped the checking of the next element. So the purging
    was delayed to the next run of the gc, fixed by Jozsef.

    * A small Kconfig fix: NETFILTER_NETLINK cannot be selected and
    ipset requires it.

    * hash:net,net type from Oliver Smith. The type provides the ability to
    store pairs of subnets in a set.

    * Comment for ipset entries from Oliver Smith. This makes it possible to
    annotate entries in a set with comments, for example:

    ipset n foo hash:net,net comment
    ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B"

    * Fix of hash types resizing with comment extension from Jozsef.

    * Fix of new extensions for list:set type when an element is added
    into a slot from which another element was pushed away, from Jozsef.

    * Introduction of a common function for the listing of the element
    extensions from Jozsef.

    * Net namespace support for ipset from Vitaly Lavrov.

    * hash:net,port,net type from Oliver Smith, which makes it possible
    to store triples of two subnets and a protocol, port pair in
    a set.

    * Get xt_TCPMSS working with net namespace, by Gao feng.

    * Use the proper net namespace to allocate skbs, also by Gao feng.

    * A couple of cleanups for the conntrack SIP helper, by Holger
    Eitzenberger.

    * Extend cttimeout to allow setting default conntrack timeouts via
    nfnetlink, so we can get rid of all our sysctl/proc interfaces in
    the future for timeout tuning, from me.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

02 Oct, 2013

1 commit

  • Pablo Neira Ayuso says:

    ====================
    The following patchset contains Netfilter/IPVS fixes for your net
    tree, they are:

    * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
    Patrick McHardy.

    * Fix possible weight overflow in lblc and lblcr schedulers due to
    32-bit arithmetic, from Simon Kirby.

    * Fix possible memory access race in the lblc and lblcr schedulers,
    introduced when they were converted to use RCU, two patches from
    Julian Anastasov.

    * Fix hard dependency on CPU 0 when reading per-cpu stats in the
    rate estimator, from Julian Anastasov.

    * Fix race that may lead to object use after release, when invoking
    ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
    Anastasov.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
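
The lblc/lblcr weight overflow listed above can be illustrated with a small userspace sketch. It assumes, for illustration only, that two servers are compared by cross-multiplying load and weight (load_a / weight_a > load_b / weight_b); the function names and parameters are hypothetical and are not taken from the kernel sources. The 32-bit variant uses unsigned types so that the wraparound is well defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is server A more loaded, relative to its weight,
 * than server B?  The products are widened to 64 bits before comparing,
 * so large load/weight values cannot wrap around.
 */
static bool more_loaded_64(int32_t a_load, int32_t a_weight,
                           int32_t b_load, int32_t b_weight)
{
    return (int64_t)a_load * b_weight > (int64_t)b_load * a_weight;
}

/*
 * Same comparison with both products truncated to 32 bits, to show how
 * a wrapped product can invert the result.
 */
static bool more_loaded_32(uint32_t a_load, uint32_t a_weight,
                           uint32_t b_load, uint32_t b_weight)
{
    return (uint32_t)(a_load * b_weight) > (uint32_t)(b_load * a_weight);
}
```

With a_load = 65536 and b_weight = 65536 the 32-bit product wraps to 0, so the truncated comparison reports that A is not more loaded even though it clearly is; the 64-bit comparison gives the expected answer.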
     

01 Oct, 2013

29 commits


30 Sep, 2013

1 commit

  • TCP packets hitting the SYN proxy through the SYNPROXY target are not
    validated by TCP conntrack. When th->doff is below 5, an underflow happens
    when calculating the options length, causing skb_header_pointer() to
    return NULL and triggering the BUG_ON().

    Handle this case gracefully by checking for NULL instead of using BUG_ON().

    Reported-by: Martin Topholm
    Tested-by: Martin Topholm
    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
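
The underflow described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: header_pointer() is a hypothetical stand-in for a bounds-checked accessor in the spirit of skb_header_pointer(), and parse_tcp_options() is an invented name, not the kernel's parsing routine. The data offset field counts 32-bit words, and the fixed TCP header is 20 bytes, so any value below 5 makes the unsigned subtraction wrap.

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN 20u  /* fixed TCP header size in bytes */

/*
 * Hypothetical bounds-checked accessor: returns a pointer to `len` bytes
 * at `offset` inside the buffer, or NULL if the range does not fit.
 */
static const uint8_t *header_pointer(const uint8_t *buf, size_t buf_len,
                                     size_t offset, size_t len)
{
    if (offset > buf_len || len > buf_len - offset)
        return NULL;
    return buf + offset;
}

/*
 * Returns 0 if the options area could be located, -1 otherwise.
 * When doff < 5 the unsigned subtraction wraps to a huge length, the
 * accessor returns NULL, and the caller fails gracefully instead of
 * treating NULL as an impossible condition.
 */
static int parse_tcp_options(const uint8_t *seg, size_t seg_len,
                             unsigned int doff)
{
    unsigned int optlen = doff * 4u - TCP_MIN_HDR_LEN;
    const uint8_t *opts = header_pointer(seg, seg_len,
                                         TCP_MIN_HDR_LEN, optlen);

    if (opts == NULL)
        return -1;
    return 0;
}
```

The design point is that a malformed header is an input error to be handled, not an internal invariant violation, which is why a NULL check fits better than an assertion here.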
     

27 Sep, 2013

3 commits


20 Sep, 2013

1 commit

  • If local fragmentation is allowed, then ip_select_ident() and
    ip_select_ident_more() need to generate unique IDs to ensure
    correct defragmentation on the peer.

    For example, if IPsec (tunnel mode) has to encrypt large skbs
    that have the local_df bit set, then all IP fragments that belonged
    to different ESP datagrams would have used the same identifier.
    If one of these IP fragments were lost or reordered, the
    peer could stitch together IP fragments that did
    not belong to the same datagram. This would lead to packet loss
    or data corruption.

    Signed-off-by: Ansis Atteka
    Signed-off-by: David S. Miller

    Ansis Atteka
     

19 Sep, 2013

4 commits

  • When reading percpu stats we need to properly reset
    the sum when CPU 0 is not present in the possible mask.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • commit c5549571f975ab ("ipvs: convert lblcr scheduler to rcu")
    allows RCU readers to use dest after calling ip_vs_dest_put().
    In a corner case it can race with ip_vs_dest_trash_expire()
    which can release the dest while it is being returned to the
    RCU readers as the scheduling result.

    To fix the problem do not allow e->dest to be replaced and
    defer the ip_vs_dest_put() call by using an RCU callback. Now
    e->dest does not need to be an RCU pointer.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • commit c2a4ffb70eef39 ("ipvs: convert lblc scheduler to rcu")
    allows RCU readers to use dest after calling ip_vs_dest_put().
    In a corner case it can race with ip_vs_dest_trash_expire()
    which can release the dest while it is being returned to the
    RCU readers as the scheduling result.

    To fix the problem do not allow en->dest to be replaced and
    defer the ip_vs_dest_put() call by using an RCU callback. Now
    en->dest does not need to be an RCU pointer.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
     
  • commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added the
    IP_VS_DEST_STATE_REMOVING flag and an RCU callback named
    ip_vs_dest_wait_readers() to keep dests and services after
    removal for at least an RCU grace period. But we have the
    following corner cases:

    - we can not reuse the same dest if its service is removed
    while IP_VS_DEST_STATE_REMOVING is still set because another dest
    removal in the first grace period can not extend this period.
    It can happen when ipvsadm -C && ipvsadm -R is used.

    - dest->svc can be replaced but ip_vs_in_stats() and
    ip_vs_out_stats() have no explicit read memory barriers
    when accessing dest->svc. It can happen that dest->svc
    was just freed (replaced) while we use it to update
    the stats.

    We solve the problems as follows:

    - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed
    idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start
    will remember when, for the first time after deletion, we noticed
    dest->refcnt=0. Later, the connections can grab a reference
    while in the RCU grace period, but if refcnt becomes 0 we can
    safely free the dest and its svc.

    - dest->svc becomes an RCU pointer. As a result, we add explicit
    RCU locking in ip_vs_in_stats() and ip_vs_out_stats().

    - __ip_vs_unbind_svc() is renamed to __ip_vs_svc_put(); it
    can now free the service immediately or after an RCU grace
    period. dest->svc is not set to NULL anymore.

    As a result, unlinked dests and their services are
    always freed after the IP_VS_DEST_TRASH_PERIOD period, and unused
    services are freed after an RCU grace period.

    Signed-off-by: Julian Anastasov
    Signed-off-by: Simon Horman

    Julian Anastasov
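
The per-CPU stats fix in the first of the four commits above (resetting the sum when CPU 0 is not in the possible mask) can be sketched as follows. This is a userspace illustration under assumptions: read_cpu_stats(), the flat counter array, and the 32-bit mask are hypothetical and do not mirror the kernel's data structures. The idea is to track whether a first possible CPU has been seen, rather than assuming that the first one is CPU 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reader: sums the counters of every CPU whose bit is set
 * in possible_mask.  The first possible CPU overwrites *sum (resetting
 * any stale value); later CPUs are added on top.  Keying the reset on
 * "cpu == 0" instead would leave stale data in *sum whenever CPU 0 is
 * absent from the mask.
 */
static void read_cpu_stats(uint64_t *sum, const uint64_t *per_cpu,
                           size_t ncpus, uint32_t possible_mask)
{
    bool add = false;  /* false until the first possible CPU is seen */

    for (size_t cpu = 0; cpu < ncpus && cpu < 32; cpu++) {
        if (!(possible_mask & (UINT32_C(1) << cpu)))
            continue;
        if (add) {
            *sum += per_cpu[cpu];
        } else {
            *sum = per_cpu[cpu];
            add = true;
        }
    }
}
```

If no CPU is set in the mask the sketch leaves *sum untouched; a caller that needs a defined value in that case would have to zero it beforehand.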