14 Oct, 2013

2 commits

  • This patch adds nftables which is the intended successor of iptables.
    This packet filtering framework reuses the existing netfilter hooks,
    the connection tracking system, the NAT subsystem, the transparent
    proxying engine, the logging infrastructure and the userspace packet
    queueing facilities.

    In a nutshell, nftables provides a pseudo-state machine with 4 general
    purpose registers of 128 bits and 1 specific purpose register to store
    verdicts. This pseudo-machine comes with an extensible instruction set,
    a.k.a. "expressions" in the nftables jargon. The expressions included
    in this patch provide the basic functionality, they are:

    * bitwise: to perform bitwise operations.
    * byteorder: to change from host/network endianess.
    * cmp: to compare data with the content of the registers.
    * counter: to enable counters on rules.
    * ct: to store conntrack keys into register.
    * exthdr: to match IPv6 extension headers.
    * immediate: to load data into registers.
    * limit: to limit matching based on packet rate.
    * log: to log packets.
    * meta: to match metainformation that usually comes with the skbuff.
    * nat: to perform Network Address Translation.
    * payload: to fetch data from the packet payload and store it into
    registers.
    * reject (IPv4 only): to explicitly close connection, eg. TCP RST.

    Using this instruction-set, the userspace utility 'nft' can transform
    the rules expressed in human-readable text representation (using a
    new syntax, inspired by tcpdump) to nftables bytecode.

    nftables also inherits the table, chain and rule objects from
    iptables, but in a more configurable way, and it also includes the
    original datatype-agnostic set infrastructure with mapping support.
    This set infrastructure is enhanced in the follow up patch (netfilter:
    nf_tables: add netlink set API).

    This patch includes the following components:

    * the netlink API: net/netfilter/nf_tables_api.c and
    include/uapi/netfilter/nf_tables.h
    * the packet filter core: net/netfilter/nf_tables_core.c
    * the expressions (described above): net/netfilter/nft_*.c
    * the filter tables: arp, IPv4, IPv6 and bridge:
    net/ipv4/netfilter/nf_tables_ipv4.c
    net/ipv6/netfilter/nf_tables_ipv6.c
    net/ipv4/netfilter/nf_tables_arp.c
    net/bridge/netfilter/nf_tables_bridge.c
    * the NAT table (IPv4 only):
    net/ipv4/netfilter/nf_table_nat_ipv4.c
    * the route table (similar to mangle):
    net/ipv4/netfilter/nf_table_route_ipv4.c
    net/ipv6/netfilter/nf_table_route_ipv6.c
    * internal definitions under:
    include/net/netfilter/nf_tables.h
    include/net/netfilter/nf_tables_core.h
    * It also includes an skeleton expression:
    net/netfilter/nft_expr_template.c
    and the preliminary implementation of the meta target
    net/netfilter/nft_meta_target.c

    It also includes a change in struct nf_hook_ops to add a new
    pointer to store private data to the hook, that is used to store
    the rule list per chain.

    This patch is based on the patch from Patrick McHardy, plus merged
    accumulated cleanups, fixes and small enhancements to the nftables
    code that has been done since 2009, which are:

    From Patrick McHardy:
    * nf_tables: adjust netlink handler function signatures
    * nf_tables: only retry table lookup after successful table module load
    * nf_tables: fix event notification echo and avoid unnecessary messages
    * nft_ct: add l3proto support
    * nf_tables: pass expression context to nft_validate_data_load()
    * nf_tables: remove redundant definition
    * nft_ct: fix maxattr initialization
    * nf_tables: fix invalid event type in nf_tables_getrule()
    * nf_tables: simplify nft_data_init() usage
    * nf_tables: build in more core modules
    * nf_tables: fix double lookup expression unregistation
    * nf_tables: move expression initialization to nf_tables_core.c
    * nf_tables: build in payload module
    * nf_tables: use NFPROTO constants
    * nf_tables: rename pid variables to portid
    * nf_tables: save 48 bits per rule
    * nf_tables: introduce chain rename
    * nf_tables: check for duplicate names on chain rename
    * nf_tables: remove ability to specify handles for new rules
    * nf_tables: return error for rule change request
    * nf_tables: return error for NLM_F_REPLACE without rule handle
    * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
    * nf_tables: fix NLM_F_MULTI usage in netlink notifications
    * nf_tables: include NLM_F_APPEND in rule dumps

    From Pablo Neira Ayuso:
    * nf_tables: fix stack overflow in nf_tables_newrule
    * nf_tables: nft_ct: fix compilation warning
    * nf_tables: nft_ct: fix crash with invalid packets
    * nft_log: group and qthreshold are 2^16
    * nf_tables: nft_meta: fix socket uid,gid handling
    * nft_counter: allow to restore counters
    * nf_tables: fix module autoload
    * nf_tables: allow to remove all rules placed in one chain
    * nf_tables: use 64-bits rule handle instead of 16-bits
    * nf_tables: fix chain after rule deletion
    * nf_tables: improve deletion performance
    * nf_tables: add missing code in route chain type
    * nf_tables: rise maximum number of expressions from 12 to 128
    * nf_tables: don't delete table if in use
    * nf_tables: fix basechain release

    From Tomasz Bursztyka:
    * nf_tables: Add support for changing users chain's name
    * nf_tables: Change chain's name to be fixed sized
    * nf_tables: Add support for replacing a rule by another one
    * nf_tables: Update uapi nftables netlink header documentation

    From Florian Westphal:
    * nft_log: group is u16, snaplen u32

    From Phil Oester:
    * nf_tables: operational limit match

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Pass the hook ops to the hookfn to allow for generic hook
    functions. This change is required by nf_tables.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

27 Sep, 2013

1 commit

  • There are a mix of function prototypes with and without extern
    in the kernel sources. Standardize on not using extern for
    function prototypes.

    Function prototypes don't need to be written with extern.
    extern is assumed by the compiler. Its use is as unnecessary as
    using auto to declare automatic/local variables in a block.

    Signed-off-by: Joe Perches

    Joe Perches
     

28 Aug, 2013

1 commit

  • Split out sequence number adjustments from NAT and move them to the conntrack
    core to make them usable for SYN proxying. The sequence number adjustment
    information is moved to a seperate extend. The extend is added to new
    conntracks when a NAT mapping is set up for a connection using a helper.

    As a side effect, this saves 24 bytes per connection with NAT in the common
    case that a connection does not have a helper assigned.

    Signed-off-by: Patrick McHardy
    Tested-by: Martin Topholm
    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

13 Aug, 2013

1 commit


01 Aug, 2013

1 commit


31 Jul, 2013

1 commit


23 May, 2013

1 commit


06 Apr, 2013

1 commit


13 Oct, 2012

1 commit


30 Aug, 2012

1 commit


22 Jun, 2012

1 commit


21 Jun, 2012

1 commit


16 Jun, 2012

2 commits

  • User-space programs that receive traffic via NFQUEUE may mangle packets.
    If NAT is enabled, this usually puzzles sequence tracking, leading to
    traffic disruptions.

    With this patch, nfnl_queue will make the corresponding NAT TCP sequence
    adjustment if:

    1) The packet has been mangled,
    2) the NFQA_CFG_F_CONNTRACK flag has been set, and
    3) NAT is detected.

    There are some records on the Internet complaning about this issue:
    http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables

    By now, we only support TCP since we have no helpers for DCCP or SCTP.
    Better to add this if we ever have some helper over those layer 4 protocols.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch allows you to include the conntrack information together
    with the packet that is sent to user-space via NFQUEUE.

    Previously, there was no integration between ctnetlink and
    nfnetlink_queue. If you wanted to access conntrack information
    from your libnetfilter_queue program, you required to query
    ctnetlink from user-space to obtain it. Thus, delaying the packet
    processing even more.

    Including the conntrack information is optional, you can set it
    via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

07 Jun, 2012

1 commit


21 Apr, 2012

1 commit


24 Feb, 2012

1 commit

  • …_key_slow_[inc|dec]()

    So here's a boot tested patch on top of Jason's series that does
    all the cleanups I talked about and turns jump labels into a
    more intuitive to use facility. It should also address the
    various misconceptions and confusions that surround jump labels.

    Typical usage scenarios:

    #include <linux/static_key.h>

    struct static_key key = STATIC_KEY_INIT_TRUE;

    if (static_key_false(&key))
    do unlikely code
    else
    do likely code

    Or:

    if (static_key_true(&key))
    do likely code
    else
    do unlikely code

    The static key is modified via:

    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);

    The 'slow' prefix makes it abundantly clear that this is an
    expensive operation.

    I've updated all in-kernel code to use this everywhere. Note
    that I (intentionally) have not pushed through the rename
    blindly through to the lowest levels: the actual jump-label
    patching arch facility should be named like that, so we want to
    decouple jump labels from the static-key facility a bit.

    On non-jump-label enabled architectures static keys default to
    likely()/unlikely() branches.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: Jason Baron <jbaron@redhat.com>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: a.p.zijlstra@chello.nl
    Cc: mathieu.desnoyers@efficios.com
    Cc: davem@davemloft.net
    Cc: ddaney.cavm@gmail.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Ingo Molnar
     

22 Nov, 2011

1 commit

  • On configs where CONFIG_JUMP_LABEL=y, we can replace in fast path a
    load/compare/conditional jump by a single jump with no dcache reference.

    Jump target is modified as soon as nf_hooks[pf][hook] switches from
    empty state to non empty states. jump_label state is kept outside of
    nf_hooks array so has no cost on cpu caches.

    This patch removes the test on CONFIG_NETFILTER_DEBUG : No need to call
    nf_hook_slow() at all if nf_hooks[pf][hook] is empty, this didnt give
    useful information, but slowed down things a lot.

    Signed-off-by: Eric Dumazet
    CC: Patrick McHardy
    CC: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 May, 2011

1 commit


04 Apr, 2011

2 commits

  • ipv6 fib lookup can set RT6_LOOKUP_F_IFACE flag to restrict search
    to an interface, but this flag cannot be set via struct flowi.

    Also, it cannot be set via ip6_route_output: this function uses the
    passed sock struct to determine if this flag is required
    (by testing for nonzero sk_bound_dev_if).

    Work around this by passing in an artificial struct sk in case
    'strict' argument is true.

    This is required to replace the rt6_lookup call in xt_addrtype.c with
    nf_afinfo->route().

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Patrick McHardy

    Florian Westphal
     
  • This is required to eventually replace the rt6_lookup call in
    xt_addrtype.c with nf_afinfo->route().

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Signed-off-by: Patrick McHardy

    Florian Westphal
     

18 Jan, 2011

2 commits

  • If an skb is to be NF_QUEUE'd, but no program has opened the queue, the
    packet is dropped.

    This adds a v2 target revision of xt_NFQUEUE that allows packets to
    continue through the ruleset instead.

    Because the actual queueing happens outside of the target context, the
    'bypass' flag has to be communicated back to the netfilter core.

    Unfortunately the only choice to do this without adding a new function
    argument is to use the target function return value (i.e. the verdict).

    In the NF_QUEUE case, the upper 16bit already contain the queue number
    to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e.
    we now have extra room for a new flag.

    If a hook issued a NF_QUEUE verdict, then the netfilter core will
    continue packet processing if the queueing hook
    returns -ESRCH (== "this queue does not exist") and the new
    NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value.

    Note: If the queue exists, but userspace does not consume packets fast
    enough, the skb will still be dropped.

    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
     
  • NF_VERDICT_MASK is currently 0xffff. This is because the upper
    16 bits are used to store errno (for NF_DROP) or the queue number
    (NF_QUEUE verdict).

    As there are up to 0xffff different queues available, there is no more
    room to store additional flags.

    At the moment there are only 6 different verdicts, i.e. we can reduce
    NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space.

    NF_VERDICT_BITS would then be reduced to 8, but because the value is
    exported to userspace, this might cause breakage; e.g.:

    e.g. 'queuenr = (1 << NF_VERDICT_BITS) | NF_QUEUE' would now break.

    Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value
    to the 'userspace compat' section.

    Signed-off-by: Florian Westphal
    Signed-off-by: Patrick McHardy

    Florian Westphal
     

13 Jan, 2011

1 commit


18 Nov, 2010

1 commit


16 Nov, 2010

1 commit


12 Nov, 2010

1 commit

  • The NF_HOOK_COND returns 0 when it shouldn't due to what I believe to be an
    error in the code as the order of operations is not what was intended. C will
    evalutate == before =. Which means ret is getting set to the bool result,
    rather than the return value of the function call. The code says

    if (ret = function() == 1)
    when it meant to say:
    if ((ret = function()) == 1)

    Normally the compiler would warn, but it doesn't notice it because its
    a actually complex conditional and so the wrong code is wrapped in an explict
    set of () [exactly what the compiler wants you to do if this was intentional].
    Fixing this means that errors when netfilter denies a packet get propagated
    back up the stack rather than lost.

    Problem introduced by commit 2249065f (netfilter: get rid of the grossness
    in netfilter.h).

    Signed-off-by: Eric Paris
    Cc: stable@kernel.org
    Signed-off-by: Patrick McHardy

    Eric Paris
     

19 Feb, 2010

1 commit


15 Feb, 2010

2 commits


02 Feb, 2010

1 commit

  • Ifdef out
    struct nf_sockopt_ops::compat_set
    struct nf_sockopt_ops::compat_get
    struct xt_match::compat_from_user
    struct xt_match::compat_to_user
    struct xt_match::compatsize
    to make structures smaller on COMPAT=n kernels.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Patrick McHardy

    Alexey Dobriyan
     

05 Nov, 2009

1 commit

  • This cleanup patch puts struct/union/enum opening braces,
    in first line to ease grep games.

    struct something
    {

    becomes :

    struct something {

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
     

08 Oct, 2008

3 commits


22 May, 2008

1 commit

  • Greg Steuck points out that some of the netfilter
    headers can't be used in userspace without including linux/types.h
    first. The headers include their own linux/types.h include statements,
    these are stripped by make headers-install because they are inside
    #ifdef __KERNEL__ however. Move them out to fix this.

    Reported and Tested by Greg Steuck.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

14 Apr, 2008

2 commits

  • Move the UDP-Lite conntrack checksum validation to a generic helper
    similar to nf_checksum() and make it fall back to nf_checksum()
    in case the full packet is to be checksummed and hardware checksums
    are available. This is to be used by DCCP conntrack, which also
    needs to verify partial checksums.

    Signed-off-by: Patrick McHardy

    Patrick McHardy
     
  • Commit 9335f047fe61587ec82ff12fbb1220bcfdd32006 aka
    "[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW"
    added per-netns _view_ of iptables rules. They were shown to user, but
    ignored by filtering code. Now that it's possible to at least ping loopback,
    per-netns tables can affect filtering decisions.

    netns is taken in case of
    PRE_ROUTING, LOCAL_IN -- from in device,
    POST_ROUTING, LOCAL_OUT -- from out device,
    FORWARD -- from in device which should be equal to out device's netns.
    This code is relatively new, so BUG_ON was plugged.

    Wrappers were added to a) keep code the same from CONFIG_NET_NS=n users
    (overwhelming majority), b) consolidate code in one place -- similar
    changes will be done in ipv6 and arp netfilter code.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Patrick McHardy

    Alexey Dobriyan