18 Oct, 2013

19 commits

  • This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
    extra or prefix segments to pass the large packet to the frontend. New
    xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
    to determine if the frontend is capable of handling such packets.

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Cc: David Vrabel
    Cc: Ian Campbell
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • This patch adds a xenstore feature flag, feature-gso-tcpv6, to advertise
    that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
    if the frontend passes an extra segment with the new type
    XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Cc: David Vrabel
    Acked-by: Ian Campbell
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • There is no mechanism to insist that a guest always generates a packet
    with good checksum (at least for IPv4) so we must handle checksum
    offloading from the guest and hence should set NETIF_F_RXCSUM.

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Cc: David Vrabel
    Cc: Ian Campbell
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • For performance of VM to VM traffic on a single host it is better to avoid
    calculation of TCP/UDP checksum in the sending frontend. To allow this, this
    patch adds the code necessary to set up partial checksum for IPv6 packets
    and a xenstore flag, feature-ipv6-csum-offload, to advertise that fact to
    frontends.

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Cc: David Vrabel
    Cc: Ian Campbell
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • Check xenstore flag feature-ipv6-csum-offload to determine if a
    guest is happy to accept IPv6 packets with only partial checksum.

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Cc: David Vrabel
    Cc: Ian Campbell
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • bonding: patchset for rcu use in bonding

    ====================
    This patch set converts the xmit path of 3ad and alb mode to use the RCU lock,
    adds the rtnl lock and removes the read lock for bond sysfs.

    v2: calling bond_for_each_slave_rcu() without rcu_read_lock() triggers a
    warning, so add a new function for the alb xmit path to avoid the warning.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • bond_for_each_slave() will not be protected by read_lock(),
    only by rtnl_lock(), so read_lock() needs to be replaced
    with rtnl_lock().

    Signed-off-by: Ding Tianhong
    Signed-off-by: David S. Miller

    dingtianhong
     
  • Commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
    (bonding: initial RCU conversion) converted the roundrobin,
    active-backup, broadcast and xor xmit paths to RCU protection,
    which improves performance for these modes. This patch
    converts the xmit path for alb mode.

    Signed-off-by: Ding Tianhong
    Signed-off-by: Yang Yingliang
    Cc: Nikolay Aleksandrov
    Cc: Veaceslav Falico
    Signed-off-by: David S. Miller

    dingtianhong
     
  • Commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
    (bonding: initial RCU conversion) converted the roundrobin,
    active-backup, broadcast and xor xmit paths to RCU protection,
    which improves performance for these modes. This patch
    converts the xmit path for 3ad mode.

    Suggested-by: Nikolay Aleksandrov
    Suggested-by: Veaceslav Falico
    Signed-off-by: Ding Tianhong
    Signed-off-by: Wang Yufen
    Cc: Nikolay Aleksandrov
    Cc: Veaceslav Falico
    Signed-off-by: David S. Miller

    dingtianhong
     
  • Pablo Neira Ayuso says:

    ====================
    netfilter updates: nf_tables pull request

    The following patchset contains the current original nf_tables tree
    condensed in 17 patches. I have organized them in chronological order
    since the original nf_tables code was released in 2009 and by
    dependencies between the different patches.

    The patches are:

    1) Adapt all existing hooks in the tree to pass hook ops to the
    hook callback function, required by nf_tables, from Patrick McHardy.

    2) Move alloc_null_binding to nf_nat_core, as it is now also needed by
    nf_tables and ip_tables, original patch from Patrick McHardy but
    required major changes to adapt it to the current tree that I made.

    3) Add nf_tables core, including the netlink API, the packet filtering
    engine, expressions and built-in tables, from Patrick McHardy. This
    patch includes accumulated fixes since 2009 and minor enhancements.
    The patch description contains a list of references to the original
    patches for the record. For those that are not familiar with the
    original work, see [1], [2] and [3].

    4) Add netlink set API, this replaces the original set infrastructure
    to introduce a netlink API to add/delete sets and to add/delete
    set elements. This includes two set types: the hash and the rb-tree
    sets (used for interval based matching). The main difference with
    ipset is that this infrastructure is data type agnostic. Patch from
    Patrick McHardy.

    5) Allow expression operation overload, this API change allows us to
    define expression subtypes depending on the configuration
    that is received from user-space via Netlink. It is used by follow
    up patches to provide optimized versions of the payload and cmp
    expressions and the x_tables compatibility layer, from Patrick
    McHardy.

    6) Add optimized data comparison operation, it requires the previous
    patch, from Patrick McHardy.

    7) Add optimized payload implementation, it requires patch 5, from
    Patrick McHardy.

    8) Convert built-in tables to chain types. Each chain type has special
    semantics (filter, route and nat) that are used by userspace to
    configure the chain behaviour. The main change with regard to iptables
    is that tables become containers of chains, with no specific semantics.
    However, you may still configure your tables and chains to retain
    iptables-like semantics, patch from me.

    9) Add compatibility layer for x_tables. This patch adds support to
    use all existing x_tables extensions from nf_tables, this is used
    to provide a userspace utility that accepts iptables syntax but
    uses the nf_tables kernel core internally. This patch includes
    missing features in the nf_tables core such as the per-chain
    stats, default chain policy and number of chain references, which
    are required by the iptables compatibility userspace tool. Patch
    from me.

    10) Fix transport protocol matching, this fix is a side effect of the
    x_tables compatibility layer, which now provides a pointer to the
    transport header, from me.

    11) Add support for dormant tables, this feature allows you to disable
    all chains and rules that are contained in one table, from me.

    12) Add IPv6 NAT support. At the time nf_tables was made, there was no
    NAT IPv6 support yet, from Tomasz Bursztyka.

    13) Complete net namespace support. This patch registers the protocol
    family per net namespace, so tables (thus, other objects contained
    in tables such as sets, chains and rules) are only visible from the
    corresponding net namespace, from me.

    14) Add the insert operation to the nf_tables netlink API, this requires
    adding a new position attribute that allows us to locate where in the
    ruleset a rule needs to be inserted, from Eric Leblond.

    15) Add rule batching support, including atomic rule-set updates by
    using rule-set generations. This patch includes a change to nfnetlink
    to include two new control messages to indicate the beginning and
    the end of a batch. The end message is interpreted as the commit
    message, if it's missing, then the rule-set updates contained in the
    batch are aborted, from me.

    16) Add trace support to the nf_tables packet filtering core, from me.

    17) Add ARP filtering support, original patch from Patrick McHardy, but
    adapted to fit into the chain type infrastructure. This was recovered
    to be used by nft userspace tool and our compatibility arptables
    userspace tool.

    There is still work to do to fully replace x_tables [4] [5] but that can
    be done incrementally by extending our netlink API. Moreover, looking at
    netfilter-devel and the amount of contributions to nf_tables we've been
    getting, I think it would be good to have it mainstream to avoid accumulating
    large patchsets and skip continuous rebases.

    I tried to provide a reasonable patchset, we have more than 100 accumulated
    patches in the original nf_tables tree, so I collapsed many of the small
    fixes to the main patch we had since 2009 and provide a small batch for
    review to netdev, while trying to retain part of the history.

    For those who haven't tried nf_tables yet, there's a quick howto
    available from Eric Leblond that describes how to get things working [6].

    Comments/reviews welcome.

    Thanks!

    [1] http://lwn.net/Articles/324251/
    [2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf
    [3] http://lwn.net/Articles/564095/
    [4] http://people.netfilter.org/pablo/map-pending-work.txt
    [5] http://people.netfilter.org/pablo/nftables-todo.txt
    [6] https://home.regit.org/netfilter-en/nftables-quick-howto/
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch removes a comment mentioning IRQF_DISABLED,
    which is deprecated.

    Signed-off-by: Michael Opdenacker
    Signed-off-by: David S. Miller

    Michael Opdenacker
     
  • This patch proposes to remove the use of the IRQF_DISABLED flag.

    It has been a no-op since 2.6.35 and it will be removed one day.

    Signed-off-by: Michael Opdenacker
    Signed-off-by: David S. Miller

    Michael Opdenacker
     
  • Amir Vadai says:

    ====================
    net/mlx4: Mellanox driver update 15-10-2013

    This patchset contains small code cleanup patches, and a patch to make
    mlx4_core use request_module() in order to load the relevant link layer module
    (mlx4_en or mlx4_ib) according to the port type.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The Mellanox ConnectX architecture is: mlx4_core is the lower level
    PCI driver which registers on the PCI ID, and the protocol-specific drivers
    depend on it: mlx4_en for Ethernet and mlx4_ib for InfiniBand.
    A NIC can have multiple ports which can change their type dynamically.
    We use the request_module() call to load the relevant protocol driver
    when needed: at load time or on a port type change event.

    Signed-off-by: Eyal Perry
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Eyal Perry
     
  • Clean up warning added by commit fe6f700d "net/mlx4_core: Respond to
    operation request by firmware".

    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Amir Vadai
     
  • Small code cleanup:

    1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN

    2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
    other MLX4_SET_PORT_yyy

    Signed-off-by: Or Gerlitz
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Or Gerlitz
     
  • Remove code that triggers trivial build warnings.

    drivers/net/ethernet/mellanox/mlx4/cmd.c: In function ‘mlx4_set_vf_vlan’:
    drivers/net/ethernet/mellanox/mlx4/cmd.c:2256: warning: variable ‘vf_oper’ set but not used
    drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_mode’:
    drivers/net/ethernet/mellanox/mlx4/mcg.c:648: warning: comparison of unsigned expression < 0 is always false
    drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_map_sw_to_hw_steering_id’:
    drivers/net/ethernet/mellanox/mlx4/mcg.c:685: warning: comparison of unsigned expression < 0 is always false
    drivers/net/ethernet/mellanox/mlx4/mcg.c: In function ‘mlx4_hw_rule_sz’:
    drivers/net/ethernet/mellanox/mlx4/mcg.c:712: warning: comparison of unsigned expression < 0 is always false
    drivers/net/ethernet/mellanox/mlx4/fw.c: In function ‘mlx4_opreq_action’:
    drivers/net/ethernet/mellanox/mlx4/fw.c:1732: warning: variable ‘type_m’ set but not used
    drivers/net/ethernet/mellanox/mlx4/srq.c:302: warning: no previous prototype for ‘mlx4_srq_lookup’

    Signed-off-by: Or Gerlitz
    Signed-off-by: Amir Vadai
    Signed-off-by: David S. Miller

    Or Gerlitz
     
  • TCP listener refactoring, part 6 :

    Use sock_gen_put() from inet_diag_dump_one_icsk() for future
    SYN_RECV support.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Included changes:
    - ensure RecordRoute information is added to BAT_ICMP echo_request/reply only
    - use VLAN_ETH_HLEN when possible
    - use htons when possible
    - substitute old fragmentation code with a new improved implementation by
    Martin Hundebøll
    - create common header for BAT_ICMP packets to improve extendibility
    - consider the network coding overhead when computing the overall room needed by
    batman headers
    - add dummy soft-interface rx mode handler
    - minor code refactoring and cleanups

    Signed-off-by: David S. Miller

    David S. Miller
     

15 Oct, 2013

9 commits

  • This patch registers the ARP family and the filter chain type
    for this family.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch adds support for tracing the packet travel through
    the ruleset, in a similar fashion to x_tables.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch adds a batch support to nfnetlink. Basically, it adds
    two new control messages:

    * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
    the nfgenmsg->res_id indicates the nfnetlink subsystem ID.

    * NFNL_MSG_BATCH_END, that results in the invocation of the
    ss->commit callback function. If not specified or an error
    occurred in the batch, the ss->abort function is invoked
    instead.

    The end message represents the commit operation in nftables, the
    lack of end message results in an abort. This patch also adds the
    .call_batch function that is only called from the batch receival
    path.
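
    The begin/end handling described above can be sketched in user space as
    follows. This is only an illustrative model, not the nfnetlink code: the
    names msg_type, subsys_model, model_call_batch, model_commit, model_abort
    and process_batch are hypothetical, and the real message layout and
    callback signatures are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical message kinds; these are not the real nfnetlink constants. */
enum msg_type { MSG_BATCH_BEGIN, MSG_UPDATE, MSG_BAD_UPDATE, MSG_BATCH_END };
enum batch_result { BATCH_COMMITTED, BATCH_ABORTED };

/* Hypothetical subsystem state: staged updates become visible on commit. */
struct subsys_model {
    int staged;   /* updates queued by the batch in progress */
    int applied;  /* updates made visible by a commit */
};

/* Stand-in for the per-message batch handler: queue one update or fail. */
static int model_call_batch(struct subsys_model *ss, enum msg_type type)
{
    if (type != MSG_UPDATE)
        return -1;
    ss->staged++;
    return 0;
}

static void model_commit(struct subsys_model *ss)
{
    ss->applied += ss->staged;
    ss->staged = 0;
}

static void model_abort(struct subsys_model *ss)
{
    ss->staged = 0;
}

/*
 * Walk one batch. The end message triggers the commit; a missing end
 * message or a failing update triggers the abort instead.
 */
static enum batch_result process_batch(struct subsys_model *ss,
                                       const enum msg_type *msgs, size_t n)
{
    size_t i;

    if (n == 0 || msgs[0] != MSG_BATCH_BEGIN) {
        model_abort(ss);
        return BATCH_ABORTED;
    }
    for (i = 1; i < n; i++) {
        if (msgs[i] == MSG_BATCH_END) {
            model_commit(ss);
            return BATCH_COMMITTED;
        }
        if (model_call_batch(ss, msgs[i]) < 0)
            break;
    }
    model_abort(ss);
    return BATCH_ABORTED;
}
```

    In this model nothing staged by a batch is ever visible unless the end
    message is reached, which is the property the commit/abort split provides.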

    This patch adds atomic rule updates and dumps based on
    bitmask generations. This allows to atomically commit a set of
    rule-set updates incrementally without altering the internal
    state of existing nf_tables expressions/matches/targets.

    The idea consists of using a generation cursor of 1 bit and
    a bitmask of 2 bits per rule. Assuming the gencursor is 0,
    then the genmask (expressed as a bitmask) can be interpreted
    as:

    00 active in the present, will be active in the next generation.
    01 inactive in the present, will be active in the next generation.
    10 active in the present, will be deleted in the next generation.
    ^
    gencursor

    Once you invoke the transition to the next generation, the global
    gencursor is updated:

    00 active in the present, will be active in the next generation.
    01 active in the present, needs to zero its future, it becomes 00.
    10 inactive in the present, delete now.
    ^
    gencursor
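
    The two tables above can be modelled with a few lines of C. This is a
    sketch under stated assumptions rather than the kernel implementation:
    rule_model, rule_stage_add, rule_stage_delete, generation_advance and
    rule_settle are hypothetical names, and a set bit in the genmask is taken
    to mean "inactive in that generation", with the rightmost bit being bit 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a rule; only the 2-bit generation mask is kept. */
struct rule_model {
    unsigned int genmask; /* a set bit means "inactive in that generation" */
};

/* Global 1-bit cursor: selects which genmask bit describes the present. */
static unsigned int gencursor;

static unsigned int cur_bit(void)  { return 1u << gencursor; }
static unsigned int next_bit(void) { return 1u << (gencursor ^ 1u); }

static bool rule_active_now(const struct rule_model *r)
{
    return (r->genmask & cur_bit()) == 0;
}

static bool rule_active_next(const struct rule_model *r)
{
    return (r->genmask & next_bit()) == 0;
}

/* New rule: inactive in the present, active in the next generation. */
static void rule_stage_add(struct rule_model *r)
{
    r->genmask = cur_bit();
}

/* Deleted rule: still active in the present, gone in the next generation. */
static void rule_stage_delete(struct rule_model *r)
{
    assert(rule_active_now(r));
    r->genmask = next_bit();
}

/* Commit: flip the cursor so the former "next" becomes the present. */
static void generation_advance(void)
{
    gencursor ^= 1u;
}

/*
 * Post-commit cleanup. Returns true if the rule is now inactive and must
 * be deleted; otherwise zeroes its future so the mask becomes 00.
 */
static bool rule_settle(struct rule_model *r)
{
    if (!rule_active_now(r))
        return true;
    r->genmask = 0;
    return false;
}
```

    Because the transition is a single flip of the cursor, every staged rule
    changes state at the same instant from the point of view of readers.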

    If a dump is in progress and nf_tables enters a new generation,
    the dump will stop and return -EBUSY to let userspace know that
    it has to retry. In order to invalidate dumps, a global
    genctr counter is increased every time nf_tables enters a new
    generation.

    This new operation can be used from the user-space utility
    that controls the firewall, eg.

    nft -f restore

    The rule updates contained in `file' will be applied atomically.

    cat file
    -----
    add filter INPUT ip saddr 1.1.1.1 counter accept #1
    del filter INPUT ip daddr 2.2.2.2 counter drop #2
    -EOF-

    Note that the rule 1 will be inactive until the transition to the
    next generation, the rule 2 will be evicted in the next generation.

    There is a penalty during the rule update due to the branch
    misprediction in the packet matching framework. But that should be
    quickly resolved once the iteration over the commit list that
    contains rules that require updates is finished.

    Event notification happens once the rule-set update has been
    committed. So we skip notifications in case the rule-set update
    is aborted, which can happen when the rule-set is only tested
    to check that it applies correctly.

    This patch squashed the following patches from Pablo:

    * nf_tables: atomic rule updates and dumps
    * nf_tables: get rid of per rule list_head for commits
    * nf_tables: use per netns commit list
    * nfnetlink: add batch support and use it from nf_tables
    * nf_tables: all rule updates are transactional
    * nf_tables: attach replacement rule after stale one
    * nf_tables: do not allow deletion/replacement of stale rules
    * nf_tables: remove unused NFTA_RULE_FLAGS

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch adds a new rule attribute NFTA_RULE_POSITION which is
    used to store the position of a rule relative to the others.
    By providing the create command and specifying the position, the
    rule is inserted after the rule with the handle equal to the
    provided position.
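
    The "insert after the rule whose handle equals the position" behaviour
    can be illustrated with a small array-backed model. This is a sketch
    only: chain_model, chain_append and chain_insert_after are hypothetical
    names, and the kernel keeps rules in a linked list rather than a fixed
    array.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RULES 8

/* Hypothetical chain: an ordered list of rule handles. */
struct chain_model {
    uint64_t handles[MAX_RULES];
    size_t count;
};

/* Append a rule at the end of the chain. */
static int chain_append(struct chain_model *c, uint64_t handle)
{
    if (c->count == MAX_RULES)
        return -1;
    c->handles[c->count++] = handle;
    return 0;
}

/*
 * Insert a new rule right after the rule whose handle equals `position`.
 * Fails if the chain is full or no rule with that handle exists.
 */
static int chain_insert_after(struct chain_model *c, uint64_t position,
                              uint64_t handle)
{
    size_t i, j;

    if (c->count == MAX_RULES)
        return -1;
    for (i = 0; i < c->count; i++) {
        if (c->handles[i] != position)
            continue;
        /* Shift the tail one slot to the right to open a gap at i + 1. */
        for (j = c->count; j > i + 1; j--)
            c->handles[j] = c->handles[j - 1];
        c->handles[i + 1] = handle;
        c->count++;
        return 0;
    }
    return -1;
}
```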

    Regarding notification, the position attribute specifies the
    handle of the previous rule to make sure we don't point to any
    stale rule in notifications coming from the commit path.

    This patch includes the following fix from Pablo:

    * nf_tables: fix rule deletion event reporting

    Signed-off-by: Eric Leblond
    Signed-off-by: Pablo Neira Ayuso

    Eric Leblond
     
  • Register family per net namespace to ensure that sets are
    only visible in their appropriate namespace.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch generalizes the NAT expression to support both IPv4 and IPv6
    using the existing IPv4/IPv6 NAT infrastructure. This also adds the
    NAT chain type for IPv6.

    This patch collapses the following patches that were posted to the
    netfilter-devel mailing list, from Tomasz:

    * nf_tables: Change NFTA_NAT_ attributes to better semantic significance
    * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
    * nf_tables: Add support for IPv6 NAT expression
    * nf_tables: Add support for IPv6 NAT chain
    * nf_tables: Fix up build issue on IPv6 NAT support

    And, from Pablo Neira Ayuso:

    * fix missing dependencies in nft_chain_nat

    Signed-off-by: Tomasz Bursztyka
    Signed-off-by: Pablo Neira Ayuso

    Tomasz Bursztyka
     
  • This patch allows you to temporarily disable an entire table.
    You can change the state of a dormant table via NFT_MSG_NEWTABLE
    messages. Using this operation you can wake up a table, so its
    chains are registered.

    This provides atomicity at chain level. Thus, the rule-set of one
    chain is applied at once, avoiding any possible intermediate state
    in every chain. Still, the chains that belong to a table are
    registered consecutively. This also allows you to have inactive
    tables in the kernel.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • We cannot use skb->transport_header since it's unset, use
    pkt->xt.thoff instead.

    Now possible using information made available through the x_tables
    compatibility layer.

    Reported-by: Eric Leblond
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch adds the x_tables compatibility layer. This allows you
    to use existing x_tables matches and targets from nf_tables.

    This compatibility layer allows us to use existing matches/targets
    for features that are still missing in nf_tables. We can progressively
    replace them with native nf_tables extensions. It also provides the
    userspace compatibility software that allows you to express the
    rule-set using the iptables syntax but using the nf_tables kernel
    components.

    In order to get this compatibility layer working, I've done the
    following things:

    * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
    to query the x_tables match/target revision, so we don't need to
    use the native x_table getsockopt interface.

    * emulate xt structures: this required extending the struct nft_pktinfo
    to include the fragment offset, which is already obtained from
    ip[6]_tables and that is used by some matches/targets.

    * add support for default policy to base chains, required to emulate
    x_tables.

    * add NFTA_CHAIN_USE attribute to obtain the number of references to
    chains, required by x_tables emulation.

    * add chain packet/byte counters using per-cpu.

    * support 32-64 bits compat.

    For historical reasons, this patch includes the following patches
    that were posted in the netfilter-devel mailing list.

    From Pablo Neira Ayuso:
    * nf_tables: add default policy to base chains
    * netfilter: nf_tables: add NFTA_CHAIN_USE attribute
    * nf_tables: nft_compat: private data of target and matches in contiguous area
    * nf_tables: validate hooks for compat match/target
    * nf_tables: nft_compat: release cached matches/targets
    * nf_tables: x_tables support as a compile time option
    * nf_tables: fix alias for xtables over nftables module
    * nf_tables: add packet and byte counters per chain
    * nf_tables: fix per-chain counter stats if no counters are passed
    * nf_tables: don't bump chain stats
    * nf_tables: add protocol and flags for xtables over nf_tables
    * nf_tables: add ip[6]t_entry emulation
    * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
    * nf_tables: support 32bits-64bits x_tables compat
    * nf_tables: fix compilation if CONFIG_COMPAT is disabled

    From Patrick McHardy:
    * nf_tables: move policy to struct nft_base_chain
    * nf_tables: send notifications for base chain policy changes

    From Alexander Primak:
    * nf_tables: remove the duplicate NF_INET_LOCAL_OUT

    From Nicolas Dichtel:
    * nf_tables: fix compilation when nf-netlink is a module

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

14 Oct, 2013

8 commits

  • This patch converts built-in tables/chains to chain types that
    allow you to deploy customized table and chain configurations from
    userspace.

    After this patch, you have to specify the chain type when
    creating a new chain:

    add chain ip filter output { type filter hook input priority 0; }
                                 ^^^^ ------

    The existing chain types after this patch are: filter, route and
    nat. Note that tables are just containers of chains with no specific
    semantics, which is a significant change with regards to iptables.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Add an optimized payload expression implementation for small (up to 4 bytes)
    aligned data loads from the linear packet area.

    This patch also includes Patrick McHardy's original patch entitled
    (nf_tables: inline nft_payload_fast_eval() into main evaluation loop).

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Add an optimized version of nft_data_cmp() that only handles values of up
    to 4 bytes in length.

    This patch includes Patrick McHardy's original patch entitled (nf_tables:
    inline nft_cmp_fast_eval() into main evaluation loop).

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Split the expression ops into two parts and support overloading of
    the runtime expression ops based on the requested function through
    a ->select_ops() callback.

    This can be used to provide optimized implementations, for instance
    for loading small aligned amounts of data from the packet or inlining
    frequently used operations into the main evaluation loop.
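
    As an illustration of the idea, the sketch below picks between a generic
    and a small-value comparison at initialization time through a
    select_ops-style function. It is a user-space model under stated
    assumptions: expr_model, expr_ops_model, cmp_select_ops and expr_init
    are hypothetical names, and the fast path here (a 32-bit compare of up
    to 4 bytes) is a simplification, not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct expr_model;

/* Runtime ops: the part that may be overloaded per configuration. */
struct expr_ops_model {
    const char *name;
    int (*eval)(const struct expr_model *e, const uint8_t *reg);
};

/* Hypothetical expression instance: chosen ops plus the constant operand. */
struct expr_model {
    const struct expr_ops_model *ops;
    uint8_t data[16];
    size_t len;
};

/* Generic path: byte-wise comparison of any supported length. */
static int cmp_eval_generic(const struct expr_model *e, const uint8_t *reg)
{
    return memcmp(reg, e->data, e->len) == 0;
}

/* Fast path: values of up to 4 bytes are compared as one 32-bit word. */
static int cmp_eval_fast(const struct expr_model *e, const uint8_t *reg)
{
    uint32_t a = 0, b = 0;

    memcpy(&a, reg, e->len);
    memcpy(&b, e->data, e->len);
    return a == b;
}

static const struct expr_ops_model cmp_ops = { "cmp", cmp_eval_generic };
static const struct expr_ops_model cmp_fast_ops = { "cmp_fast", cmp_eval_fast };

/* select_ops-style hook: choose the runtime ops from the requested length. */
static const struct expr_ops_model *cmp_select_ops(size_t len)
{
    if (len == 0 || len > 16)
        return NULL;
    return len <= 4 ? &cmp_fast_ops : &cmp_ops;
}

/* Initialization binds the selected ops once; evaluation never re-checks. */
static int expr_init(struct expr_model *e, const void *data, size_t len)
{
    const struct expr_ops_model *ops = cmp_select_ops(len);

    if (ops == NULL)
        return -1;
    e->ops = ops;
    e->len = len;
    memcpy(e->data, data, len);
    return 0;
}
```

    The selection cost is paid once when the expression is created, so the
    per-packet path only follows the already chosen function pointer.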

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • This patch adds the new netlink API for maintaining nf_tables sets
    independently of the ruleset. The API supports the following operations:

    - creation of sets
    - deletion of sets
    - querying of specific sets
    - dumping of all sets

    - addition of set elements
    - removal of set elements
    - dumping of all set elements

    Sets are identified by name, each table defines an individual namespace.
    The name of a set may be allocated automatically, this is mostly useful
    in combination with the NFT_SET_ANONYMOUS flag, which destroys a set
    automatically once the last reference has been released.

    Sets can be marked constant, meaning they're not allowed to change while
    linked to a rule. This allows to perform lockless operation for set
    types that would otherwise require locking.

    Additionally, if the implementation supports it, sets can (as before) be
    used as maps, associating a data value with each key (or range), by
    specifying the NFT_SET_MAP flag and can be used for interval queries by
    specifying the NFT_SET_INTERVAL flag.

    Set elements are added and removed incrementally. All element operations
    support batching, reducing netlink message and set lookup overhead.

    The old "set" and "hash" expressions are replaced by a generic "lookup"
    expression, which binds to the specified set. Userspace is not aware
    of the actual set implementation used by the kernel anymore, all
    configuration options are generic.

    Currently the implementation selection logic is largely missing and the
    kernel will simply use the first registered implementation supporting the
    requested operation. Eventually, the plan is to have userspace supply a
    description of the data characteristics and select the implementation
    based on expected performance and memory use.

    This patch includes the new 'lookup' expression to look up for element
    matching in the set.

    This patch includes kernel-doc descriptions for this set API and it
    also includes the following fixes.

    From Patrick McHardy:
    * netfilter: nf_tables: fix set element data type in dumps
    * netfilter: nf_tables: fix indentation of struct nft_set_elem comments
    * netfilter: nf_tables: fix oops in nft_validate_data_load()
    * netfilter: nf_tables: fix oops while listing sets of built-in tables
    * netfilter: nf_tables: destroy anonymous sets immediately if binding fails
    * netfilter: nf_tables: propagate context to set iter callback
    * netfilter: nf_tables: add loop detection

    From Pablo Neira Ayuso:
    * netfilter: nf_tables: allow to dump all existing sets
    * netfilter: nf_tables: fix wrong type for flags variable in newelem

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • This patch adds nftables which is the intended successor of iptables.
    This packet filtering framework reuses the existing netfilter hooks,
    the connection tracking system, the NAT subsystem, the transparent
    proxying engine, the logging infrastructure and the userspace packet
    queueing facilities.

    In a nutshell, nftables provides a pseudo-state machine with 4 general
    purpose registers of 128 bits and 1 specific purpose register to store
    verdicts. This pseudo-machine comes with an extensible instruction set,
    a.k.a. "expressions" in the nftables jargon. The expressions included
    in this patch provide the basic functionality, they are:

    * bitwise: to perform bitwise operations.
    * byteorder: to change from host/network endianness.
    * cmp: to compare data with the content of the registers.
    * counter: to enable counters on rules.
    * ct: to store conntrack keys into register.
    * exthdr: to match IPv6 extension headers.
    * immediate: to load data into registers.
    * limit: to limit matching based on packet rate.
    * log: to log packets.
    * meta: to match metainformation that usually comes with the skbuff.
    * nat: to perform Network Address Translation.
    * payload: to fetch data from the packet payload and store it into
    registers.
    * reject (IPv4 only): to explicitly close connection, eg. TCP RST.
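
    To make the register machine concrete, here is a minimal user-space
    sketch of four 128-bit data registers, one verdict register and three of
    the expressions listed above (payload, cmp and a verdict-loading
    immediate). All names (regs_model, insn_model, run_rule and the enum
    values) are hypothetical, and the encoding of instructions is invented
    for the example; it does not reflect the kernel's structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum verdict_model { V_CONTINUE, V_BREAK, V_ACCEPT, V_DROP };
enum op_model { OP_PAYLOAD, OP_CMP, OP_VERDICT };

/* Four general purpose registers of 128 bits plus one verdict register. */
struct regs_model {
    uint8_t data[4][16];
    enum verdict_model verdict;
};

/* Hypothetical instruction encoding, one struct per expression. */
struct insn_model {
    enum op_model op;
    unsigned int reg;            /* data register index, 0..3 */
    size_t offset;               /* OP_PAYLOAD: offset into the packet */
    size_t len;                  /* OP_PAYLOAD/OP_CMP: number of bytes */
    uint8_t value[16];           /* OP_CMP: constant to compare against */
    enum verdict_model verdict;  /* OP_VERDICT: verdict to load */
};

/*
 * Evaluate one rule. V_BREAK means an expression did not match and the
 * rest of the rule was skipped; V_CONTINUE means the rule matched but
 * issued no verdict.
 */
static enum verdict_model run_rule(const struct insn_model *prog, size_t n,
                                   const uint8_t *pkt, size_t pkt_len)
{
    struct regs_model regs;
    size_t i;

    memset(&regs, 0, sizeof(regs));
    regs.verdict = V_CONTINUE;

    for (i = 0; i < n && regs.verdict == V_CONTINUE; i++) {
        const struct insn_model *in = &prog[i];

        if (in->reg >= 4 || in->len > 16) {
            regs.verdict = V_BREAK;
            break;
        }
        switch (in->op) {
        case OP_PAYLOAD:
            /* Fetch packet bytes into a register, with bounds checking. */
            if (in->offset > pkt_len || in->len > pkt_len - in->offset)
                regs.verdict = V_BREAK;
            else
                memcpy(regs.data[in->reg], pkt + in->offset, in->len);
            break;
        case OP_CMP:
            /* Compare a register with a constant; a mismatch ends the rule. */
            if (memcmp(regs.data[in->reg], in->value, in->len) != 0)
                regs.verdict = V_BREAK;
            break;
        case OP_VERDICT:
            regs.verdict = in->verdict;
            break;
        }
    }
    return regs.verdict;
}
```

    A rule such as "accept if the IPv4 source address is 1.1.1.1" then
    becomes three instructions: load 4 bytes at offset 12, compare them with
    the constant, and load the accept verdict.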

    Using this instruction-set, the userspace utility 'nft' can transform
    the rules expressed in human-readable text representation (using a
    new syntax, inspired by tcpdump) to nftables bytecode.

    nftables also inherits the table, chain and rule objects from
    iptables, but in a more configurable way, and it also includes the
    original datatype-agnostic set infrastructure with mapping support.
    This set infrastructure is enhanced in the follow up patch (netfilter:
    nf_tables: add netlink set API).

    This patch includes the following components:

    * the netlink API: net/netfilter/nf_tables_api.c and
    include/uapi/netfilter/nf_tables.h
    * the packet filter core: net/netfilter/nf_tables_core.c
    * the expressions (described above): net/netfilter/nft_*.c
    * the filter tables: arp, IPv4, IPv6 and bridge:
    net/ipv4/netfilter/nf_tables_ipv4.c
    net/ipv6/netfilter/nf_tables_ipv6.c
    net/ipv4/netfilter/nf_tables_arp.c
    net/bridge/netfilter/nf_tables_bridge.c
    * the NAT table (IPv4 only):
    net/ipv4/netfilter/nf_table_nat_ipv4.c
    * the route table (similar to mangle):
    net/ipv4/netfilter/nf_table_route_ipv4.c
    net/ipv6/netfilter/nf_table_route_ipv6.c
    * internal definitions under:
    include/net/netfilter/nf_tables.h
    include/net/netfilter/nf_tables_core.h
    * It also includes a skeleton expression:
    net/netfilter/nft_expr_template.c
    and the preliminary implementation of the meta target
    net/netfilter/nft_meta_target.c

    It also includes a change in struct nf_hook_ops to add a new
    pointer to store private data to the hook, that is used to store
    the rule list per chain.

    This patch is based on the patch from Patrick McHardy, plus merged
    accumulated cleanups, fixes and small enhancements to the nftables
    code that has been done since 2009, which are:

    From Patrick McHardy:
    * nf_tables: adjust netlink handler function signatures
    * nf_tables: only retry table lookup after successful table module load
    * nf_tables: fix event notification echo and avoid unnecessary messages
    * nft_ct: add l3proto support
    * nf_tables: pass expression context to nft_validate_data_load()
    * nf_tables: remove redundant definition
    * nft_ct: fix maxattr initialization
    * nf_tables: fix invalid event type in nf_tables_getrule()
    * nf_tables: simplify nft_data_init() usage
    * nf_tables: build in more core modules
    * nf_tables: fix double lookup expression unregistation
    * nf_tables: move expression initialization to nf_tables_core.c
    * nf_tables: build in payload module
    * nf_tables: use NFPROTO constants
    * nf_tables: rename pid variables to portid
    * nf_tables: save 48 bits per rule
    * nf_tables: introduce chain rename
    * nf_tables: check for duplicate names on chain rename
    * nf_tables: remove ability to specify handles for new rules
    * nf_tables: return error for rule change request
    * nf_tables: return error for NLM_F_REPLACE without rule handle
    * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
    * nf_tables: fix NLM_F_MULTI usage in netlink notifications
    * nf_tables: include NLM_F_APPEND in rule dumps

    From Pablo Neira Ayuso:
    * nf_tables: fix stack overflow in nf_tables_newrule
    * nf_tables: nft_ct: fix compilation warning
    * nf_tables: nft_ct: fix crash with invalid packets
    * nft_log: group and qthreshold are 2^16
    * nf_tables: nft_meta: fix socket uid,gid handling
    * nft_counter: allow to restore counters
    * nf_tables: fix module autoload
    * nf_tables: allow to remove all rules placed in one chain
    * nf_tables: use 64-bits rule handle instead of 16-bits
    * nf_tables: fix chain after rule deletion
    * nf_tables: improve deletion performance
    * nf_tables: add missing code in route chain type
    * nf_tables: rise maximum number of expressions from 12 to 128
    * nf_tables: don't delete table if in use
    * nf_tables: fix basechain release

    From Tomasz Bursztyka:
    * nf_tables: Add support for changing users chain's name
    * nf_tables: Change chain's name to be fixed sized
    * nf_tables: Add support for replacing a rule by another one
    * nf_tables: Update uapi nftables netlink header documentation

    From Florian Westphal:
    * nft_log: group is u16, snaplen u32

    From Phil Oester:
    * nf_tables: operational limit match

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • Similar to nat_decode_session, alloc_null_binding is needed for both
    ip_tables and nf_tables, so move it to nf_nat_core.c. This change
    is required by nf_tables.

    This is an adapted version of the original patch from Patrick McHardy.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Pass the hook ops to the hookfn to allow for generic hook
    functions. This change is required by nf_tables.

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     

12 Oct, 2013

4 commits