23 Sep, 2016

6 commits

  • Several places in netfilter open-code the logic to get a random value
    once. Use net_get_random_once to simplify them.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • pkt->xt.thoff is not always set properly, but we use it without any check.
    For the payload expression, this causes wrong results. For nftrace, we may
    report the wrong network or transport header to user space. Furthermore,
    adding the following nft rule triggers a warning:
    # nft add rule arp filter output meta nftrace set 1

    WARNING: CPU: 0 PID: 13428 at net/netfilter/nf_tables_trace.c:263
    nft_trace_notify+0x4a3/0x5e0 [nf_tables]
    Call Trace:
    [] dump_stack+0x63/0x85
    [] __warn+0xcb/0xf0
    [] warn_slowpath_null+0x1d/0x20
    [] nft_trace_notify+0x4a3/0x5e0 [nf_tables]
    [ ... ]
    [] nft_do_chain_arp+0x78/0x90 [nf_tables_arp]
    [] nf_iterate+0x62/0x80
    [] nf_hook_slow+0x73/0xd0
    [] arp_xmit+0x8f/0xb0
    [ ... ]
    [] arp_solicit+0x106/0x2c0

    So before we use pkt->xt.thoff, check the tprot_set first.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • There's an off-by-one issue in nft_payload_fast_eval: skb_tail_pointer
    and ptr + priv->len both point to the last valid address plus 1. So if
    they are equal, we can still fetch valid data, and it's unnecessary to
    fall back to nft_payload_eval.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
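
The boundary condition described in the nft_payload_fast_eval commit above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `payload_in_bounds` is a hypothetical helper, and `tail` stands in for the value returned by skb_tail_pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): returns true when the
 * len bytes starting at ptr lie entirely before tail, where tail points
 * one past the last valid byte. */
static bool payload_in_bounds(const uint8_t *ptr, size_t len,
                              const uint8_t *tail)
{
    /* ptr + len is itself "one past the end" of the requested range,
     * so equality with tail still means every byte is readable. */
    return ptr + len <= tail;
}
```

A stricter check that also rejects the case where ptr + len equals tail is still safe, but it takes the slow path for data that could have been read directly.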
     
  • After commit ac2863445686 ("netfilter: bridge: add nf_afinfo to enable
    queuing to userspace"), we can queue packets to user space in the bridge
    family. But when the user specifies a queue range, packets are only
    delivered to the first queue number, because nfqueue_hash only supports
    the ipv4 and ipv6 families. Now add support for the bridge family too.

    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Currently, the user can specify the queue numbers with the _QUEUE_NUM and
    _QUEUE_TOTAL attributes, which is enough in most situations.

    But actually, it is not very flexible, for example:
    tcp dport 80 mapped to queue0
    tcp dport 81 mapped to queue1
    tcp dport 82 mapped to queue2
    In order to do this, we must add 3 nft rules, and more
    mappings mean more rules ...

    So take one register to select the queue number, then we can add one
    simple rule to map queues, maybe like this:
    queue num tcp dport map { 80:0, 81:1, 82:2 ... }

    Florian Westphal also proposed wider usage scenarios:
    queue num jhash ip saddr . ip daddr mod ...
    queue num meta cpu ...
    queue num meta mark ...

    The last point is how to load the queue number from sreg. We could
    use *(u16 *)&regs->data[reg] to load it, just like the nat expr
    does to load its l4port.

    But we will cooperate with the hash expr, meta cpu, meta mark expr and so
    on. They all store their result as u32, so casting it to a u16 pointer and
    dereferencing it will generate a wrong result on big endian systems.

    So just keep it simple and treat the queue number as u32, although u16
    is already enough.

    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
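
The endianness problem mentioned in the queue commit above can be shown without a big endian machine. The sketch below is an illustration only: `store_u32` and `load_first_u16` are hypothetical helpers that emulate a 32-bit register store followed by a 16-bit load from the same address, with the byte order passed in explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value into a 4-byte "register" using an explicit byte
 * order, so the result does not depend on the host CPU. */
static void store_u32(uint8_t reg[4], uint32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        reg[i] = (uint8_t)(v >> shift);
    }
}

/* Emulates *(u16 *)reg: only the first two bytes are interpreted,
 * using the same byte order as the store. */
static uint16_t load_first_u16(const uint8_t reg[4], int big_endian)
{
    return big_endian ? (uint16_t)((reg[0] << 8) | reg[1])
                      : (uint16_t)((reg[1] << 8) | reg[0]);
}
```

With queue number 3 stored as u32, the little endian layout yields 3 from the 16-bit load, while the big endian layout yields 0 because the first two bytes hold the high half of the value.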
     
  • Fetch value and validate u32 netlink attribute. This validation is
    usually required when the u32 netlink attributes are being stored in a
    field whose size is smaller.

    This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
    on u8 nft_exthdr attributes").

    Fixes: 96518518cc41 ("netfilter: add nftables")
    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
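
The "fetch and validate" step from the commit above might look roughly like the following userspace sketch. The name `parse_u32_check`, its signature, and the choice of -ERANGE as the error code are assumptions made for illustration; the attribute is taken as an already decoded host-order value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the validation helper: val is the decoded
 * u32 attribute, max is the largest value the destination field can
 * hold (for example UINT8_MAX for a u8 field). */
static int parse_u32_check(uint32_t val, uint32_t max, uint32_t *dest)
{
    if (val > max)
        return -ERANGE;   /* would be truncated when stored */

    *dest = val;
    return 0;
}
```

The caller only narrows the value to the smaller field after the check succeeds, so an oversized attribute is rejected instead of being silently truncated.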
     

22 Sep, 2016

1 commit

  • Add support for an offset value to the incremental counter and random
    number generators. With this option the sysadmin is able to start the
    counter at a certain value and then apply the generated number.

    Example:

    meta mark set numgen inc mod 2 offset 100

    This will generate marks with the series 100, 101, 100, 101, ...

    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
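
The series quoted in the numgen offset commit above can be reproduced with a small single-threaded sketch. The struct and function names are hypothetical, and the real expression may order the increment and the read differently; the point is only that the offset is added after the modulus is applied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an incremental generator with modulus and offset. */
struct numgen_inc {
    uint32_t counter;   /* always kept in [0, modulus) */
    uint32_t modulus;
    uint32_t offset;
};

static uint32_t numgen_inc_next(struct numgen_inc *g)
{
    uint32_t cur = g->counter;

    g->counter = (cur + 1) % g->modulus;   /* wrap before adding offset */
    return cur + g->offset;
}
```

With modulus 2 and offset 100, successive calls return 100, 101, 100, 101.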
     

13 Sep, 2016

14 commits

  • The overflow validation in the init() function establishes that the
    maximum value that the hash could reach is less than U32_MAX, which is
    likely to be true.

    The fix detects the overflow when the maximum hash value is less than
    the offset itself.

    Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
    Reported-by: Liping Zhang
    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
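
One way to express the check described in the commit above ("the maximum hash value is less than the offset itself") relies on unsigned wraparound. This is a sketch under the assumption that the modulus is at least 1 and that the largest generated value is offset + modulus - 1; `offset_overflows` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when offset + modulus - 1 does not fit in 32 bits.
 * Assumes modulus >= 1. */
static bool offset_overflows(uint32_t offset, uint32_t modulus)
{
    /* On overflow the 32-bit sum wraps around and ends up below offset. */
    return (uint32_t)(offset + modulus - 1) < offset;
}
```

Comparing the sum against U32_MAX instead cannot detect anything, since a 32-bit value never exceeds U32_MAX.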
     
  • After we generate a new number, we still use priv->counter and
    store it to the dreg. This is not correct, as another CPU may already
    have changed it to a new number. So we must use the generated number, not
    priv->counter itself.

    Fixes: 91dbc6be0a62 ("netfilter: nf_tables: add number generator expression")
    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
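
The race fixed by the commit above can be illustrated with C11 atomics in userspace. This is a sketch, not the kernel implementation: `counter_next` is a hypothetical function, and the kernel uses its own atomic primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Advance a shared counter modulo `modulus` and return the value this
 * caller generated. */
static uint32_t counter_next(_Atomic uint32_t *counter, uint32_t modulus)
{
    uint32_t oval = atomic_load(counter);
    uint32_t nval;

    do {
        nval = (oval + 1 < modulus) ? oval + 1 : 0;
        /* On failure oval is refreshed with the current value; retry. */
    } while (!atomic_compare_exchange_weak(counter, &oval, nval));

    /* Return the local nval, not a fresh read of *counter: another
     * thread may already have advanced the shared counter again. */
    return nval;
}
```

Reading the shared counter again after the exchange would reintroduce the bug, because two callers could then observe the same value.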
     
  • These counters sit in hot path and do show up in perf, this is especially
    true for 'found' and 'searched' which get incremented for every packet
    processed.

    Information like

    searched=212030105
    new=623431
    found=333613
    delete=623327

    does not seem too helpful nowadays:

    - on busy systems found and searched will overflow every few hours
    (these are 32bit integers), on less busy ones every few days.

    - for debugging there are better methods, such as iptables' trace target,
    the conntrack log sysctls. Nowadays we also have perf tool.

    This removes packet path stat counters except those that
    are expected to be 0 (or close to 0) on a normal system, e.g.
    'insert_failed' (race happened) or 'invalid' (proto tracker rejects).

    The insert stat is retained for the ctnetlink case.
    The found stat is retained for the tuple-is-taken check when NAT has to
    determine if it needs to pick a different source address.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • hash_v6 is used by both nftables and ip6tables, so depending on
    IP6_NF_IPTABLES is not correct.

    Actually, it only parses the ipv6hdr and computes a hash value, so
    even if IPV6 is disabled there is no side effect. Remove the dependency.

    Signed-off-by: Liping Zhang
    Signed-off-by: Pablo Neira Ayuso

    Liping Zhang
     
  • Some netfilter modules do not check the return value of
    nft_register_chain_type. Add the checks now.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • Some netfilter modules do not check the return value of
    register_netdevice_notifier. Add the checks now.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • Instead of several goto's just to return the result, simply return it.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira
     
  • This is overly conservative and not flexible at all, so better let them
    go through and let the filtering policy decide what to do with them. We
    use skb_header_pointer() all over the place so we would just fail to
    match when trying to access fields from malformed traffic.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Consolidate pktinfo setup and validation by using the new generic
    functions so we converge to the netdev family codebase.

    We only need a linear IPv4 and IPv6 header from the reject expression,
    so move nft_bridge_iphdr_validate() and nft_bridge_ip6hdr_validate()
    to net/bridge/netfilter/nft_reject_bridge.c.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • These functions are extracted from the netdev family, they initialize
    the pktinfo structure and validate that the IPv4 and IPv6 headers are
    well-formed given that these functions are called from a path where
    layer 3 sanitization did not happen yet.

    These functions are placed in include/net/netfilter/nf_tables_ipv{4,6}.h
    so they can be reused by a follow up patch to use them from the bridge
    family too.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Make sure the pktinfo protocol fields are initialized if this fails to
    parse the transport header.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • This patch introduces nft_set_pktinfo_unspec() that ensures proper
    initialization of all pktinfo fields for non-IP traffic. This is used
    by the bridge, netdev and arp families.

    This new function relies on nft_set_pktinfo_proto_unspec() to set a new
    tprot_set field that indicates if transport protocol information is
    available. The remaining fields are zeroed.

    The meta expression has also been updated to check tprot_set first,
    given that zero is a valid tprot value. Even a handcrafted packet
    may come with the IPPROTO_RAW (255) protocol number so we can't rely on
    this value as tprot unset.

    Reported-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
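
The reason for a separate tprot_set flag, as described in the commit above, can be sketched as follows. The struct is a simplified, hypothetical model of the packet info (not the kernel's struct nft_pktinfo), and `transport_offset` is an illustrative helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: tprot alone cannot signal "unset" because every
 * value, including 0, is a valid protocol number. */
struct pktinfo {
    bool         tprot_set;   /* transport info available? */
    uint8_t      tprot;       /* transport protocol number */
    unsigned int thoff;       /* transport header offset */
};

/* Returns 0 and stores the offset only when transport info is valid. */
static int transport_offset(const struct pktinfo *pkt, unsigned int *off)
{
    if (!pkt->tprot_set)
        return -1;

    *off = pkt->thoff;
    return 0;
}
```

A packet with tprot 0 and tprot_set true is accepted, while a non-IP packet with all fields zeroed is rejected, which a test on tprot alone could not distinguish.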
     
  • The dynset expression matches if we can fit a new entry into the set.
    If there is no room for it, then it breaks the rule evaluation.

    This patch introduces the inversion flag so you can add rules to
    explicitly drop packets that don't fit into the set. For example:

    # nft filter input flow table xyz size 4 { ip saddr timeout 120s counter } overflow drop

    This is useful to provide a replacement for connlimit.

    For the rule above, every new entry uses the IPv4 address as key in the
    set, and this entry gets a timeout of 120 seconds that is refreshed on
    every packet seen. If we get a new flow and our set already contains 4
    entries, then this packet is dropped.

    You can already express this in positive logic, assuming default policy
    to drop:

    # nft filter input flow table xyz size 4 { ip saddr timeout 10s counter } accept

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Add support to pass through an offset to the hash value. With this
    feature, the sysadmin is able to generate a hash with a given
    offset value.

    Example:

    meta mark set jhash ip saddr mod 2 seed 0xabcd offset 100

    This option generates marks according to the source address from 100 to
    101.

    Signed-off-by: Laura Garcia Liebana

    Laura Garcia Liebana
     

09 Sep, 2016

2 commits


07 Sep, 2016

17 commits

  • Current parsing methods for SIP headers do not allow the presence of
    tab characters between header name and header value. As a result Call-ID
    SIP headers like the following are discarded by IPVS SIP persistence
    engine:

    "Call-ID\t: mycallid@abcde"
    "Call-ID:\tmycallid@abcde"

    In above examples Call-IDs are represented as strings in C language.
    Obviously in real message we have byte "09" before/after colon (":").

    Proposed fix is in nf_conntrack_sip module.
    Function sip_skip_whitespace() should skip tabs in addition to spaces,
    since in SIP grammar whitespace (WSP) corresponds to space or tab.

    Below is an extract of relevant SIP ABNF syntax.

    Call-ID = ( "Call-ID" / "i" ) HCOLON callid
    callid = word [ "@" word ]

    HCOLON = *( SP / HTAB ) ":" SWS
    SWS = [LWS] ; sep whitespace
    LWS = [*WSP CRLF] 1*WSP ; linear whitespace
    WSP = SP / HTAB
    word = 1*(alphanum / "-" / "." / "!" / "%" / "*" /
    "_" / "+" / "`" / "'" / "~" /
    "(" / ")" / "<" / ">" /
    ":" / "\" / DQUOTE /
    "/" / "[" / "]" / "?" /
    "{" / "}" )

    Signed-off-by: Marco Angaroni
    Signed-off-by: Pablo Neira Ayuso

    Marco Angaroni
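
The WSP rule from the ABNF in the commit above (space or horizontal tab) can be sketched as a small scanner. This is a simplified illustration that ignores continuation lines; `skip_wsp` is a hypothetical name, not the kernel's sip_skip_whitespace().

```c
#include <assert.h>
#include <stddef.h>

/* Advance p past any run of SP / HTAB, never going beyond limit. */
static const char *skip_wsp(const char *p, const char *limit)
{
    while (p < limit && (*p == ' ' || *p == '\t'))
        p++;
    return p;
}
```

Applied to the bytes after the colon in "Call-ID:\tmycallid@abcde", the scanner stops at the "m", so the tab no longer causes the header value to be rejected.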
     
  • This patch renames the existing function to nft_overquota() and makes
    it return a boolean that tells us if we have exceeded our byte quota.
    Just a cleanup.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Use xor to decide to break further rule evaluation or not, since the
    existing logic doesn't achieve the expected inversion.

    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
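
The xor-based decision from the commit above reduces to a two-input truth table. The sketch below assumes the two inputs are "the condition matched" and "the rule is inverted"; `should_break` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Break rule evaluation when exactly one of the two flags is set. */
static bool should_break(bool matched, bool invert)
{
    return matched ^ invert;
}
```

Without inversion the result follows the condition, and with inversion it is the opposite, which is the behaviour an inversion flag is expected to have.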
     
  • The _until_ attribute is renamed to _modulus_ as the behaviour is similar to
    other expressions with number limits (e.g. nft_hash).

    Renaming is possible because there isn't a kernel release yet with these
    changes.

    Signed-off-by: Laura Garcia Liebana
    Signed-off-by: Pablo Neira Ayuso

    Laura Garcia Liebana
     
  • There is some debug code in find_pattern that is commented out with #if 0.
    Remove it now.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • The caller function "help" already makes sure that datalen cannot be zero
    before passing it to find_pattern, with the following code:

    if (dataoff >= skb->len) {
            pr_debug("ftp: dataoff(%u) >= skblen(%u)\n", dataoff,
                     skb->len);
            return NF_ACCEPT;
    }
    datalen = skb->len - dataoff;

    And the later code "ends_in_nl = (fb_ptr[datalen - 1] == '\n');" uses datalen
    directly without checking if it is zero.

    So it is unnecessary to check it in find_pattern too.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • Current parsing methods for the SIP Call-ID header do not correctly check
    all characters allowed by RFC 3261. In particular the "," character is
    accepted instead of the "'" character. As a result Call-ID headers like the
    following are discarded by the IPVS SIP persistence engine.

    Call-ID: -.!%*_+`'~()<>:\"/[]?{}

    Above example is composed using all non-alphanumeric characters listed
    in RFC 3261 for Call-ID header syntax.

    Proposed fix is in nf_conntrack_sip module; function iswordc() checks this
    range: (c >= '(' && c
    Signed-off-by: Pablo Neira Ayuso

    Marco Angaroni
     
  • Current parsing methods for SIP headers do not properly manage
    continuation lines: in case of Call-ID header the first character of
    Call-ID header value is truncated. As a result IPVS SIP persistence
    engine hashes over a call-id that is not exactly the one present in
    the original message.

    Example: "Call-ID: \r\n abcdeABCDE1234"
    results in extracted call-id equal to "bcdeABCDE1234".

    In above example Call-ID is represented as a string in C language.
    Obviously in real message the first bytes after colon (":") are
    "20 0d 0a 20".

    Proposed fix is in nf_conntrack_sip module.
    Since sip_follow_continuation() function walks past the leading
    spaces or tabs of the continuation line, sip_skip_whitespace()
    should simply return the output of sip_follow_continuation().
    Otherwise another iteration of the for loop is done and dptr
    is incremented by one pointing to the second character of the
    first word in the header.

    Below is an extract of relevant SIP ABNF syntax.

    Call-ID = ( "Call-ID" / "i" ) HCOLON callid
    callid = word [ "@" word ]

    HCOLON = *( SP / HTAB ) ":" SWS
    SWS = [LWS] ; sep whitespace
    LWS = [*WSP CRLF] 1*WSP ; linear whitespace
    WSP = SP / HTAB
    word = 1*(alphanum / "-" / "." / "!" / "%" / "*" /
    "_" / "+" / "`" / "'" / "~" /
    "(" / ")" / "<" / ">" /
    ":" / "\" / DQUOTE /
    "/" / "[" / "]" / "?" /
    "{" / "}" )

    Signed-off-by: Marco Angaroni
    Signed-off-by: Pablo Neira Ayuso

    Marco Angaroni
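
The continuation-line problem from the commit above comes from a for loop whose increment runs once more after the continuation has been followed. The sketch below is a simplified userspace model: both functions are hypothetical stand-ins for sip_follow_continuation() and sip_skip_whitespace(), and only the CRLF form of a line break is handled.

```c
#include <assert.h>
#include <stddef.h>

/* If p points at CRLF followed by SP/HTAB, return a pointer to the first
 * non-whitespace character of the continuation line, else NULL. */
static const char *follow_continuation(const char *p, const char *limit)
{
    if (limit - p < 3 || p[0] != '\r' || p[1] != '\n')
        return NULL;
    p += 2;
    if (*p != ' ' && *p != '\t')
        return NULL;
    while (p < limit && (*p == ' ' || *p == '\t'))
        p++;
    return p;
}

static const char *skip_whitespace(const char *p, const char *limit)
{
    for (; p < limit; p++) {
        if (*p == ' ' || *p == '\t')
            continue;
        if (*p == '\r') {
            const char *next = follow_continuation(p, limit);
            /* Return directly: going around the loop again would run
             * p++ and drop the first character of the value. */
            if (next)
                return next;
        }
        break;
    }
    return p;
}
```

For the bytes " \r\n abcdeABCDE1234" the returned pointer is at "a", so the extracted call-id keeps its first character.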
     
  • … defined by netfilter

    There are two existing structures which define the GRE and PPTP headers.
    So use these two structures instead of the ones defined by netfilter to
    stay consistent with other code.

    Signed-off-by: Gao Feng <fgao@ikuai8.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

    Gao Feng
     
  • There are already some GRE_* macros in the kernel, so it is unnecessary
    to define these macros. Also remove some useless macros.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     
  • gpio_to_irq does not return NO_IRQ but instead returns a negative
    error code on failure. Returning NO_IRQ from the function has no
    negative effects as we only compare the result to the expected
    interrupt number, but it's better to return a proper failure
    code for consistency, and we should remove NO_IRQ from the kernel
    entirely.

    Signed-off-by: Arnd Bergmann
    Acked-by: Richard Cochran
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • Reported-by: Ma Yuying
    Suggested-by: Jarod Wilson
    Signed-off-by: Bert Kenward
    Reviewed-by: Jarod Wilson
    Signed-off-by: David S. Miller

    Bert Kenward
     
  • The newly added bpf_overflow_handler function is only built if both
    CONFIG_EVENT_TRACING and CONFIG_BPF_SYSCALL are enabled, but the caller
    only checks the latter:

    kernel/events/core.c: In function 'perf_event_alloc':
    kernel/events/core.c:9106:27: error: 'bpf_overflow_handler' undeclared (first use in this function)

    This changes the caller so we also skip this call if CONFIG_EVENT_TRACING
    is disabled entirely.

    Signed-off-by: Arnd Bergmann
    Fixes: aa6a5f3cb2b2 ("perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs")
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • We get 1 warning when building kernel with W=1:
    drivers/net/ethernet/arc/emac_mdio.c:107:5: warning: no previous prototype for 'arc_mdio_reset' [-Wmissing-prototypes]

    In fact, this function is only used in the file in which it is
    declared and doesn't need a declaration, but can be made static.
    So this patch marks this function with 'static'.

    Signed-off-by: Baoyou Xie
    Signed-off-by: David S. Miller

    Baoyou Xie
     
  • We get a few warnings when building kernel with W=1:
    drivers/net/usb/lan78xx.c:1182:6: warning: no previous prototype for 'lan78xx_defer_kevent' [-Wmissing-prototypes]
    drivers/net/usb/lan78xx.c:1409:5: warning: no previous prototype for 'lan78xx_nway_reset' [-Wmissing-prototypes]
    drivers/net/usb/lan78xx.c:2000:5: warning: no previous prototype for 'lan78xx_set_mac_addr' [-Wmissing-prototypes]
    ....

    In fact, these functions are only used in the file in which they are
    declared and don't need a declaration, but can be made static.
    So this patch marks these functions with 'static'.

    Signed-off-by: Baoyou Xie
    Signed-off-by: David S. Miller

    Baoyou Xie
     
  • David S. Miller
     
  • Adds support for several infrastructure operations that are done as part of
    debug data collection.

    Signed-off-by: Tomer Tayar
    Signed-off-by: Yuval Mintz
    Signed-off-by: David S. Miller

    Tomer Tayar