23 Sep, 2016
6 commits
-
Several places in netfilter open-code the logic for fetching a random value exactly once.
Use net_get_random_once() to simplify that code.
Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
-
pkt->xt.thoff is not always set properly, but we use it without any check.
For the payload expr, this causes wrong results. For nftrace, we may report
the wrong network or transport header to user space. Furthermore,
adding the following nft rule triggers a warning:
# nft add rule arp filter output meta nftrace set 1
WARNING: CPU: 0 PID: 13428 at net/netfilter/nf_tables_trace.c:263
nft_trace_notify+0x4a3/0x5e0 [nf_tables]
Call Trace:
[] dump_stack+0x63/0x85
[] __warn+0xcb/0xf0
[] warn_slowpath_null+0x1d/0x20
[] nft_trace_notify+0x4a3/0x5e0 [nf_tables]
[ ... ]
[] nft_do_chain_arp+0x78/0x90 [nf_tables_arp]
[] nf_iterate+0x62/0x80
[] nf_hook_slow+0x73/0xd0
[] arp_xmit+0x8f/0xb0
[ ... ]
[] arp_solicit+0x106/0x2c0
So before we use pkt->xt.thoff, check tprot_set first.
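The guard described above can be sketched in plain C. The struct and function below are simplified, hypothetical stand-ins and not the kernel's definitions; they only illustrate that the transport header offset is consulted after tprot_set has been checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the nftables pktinfo state. */
struct pktinfo_sketch {
	bool     tprot_set; /* true only if a transport header was parsed */
	uint8_t  tprot;     /* transport protocol number; 0 is a valid value */
	uint32_t thoff;     /* transport header offset; undefined unless tprot_set */
};

/*
 * Return a pointer to the transport header inside buf, or NULL when no
 * transport protocol information is available (for example ARP traffic)
 * or when the offset lies outside the buffer.
 */
static const uint8_t *transport_header(const struct pktinfo_sketch *pkt,
				       const uint8_t *buf, size_t len)
{
	if (!pkt->tprot_set)
		return NULL;
	if (pkt->thoff >= len)
		return NULL;
	return buf + pkt->thoff;
}
```

A caller that gets NULL back would skip the transport header instead of reading from a stale offset.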
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
-
There's an off-by-one issue in nft_payload_fast_eval(): skb_tail_pointer()
and ptr + priv->len both point to the last valid address plus 1. So if
they are equal, we can still fetch valid data, and it is unnecessary to
fall back to nft_payload_eval().
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
-
After commit ac2863445686 ("netfilter: bridge: add nf_afinfo to enable
queuing to userspace"), we can queue packets to user space in the bridge
family. But when the user specifies a queue range, packets are only
delivered to the first queue number, because nfqueue_hash only supports
the ipv4 and ipv6 families. Add support for the bridge family too.
Suggested-by: Pablo Neira Ayuso
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
-
Currently, the user can specify the queue numbers via the _QUEUE_NUM and
_QUEUE_TOTAL attributes, which is enough in most situations.
But it is not very flexible. For example:
tcp dport 80 mapped to queue0
tcp dport 81 mapped to queue1
tcp dport 82 mapped to queue2
To do this, we must add 3 nft rules, and more
mappings mean more rules ...
So take one register to select the queue number; then we can add one
simple rule to map queues, maybe like this:
queue num tcp dport map { 80:0, 81:1, 82:2 ... }
Florian Westphal also proposed wider usage scenarios:
queue num jhash ip saddr . ip daddr mod ...
queue num meta cpu ...
queue num meta mark ...
The last point is how to load a queue number from sreg. We could
use *(u16*)&regs->data[reg] to load the queue number, just as the nat expr
does to load its l4port.
But we will cooperate with the hash expr, meta cpu, meta mark expr and so on.
They all store their result as a u32, so casting it to a u16 pointer and
dereferencing it yields a wrong result on big-endian systems.
So just keep it simple and treat the queue number as a u32, although a u16
is already enough.
Suggested-by: Pablo Neira Ayuso
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
-
Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.
This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").
Fixes: 96518518cc41 ("netfilter: add nftables")
Suggested-by: Pablo Neira Ayuso
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso
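The kind of validation described above can be sketched as follows. This is a userspace illustration only: parse_u32_check(), struct exthdr_sketch and exthdr_init() are hypothetical names, the attribute value is assumed to be already converted to host byte order, and returning -ERANGE is an assumption about the error code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: accept a u32 attribute value only if it is <= max. */
static int parse_u32_check(uint32_t attr_val, uint32_t max, uint32_t *dest)
{
	if (attr_val > max)
		return -ERANGE;
	*dest = attr_val;
	return 0;
}

/* Hypothetical expression state whose fields are narrower than u32. */
struct exthdr_sketch {
	uint8_t offset;
	uint8_t len;
};

/* Validate both attributes before storing either of them in a u8 field. */
static int exthdr_init(struct exthdr_sketch *priv, uint32_t offset, uint32_t len)
{
	uint32_t off_val, len_val;
	int err;

	err = parse_u32_check(offset, UINT8_MAX, &off_val);
	if (err < 0)
		return err;
	err = parse_u32_check(len, UINT8_MAX, &len_val);
	if (err < 0)
		return err;

	priv->offset = (uint8_t)off_val;
	priv->len = (uint8_t)len_val;
	return 0;
}
```

Without the check, a value such as 256 would be silently truncated to 0 when stored in the u8 field.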
22 Sep, 2016
1 commit
-
Add support for an offset value to the incremental counter and random
number generators. With this option the sysadmin can start the counter at
a certain value and then apply the generated number.
Example:
meta mark set numgen inc mod 2 offset 100
This will generate marks with the series 100, 101, 100, 101, ...
Suggested-by: Pablo Neira Ayuso
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso
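The series in the example above can be reproduced with a small single-threaded sketch. The names are hypothetical, the sequence is assumed to start at the offset as the example states, and no atomic update is attempted here.

```c
#include <stdint.h>

/* Hypothetical state for "numgen inc mod <modulus> offset <offset>". */
struct numgen_inc_sketch {
	uint32_t counter; /* always in the range [0, modulus) */
	uint32_t modulus; /* must be non-zero */
	uint32_t offset;
};

/* Return the next number of the series and advance the counter. */
static uint32_t numgen_inc_next(struct numgen_inc_sketch *ng)
{
	uint32_t val = ng->counter;

	ng->counter = (ng->counter + 1) % ng->modulus;
	return val + ng->offset;
}
```

With modulus 2 and offset 100, successive calls yield 100, 101, 100, 101, and so on.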
13 Sep, 2016
14 commits
-
The overflow validation in the init() function checks that the
maximum value that the hash could reach is less than U32_MAX, which is
likely to be true.
The fix detects the overflow when the maximum hash value is less than
the offset itself.
Fixes: 70ca767ea1b2 ("netfilter: nft_hash: Add hash offset value")
Reported-by: Liping Zhang
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso
-
After we generate a new number, we still read priv->counter and
store it to the dreg. This is not correct: another CPU may have already
changed it to a new number. So we must use the generated number, not
priv->counter itself.
Fixes: 91dbc6be0a62 ("netfilter: nf_tables: add number generator expression")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
-
These counters sit in the hot path and do show up in perf. This is especially
true for 'found' and 'searched', which get incremented for every packet
processed.
Information like
searched=212030105
new=623431
found=333613
delete=623327
does not seem too helpful nowadays:
- on busy systems found and searched will overflow every few hours
(these are 32bit integers), other more busy ones every few days.
- for debugging there are better methods, such as iptables' trace target,
the conntrack log sysctls. Nowadays we also have perf tool.
This removes packet path stat counters except those that
are expected to be 0 (or close to 0) on a normal system, e.g.
'insert_failed' (race happened) or 'invalid' (proto tracker rejects).
The insert stat is retained for the ctnetlink case.
The found stat is retained for the tuple-is-taken check when NAT has to
determine if it needs to pick a different source address.
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
hash_v6 is used by both nftables and ip6tables, so depending on
IP6_NF_IPTABLES is not appropriate.
Actually, it only parses the ipv6hdr and computes a hash value, so
even if IPV6 is disabled there is no side effect either. Remove the dependency.
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
-
Some netfilter modules do not check the return
value of nft_register_chain_type(). Add the checks now.
Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
-
Some netfilter modules do not check the return
value of register_netdevice_notifier(). Add the checks now.
Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
-
Instead of several goto's just to return the result, simply return it.
Signed-off-by: Pablo Neira Ayuso
-
This is overly conservative and not flexible at all, so better let them
go through and let the filtering policy decide what to do with them. We
use skb_header_pointer() all over the place so we would just fail to
match when trying to access fields from malformed traffic.
Signed-off-by: Pablo Neira Ayuso
-
Consolidate pktinfo setup and validation by using the new generic
functions so we converge to the netdev family codebase.
We only need a linear IPv4 and IPv6 header from the reject expression,
so move nft_bridge_iphdr_validate() and nft_bridge_ip6hdr_validate()
to net/bridge/netfilter/nft_reject_bridge.c.
Signed-off-by: Pablo Neira Ayuso
-
These functions are extracted from the netdev family. They initialize
the pktinfo structure and validate that the IPv4 and IPv6 headers are
well-formed, given that these functions are called from a path where
layer 3 sanitization has not happened yet.
These functions are placed in include/net/netfilter/nf_tables_ipv{4,6}.h
so that a follow-up patch can reuse them from the bridge
family too.
Signed-off-by: Pablo Neira Ayuso
-
Make sure the pktinfo protocol fields are initialized if this fails to
parse the transport header.
Signed-off-by: Pablo Neira Ayuso
-
This patch introduces nft_set_pktinfo_unspec(), which ensures proper
initialization of all pktinfo fields for non-IP traffic. This is used
by the bridge, netdev and arp families.
This new function relies on nft_set_pktinfo_proto_unspec() to set a new
tprot_set field that indicates whether transport protocol information is
available. The remaining fields are zeroed.
The meta expression has also been updated to check tprot_set first,
given that zero is a valid tprot value. Even a handcrafted packet
may come with the IPPROTO_RAW (255) protocol number, so we can't rely on
this value to mean that tprot is unset.
Reported-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
-
The dynset expression matches if we can fit a new entry into the set.
If there is no room for it, then it breaks the rule evaluation.
This patch introduces the inversion flag so you can add rules to
explicitly drop packets that don't fit into the set. For example:
# nft filter input flow table xyz size 4 { ip saddr timeout 120s counter } overflow drop
This is useful to provide a replacement for connlimit.
For the rule above, every new entry uses the IPv4 address as key in the
set. The entry gets a timeout of 120 seconds that is refreshed on every
packet seen. If we get a new flow and our set already contains 4 entries,
then this packet is dropped.
You can already express this in positive logic, assuming a default policy
of drop:
# nft filter input flow table xyz size 4 { ip saddr timeout 10s counter } accept
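The behaviour of the first rule above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the names are hypothetical, time is passed in as plain seconds, and a fixed array stands in for the real set backend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_SET_SIZE 4 /* mirrors "size 4" in the rule above */

struct flow_entry {
	bool     used;
	uint32_t saddr;   /* IPv4 source address used as the key */
	uint64_t expires; /* absolute expiry time in seconds */
};

struct flow_set {
	struct flow_entry slot[FLOW_SET_SIZE];
};

/*
 * Refresh the entry for saddr, or create it if a slot is free.
 * Returns true if the entry is in the set afterwards, false if the
 * flow is new and the set has no room for it.
 */
static bool flow_set_update(struct flow_set *set, uint32_t saddr,
			    uint64_t now, uint64_t timeout)
{
	struct flow_entry *free_slot = NULL;

	for (size_t i = 0; i < FLOW_SET_SIZE; i++) {
		struct flow_entry *e = &set->slot[i];

		if (e->used && e->expires <= now)
			e->used = false;            /* entry timed out */
		if (e->used && e->saddr == saddr) {
			e->expires = now + timeout; /* refresh on every packet */
			return true;
		}
		if (!e->used && free_slot == NULL)
			free_slot = e;
	}
	if (free_slot == NULL)
		return false;                       /* set is full */

	free_slot->used = true;
	free_slot->saddr = saddr;
	free_slot->expires = now + timeout;
	return true;
}

/* Inverted match: true when the packet does not fit and should be dropped. */
static bool overflow_drop(struct flow_set *set, uint32_t saddr,
			  uint64_t now, uint64_t timeout)
{
	return !flow_set_update(set, saddr, now, timeout);
}
```

The positive-logic variant corresponds to using the result of flow_set_update() directly as the accept condition.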
Signed-off-by: Pablo Neira Ayuso
-
Add support for passing an offset to the hash value. With this
feature, the sysadmin is able to generate a hash with a given
offset value.
Example:
meta mark set jhash ip saddr mod 2 seed 0xabcd offset 100
This option generates marks according to the source address from 100 to
101.
Signed-off-by: Laura Garcia Liebana
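A rough sketch of how such a mark could be computed follows. toy_hash() is a stand-in mixer based on FNV-1a and is not jhash, all names are hypothetical, and the wrap-around check (rejecting a modulus and offset whose maximum result is less than the offset) is written as an assumption, including the -ERANGE return value.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in 32-bit hash (FNV-1a over the 4 key bytes, seeded); NOT jhash. */
static uint32_t toy_hash(uint32_t key, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;

	for (int i = 0; i < 4; i++) {
		h ^= (key >> (8 * i)) & 0xffu;
		h *= 16777619u;
	}
	return h;
}

/* Reject parameters whose largest result, offset + modulus - 1, would wrap. */
static int hash_offset_check(uint32_t modulus, uint32_t offset)
{
	if (modulus == 0)
		return -ERANGE;
	if (offset + modulus - 1 < offset)
		return -ERANGE;
	return 0;
}

/* Map a source address to a mark in [offset, offset + modulus - 1]. */
static uint32_t hash_mark(uint32_t saddr, uint32_t modulus,
			  uint32_t seed, uint32_t offset)
{
	return toy_hash(saddr, seed) % modulus + offset;
}
```

With modulus 2 and offset 100, every source address maps to either 100 or 101, and the same address always maps to the same mark.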
09 Sep, 2016
2 commits
-
After commit adf0516845bc ("netfilter: remove ip_conntrack* sysctl
compat code"), the ctl_table_path member in struct nf_conntrack_l3proto{}
is not used anymore, so remove it.
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
-
Although the validity of queues_total and queuenum is checked in the nft
utility, users can also add nft rules via nfnetlink directly, so it is
necessary to validate them in the nft_queue expr init routine too.
Tested by running ./nft-test.py any/queue.t:
any/queue.t: 6 unit tests, 0 error, 0 warning
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
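A check of this kind might look like the sketch below. The function name is hypothetical, and both the exact conditions (a non-zero total, and a last queue number that still fits in 16 bits) and the error codes are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical init-time validation of a queue range that starts at
 * queuenum and spans queues_total consecutive queues.
 */
static int queue_range_check(uint16_t queuenum, uint16_t queues_total)
{
	if (queues_total == 0)
		return -EINVAL;
	/* The last queue number must not exceed the u16 range. */
	if ((uint32_t)queuenum + queues_total - 1 > UINT16_MAX)
		return -ERANGE;
	return 0;
}
```

For example, a range starting at queue 65535 with a total of 2 is rejected, while the same start with a total of 1 is accepted.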
07 Sep, 2016
17 commits
-
Current parsing methods for SIP headers do not allow the presence of
tab characters between header name and header value. As a result, Call-ID
SIP headers like the following are discarded by the IPVS SIP persistence
engine:
"Call-ID\t: mycallid@abcde"
"Call-ID:\tmycallid@abcde"
In the above examples, Call-IDs are represented as C strings.
In a real message we have byte "09" before/after the colon (":").
The proposed fix is in the nf_conntrack_sip module.
Function sip_skip_whitespace() should skip tabs in addition to spaces,
since in the SIP grammar whitespace (WSP) corresponds to space or tab.
Below is an extract of the relevant SIP ABNF syntax.
Call-ID = ( "Call-ID" / "i" ) HCOLON callid
callid = word [ "@" word ]
HCOLON = *( SP / HTAB ) ":" SWS
SWS = [LWS] ; sep whitespace
LWS = [*WSP CRLF] 1*WSP ; linear whitespace
WSP = SP / HTAB
word = 1*(alphanum / "-" / "." / "!" / "%" / "*" /
"_" / "+" / "`" / "'" / "~" /
"(" / ")" / "<" / ">" /
":" / "\" / DQUOTE /
"/" / "[" / "]" / "?" /
"{" / "}" )
Signed-off-by: Marco Angaroni
Signed-off-by: Pablo Neira Ayuso
-
This patch renames the existing function to nft_overquota() and makes
it return a boolean that tells us if we have exceeded our byte quota.
Just a cleanup.
Signed-off-by: Pablo Neira Ayuso
-
Use xor to decide whether to break further rule evaluation or not, since the
existing logic doesn't achieve the expected inversion.
Signed-off-by: Pablo Neira Ayuso
-
The _until_ attribute is renamed to _modulus_ as the behaviour is similar to
other expressions with number limits (e.g. nft_hash).
Renaming is possible because there isn't a kernel release yet with these
changes.
Signed-off-by: Laura Garcia Liebana
Signed-off-by: Pablo Neira Ayuso
-
There is some debug code in find_pattern that is commented out with #if 0.
Remove it.
Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
-
The caller function "help" already makes sure that datalen cannot be zero
before passing it to find_pattern as a parameter, through the following code:
if (dataoff >= skb->len) {
	pr_debug("ftp: dataoff(%u) >= skblen(%u)\n", dataoff,
		 skb->len);
	return NF_ACCEPT;
}
datalen = skb->len - dataoff;
And the later code "ends_in_nl = (fb_ptr[datalen - 1] == '\n');" uses datalen
directly without checking whether it is zero.
So it is unnecessary to check it in find_pattern too.
Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
-
Current parsing methods for the SIP Call-ID header do not correctly check all
characters allowed by RFC 3261. In particular the "," character is allowed
instead of the "'" character. As a result, Call-ID headers like the following
are discarded by the IPVS SIP persistence engine.
Call-ID: -.!%*_+`'~()<>:\"/[]?{}
The above example is composed using all non-alphanumeric characters listed
in RFC 3261 for the Call-ID header syntax.
The proposed fix is in the nf_conntrack_sip module; function iswordc() checks this
range: (c >= '(' && c
Signed-off-by: Pablo Neira Ayuso
-
Current parsing methods for SIP headers do not properly manage
continuation lines: in the case of the Call-ID header, the first character of
the Call-ID header value is truncated. As a result, the IPVS SIP persistence
engine hashes over a call-id that is not exactly the one present in
the original message.
Example: "Call-ID: \r\n abcdeABCDE1234"
results in an extracted call-id equal to "bcdeABCDE1234".
In the above example, the Call-ID is represented as a C string.
In a real message the first bytes after the colon (":") are
"20 0d 0a 20".
The proposed fix is in the nf_conntrack_sip module.
Since the sip_follow_continuation() function walks past the leading
spaces or tabs of the continuation line, sip_skip_whitespace()
should simply return the output of sip_follow_continuation().
Otherwise another iteration of the for loop is done and dptr
is incremented by one, pointing to the second character of the
first word in the header.
Below is an extract of the relevant SIP ABNF syntax.
Call-ID = ( "Call-ID" / "i" ) HCOLON callid
callid = word [ "@" word ]
HCOLON = *( SP / HTAB ) ":" SWS
SWS = [LWS] ; sep whitespace
LWS = [*WSP CRLF] 1*WSP ; linear whitespace
WSP = SP / HTAB
word = 1*(alphanum / "-" / "." / "!" / "%" / "*" /
"_" / "+" / "`" / "'" / "~" /
"(" / ")" / "<" / ">" /
":" / "\" / DQUOTE /
"/" / "[" / "]" / "?" /
"{" / "}" )
Signed-off-by: Marco Angaroni
Signed-off-by: Pablo Neira Ayuso
-
… defined by netfilter
There are two existing structures which define the GRE and PPTP headers.
So use these two structures instead of the ones defined by netfilter to
stay consistent with other code.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
There are already some GRE_* macros in the kernel, so it is unnecessary
to define these macros again. Also remove some unused macros.
Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
-
gpio_to_irq does not return NO_IRQ but instead returns a negative
error code on failure. Returning NO_IRQ from the function has no
negative effects as we only compare the result to the expected
interrupt number, but it's better to return a proper failure
code for consistency, and we should remove NO_IRQ from the kernel
entirely.
Signed-off-by: Arnd Bergmann
Acked-by: Richard Cochran
Signed-off-by: David S. Miller
-
Reported-by: Ma Yuying
Suggested-by: Jarod Wilson
Signed-off-by: Bert Kenward
Reviewed-by: Jarod Wilson
Signed-off-by: David S. Miller
-
The newly added bpf_overflow_handler function is only built if both
CONFIG_EVENT_TRACING and CONFIG_BPF_SYSCALL are enabled, but the caller
only checks the latter:
kernel/events/core.c: In function 'perf_event_alloc':
kernel/events/core.c:9106:27: error: 'bpf_overflow_handler' undeclared (first use in this function)
This changes the caller so we also skip this call if CONFIG_EVENT_TRACING
is disabled entirely.
Signed-off-by: Arnd Bergmann
Fixes: aa6a5f3cb2b2 ("perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs")
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller
-
We get 1 warning when building the kernel with W=1:
drivers/net/ethernet/arc/emac_mdio.c:107:5: warning: no previous prototype for 'arc_mdio_reset' [-Wmissing-prototypes]
In fact, this function is only used in the file in which it is
declared and doesn't need a declaration, but can be made static.
So this patch marks this function with 'static'.
Signed-off-by: Baoyou Xie
Signed-off-by: David S. Miller
-
We get a few warnings when building the kernel with W=1:
drivers/net/usb/lan78xx.c:1182:6: warning: no previous prototype for 'lan78xx_defer_kevent' [-Wmissing-prototypes]
drivers/net/usb/lan78xx.c:1409:5: warning: no previous prototype for 'lan78xx_nway_reset' [-Wmissing-prototypes]
drivers/net/usb/lan78xx.c:2000:5: warning: no previous prototype for 'lan78xx_set_mac_addr' [-Wmissing-prototypes]
....
In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.
Signed-off-by: Baoyou Xie
Signed-off-by: David S. Miller
-
Adds support for several infrastructure operations that are done as part of
debug data collection.
Signed-off-by: Tomer Tayar
Signed-off-by: Yuval Mintz
Signed-off-by: David S. Miller