10 Jan, 2014
5 commits
-
We don't encode argument types into function names and since besides
nft_do_chain() there are only AF-specific versions, there is no risk
of confusion.
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
Minor nf_chain_type cleanups:
- reorder struct to plug a hole
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.
Fix by taking a reference when we're actually creating a chain of the
chain type and releasing the reference when destroying the chain.
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
This patch adds kernel support for setting properties of tracked
connections. Currently, only connmark is supported. One use-case
for this feature is to provide the same functionality as the iptables
target -j CONNMARK --save-mark.
Some restructuring was needed to implement the set op. The new
structure follows that of nft_meta.
Signed-off-by: Kristian Evensen
Signed-off-by: Pablo Neira Ayuso
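The chain type fix from 10 Jan changes when the module reference is taken. It used to be taken at registration time. It is now taken when a chain of that type is created and released when the chain is destroyed. The following is a minimal userspace sketch of that lifecycle. It assumes a plain integer stands in for the module refcount; the kernel uses try_module_get()/module_put(), which this sketch does not model. All struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loadable module's reference count. */
struct fake_module {
	int refcnt;
};

/* Hypothetical chain type; owner is the module providing the hooks. */
struct chain_type {
	const char *name;
	struct fake_module *owner;
};

struct chain {
	const struct chain_type *type;
};

/* Registering a type no longer pins its module. */
void chain_type_register(struct chain_type *type)
{
	(void)type;	/* deliberately no reference taken here */
}

/* Creating a chain of this type takes a reference... */
void chain_create(struct chain *chain, const struct chain_type *type)
{
	type->owner->refcnt++;
	chain->type = type;
}

/* ...and destroying the chain releases it again. */
void chain_destroy(struct chain *chain)
{
	chain->type->owner->refcnt--;
	chain->type = NULL;
}

/* The module can be unloaded only while no chain uses it. */
bool module_can_unload(const struct fake_module *mod)
{
	return mod->refcnt == 0;
}
```

With this arrangement, a registered but unused chain type no longer blocks unloading its module.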
08 Jan, 2014
6 commits
-
For L3-protocol-independent rules we need to get at the L4 protocol value
directly. Add it to the nft_pktinfo struct and use the meta expression
to retrieve it.
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
Needed by multi-family tables to distinguish IPv4 and IPv6 packets.
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintenance in dual-stack setups.
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.
Signed-off-by: Patrick McHardy
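For the mixed IPv4/IPv6 tables described in the 08 Jan commits, one chain is registered once per address family. Each packet's context then records which hook ops it arrived through, so rules can tell the two families apart. Below is a hedged sketch built on simplified, hypothetical structs; hook_ops, pktinfo, register_inet_chain and pkt_family are not the nf_tables API. Only the NFPROTO_* values match the kernel's uapi constants.

```c
#include <stddef.h>

/* Address family values as defined in the kernel's uapi netfilter.h. */
enum { NFPROTO_IPV4 = 2, NFPROTO_IPV6 = 10 };

/* Hypothetical, simplified hook registration record. */
struct hook_ops {
	int pf;			/* address family this hook serves */
	unsigned int hooknum;	/* e.g. input, forward, output */
	const void *priv;	/* the chain shared by all families */
};

/* Per-packet context keeps a pointer to the ops rather than only the
 * hook number, so the packet's family stays recoverable. */
struct pktinfo {
	const struct hook_ops *ops;
};

/* Fill one hook_ops entry per family for a single chain.
 * `out` must have room for two entries; returns the count written. */
size_t register_inet_chain(struct hook_ops *out, unsigned int hooknum,
			   const void *chain)
{
	static const int families[] = { NFPROTO_IPV4, NFPROTO_IPV6 };
	size_t i;

	for (i = 0; i < sizeof(families) / sizeof(families[0]); i++) {
		out[i].pf = families[i];
		out[i].hooknum = hooknum;
		out[i].priv = chain;
	}
	return i;
}

/* What a meta expression could read to tell IPv4 from IPv6 packets. */
int pkt_family(const struct pktinfo *pkt)
{
	return pkt->ops->pf;
}
```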
-
Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.
Signed-off-by: Patrick McHardy
Signed-off-by: Pablo Neira Ayuso -
This change makes it possible to follow a recommendation of RFC4942.
- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
as source addresses for ICMPv6 echo reply. This sysctl is false by default
to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().
Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
(http://tools.ietf.org/html/rfc4942#section-2.1.6)
2.1.6. Anycast Traffic Identification and Security
[...]
To avoid exposing knowledge about the internal structure of the
network, it is recommended that anycast servers now take advantage of
the ability to return responses with the anycast address as the
source address if possible.
Signed-off-by: Francois-Xavier Le Bail
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
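The anycast_src_echo_reply sysctl from 08 Jan decides whether an echo reply may reuse an anycast destination as its source address. Below is a hedged sketch of that decision. It assumes the destination has already been classified; the enum and function name are hypothetical. When the function returns false, the reply falls back to normal source address selection.

```c
#include <stdbool.h>

/* Hypothetical classification of the echo request's destination. */
enum dst_kind { DST_UNICAST, DST_ANYCAST, DST_MULTICAST };

/* Returns true if the reply should use the request's destination
 * address as its source. false means "let normal source address
 * selection choose". anycast_src_echo_reply defaults to false,
 * which preserves the previous behavior. */
bool echo_reply_uses_dst_as_src(enum dst_kind dst, bool anycast_src_echo_reply)
{
	if (dst == DST_UNICAST)
		return true;
	if (dst == DST_ANYCAST)
		return anycast_src_echo_reply;
	return false;	/* a multicast address is never used as source */
}
```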
07 Jan, 2014
10 commits
-
GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.
This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel set up.
Signed-off-by: Eric Dumazet
Cc: H.K. Jerry Chu
Signed-off-by: David S. Miller -
Jesse Gross says:
====================
[GIT net-next] Open vSwitch
Open vSwitch changes for net-next/3.14. Highlights are:
* Performance improvements in the mechanism to get packets to userspace
using memory mapped netlink and skb zero copy where appropriate.
* Per-cpu flow stats in situations where flows are likely to be shared
across CPUs. Standard flow stats are used in other situations to save
memory and allocation time.
* A handful of code cleanups and rationalization.
====================
Signed-off-by: David S. Miller
-
Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.
Signed-off-by: Thomas Graf
Signed-off-by: Jesse Gross -
Signed-off-by: Thomas Graf
Reviewed-by: Daniel Borkmann
Signed-off-by: Jesse Gross -
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.
Signed-off-by: Thomas Graf
Reviewed-by: Daniel Borkmann
Acked-by: David S. Miller
Signed-off-by: Jesse Gross -
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check the receiving socket for
memory-mapped I/O capability and use it if enabled. Will fall back
to a non-mapped skb if the message size exceeds the frame size of the ring.
Signed-off-by: Thomas Graf
Reviewed-by: Daniel Borkmann
Signed-off-by: Jesse Gross -
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller
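The 07 Jan netlink_alloc_skb() description boils down to a two-way choice. Use the receiver's memory-mapped ring when it is enabled and the message fits in one frame; otherwise allocate a regular skb. Below is a hedged sketch of only that decision. The function name, parameters and header-overhead argument are assumptions. The real function also performs the allocation and accounting, which are not shown.

```c
#include <stdbool.h>
#include <stddef.h>

enum alloc_kind { ALLOC_MMAP_RING, ALLOC_REGULAR_SKB };

/* Hypothetical: decide where a netlink message of `payload` bytes goes.
 * hdr_overhead stands for the required Netlink headers; frame_size is
 * the frame size of the receiving socket's memory-mapped RX ring. */
enum alloc_kind netlink_alloc_choice(bool rx_ring_enabled, size_t frame_size,
				     size_t payload, size_t hdr_overhead)
{
	size_t need = payload + hdr_overhead;

	if (rx_ring_enabled && need <= frame_size)
		return ALLOC_MMAP_RING;
	/* No ring, or the message does not fit into one frame: fall back. */
	return ALLOC_REGULAR_SKB;
}
```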
-
TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (such as __skb_queue_head()).
We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.
From the IETF draft below:
"Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.
We present here a lightweight design, PIE (Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software."
Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions. Naeem Khademi and Dave Taht independently contributed to ECN
support.
For more information, please see the technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.
Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
All relevant code, documents, test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.
For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) or Mythili Prabhu
(mysuryan@cisco.com).
Signed-off-by: Vijay Subramanian
Signed-off-by: Mythili Prabhu
CC: Dave Taht
Signed-off-by: David S. Miller -
Pablo Neira Ayuso says:
====================
nftables updates for net-next
The following patchset contains nftables updates for your net-next tree,
they are:
* Add set operation to the meta expression by means of the select_ops()
infrastructure, this allows us to set the packet mark among other things.
From Arturo Borrero Gonzalez.
* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
Borkmann.
* Add new queue expression to nf_tables. This comes with two preparatory
patches, one to add a mask in nf_tables_core to evaluate the queue verdict
appropriately and another to refactor common code with xt_NFQUEUE, from
Eric Leblond.
* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
Eric Leblond.
* Add the reject expression to nf_tables, this adds the missing TCP RST
support. It comes with an initial patch to refactor common code with
xt_NFQUEUE, again from Eric Leblond.
* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
Nazarewicz.
* Remove the nft_meta_target code, now that Arturo added the set operation
to the meta expression, from me.
* Add help information for nf_tables to Kconfig, also from me.
* Allow dumping all sets by specifying NFPROTO_UNSPEC; a similar feature is
available for other nf_tables objects. Requested by Arturo, from me.
* Expose the table usage counter, so we can know how many chains are using
this table without dumping the list of chains, from Tomasz Bursztyka.
====================
Signed-off-by: David S. Miller
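The PIE scheduler from 07 Jan updates a drop probability periodically from the current and previous queueing delay estimates. The IETF draft cited in that commit uses p = p + alpha * (qdelay - target) + beta * (qdelay - qdelay_old). Below is a minimal floating-point sketch of just that update step. The parameter values in the usage (20 ms target, alpha 0.125, beta 1.25) are illustrative assumptions. The Linux qdisc uses fixed-point arithmetic and further adjustments that are not shown.

```c
/* Hypothetical PIE state; all delays are in seconds. */
struct pie_state {
	double prob;		/* current drop probability, 0..1 */
	double qdelay_old;	/* delay estimate at the previous update */
};

struct pie_params {
	double target;		/* target queueing delay */
	double alpha;		/* weight on deviation from the target */
	double beta;		/* weight on the delay trend */
};

/* One periodic update of the drop probability. */
void pie_update_prob(struct pie_state *st, const struct pie_params *p,
		     double qdelay)
{
	double delta = p->alpha * (qdelay - p->target) +
		       p->beta * (qdelay - st->qdelay_old);

	st->prob += delta;
	if (st->prob < 0.0)
		st->prob = 0.0;
	if (st->prob > 1.0)
		st->prob = 1.0;
	st->qdelay_old = qdelay;
}
```

Consider a delay rising from 20 ms to 30 ms with p = 0. The step adds 0.125 * 0.010 + 1.25 * 0.010 = 0.01375. The beta term reacts to the trend, so a queue that is still growing is penalized before the delay drifts far above the target.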
06 Jan, 2014
2 commits
-
macvlan needs vlan_pcpu_stats so make it visible even if compiling
without VLAN_8021Q support. Otherwise compilation fails with a very long error.
Fixes: cdf3e274cf1b36 ("macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats")
Cc: Li RongQing
Signed-off-by: Hannes Frederic Sowa
Acked-By: Li RongQing
Signed-off-by: David S. Miller -
Pablo Neira Ayuso says:
====================
netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for your net-next tree,
they are:
* Add full port randomization support. Some crazy researchers found a way
to reconstruct the secure ephemeral ports that are allocated in random mode
by sending off-path bursts of UDP packets to overrun the socket buffer of
the DNS resolver to trigger retransmissions; if the DNS resolution done by
a client then takes longer than usual, they conclude that the port that
received the burst of UDP packets is the one that was opened. It seems a
rather aggressive method to me, but it seems to work for them. As a result,
Daniel Borkmann and Hannes Frederic Sowa came up with a new NAT mode to
fully randomize ports using prandom.
* Add a new classifier to x_tables based on the socket net_cls set via
cgroups. This includes two patches to prepare the field as requested by
Zefan Li. Also from Daniel Borkmann.
* Use prandom instead of get_random_bytes in several locations of the
netfilter code, from Florian Westphal.
* Allow using CTA_MARK_MASK in ctnetlink when mangling the conntrack
mark, also from Florian Westphal.
* Fix compilation warning due to unused variable in IPVS, from Geert
Uytterhoeven.
* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.
* Add IPComp extension to x_tables, from Fan Du.
====================
Signed-off-by: David S. Miller
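The full port randomization mode from 06 Jan draws the NAT source port from a PRNG (prandom) rather than deriving it predictably. Below is a hedged sketch of drawing a candidate port from a configured range. rand_u32() is a hypothetical stand-in for the kernel's prandom_u32() and is not suitable for real use. Collision handling is omitted; the kernel tries further candidates when the resulting tuple is already in use.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for prandom_u32(); libc rand() is used only
 * to keep the sketch self-contained. */
static uint32_t rand_u32(void)
{
	return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Draw a candidate port in [min, max] (requires min <= max) from a
 * fresh random value, approximately uniformly (modulo bias ignored). */
uint16_t random_port_in_range(uint16_t min, uint16_t max)
{
	uint32_t range = (uint32_t)max - min + 1;

	return (uint16_t)(min + rand_u32() % range);
}
```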
05 Jan, 2014
7 commits
-
This function is used to get a specific core when there is more than
one core of that specific type. This is used in bgmac to reset all GMAC
cores.
Signed-off-by: Hauke Mehrtens
Acked-by: Rafał Miłecki
Signed-off-by: David S. Miller -
They are the same, so unify them; since macvlan is a kind of vlan,
vlan_pcpu_stats is a proper name for both vlan and macvlan.
Signed-off-by: Li RongQing
Signed-off-by: David S. Miller -
They are the same, so unify them as one, pcpu_sw_netstats.
Define pcpu_sw_netstats in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h.
Cc: Cong Wang
Cc: Stephen Hemminger
Signed-off-by: Li RongQing
Signed-off-by: David S. Miller -
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e and pci_regs.h.
Anjali provides a patch to prevent messages from stray HMC events, except
at interrupt message level, and refactors the HMC error handling.
Catherine adds routines in probe to populate/check PCI bus speed and width,
then verifies we are in an 8GT/s x8 PCIe slot and warns when we are not.
Shannon adds Wake-on-LAN support for i40e, and fixes curly brace use as well
as the return type of i40e_vsi_clear_rings().
Joseph implements receive offload for VXLAN for i40e, where the hardware
supports checksum offload/verification of the inner/outer header.
Mitch provides the bulk of the changes: he refactors the VF reset code so
that it works on real hardware, then cleans up the code by calling existing
functions to enable and disable queues for VFs and removing unused
functions. He also removes unnecessary log messages that are seen at every
VF reset (for example, complaints about disabling queues that are already
disabled), fixes an error return when the VF asks to add an invalid MAC
address, and makes the log more informative about what is actually going
on when the VF sends a bad message.
Jesse refactors the LED function to flash LED lights correctly.
v2:
- removed patch 5 "i40e: add set settings and pauseparam" based on
feedback from Ben Hutchings, will re-work that patch for later
submission
- Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
address Or Gerlitz's questions and concerns. This patch adds the
implementation for the VXLAN ndo's and allows the hardware to do
receive checksum offload for inner packets on the UDP ports that
VXLAN notifies us about.
- Added patch "i40e: using for_each_set_bit to simplify the code"
from Wei Yongjun. This patch uses for_each_set_bit() to simplify
the code.
v3:
- fixed indentation issue in patch 11 based on feedback from
Sergei Shtylyov.
Sorry for the delayed release of v4; it was delayed due to the holidays.
v4:
- Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
while holding a spin lock in patch 6 by executing the AQ commands from
a subtask.
- Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
option for i40e and wrapped appropriate code with the config option in
patch 6.
- Updated patch 7 based on the changes made in patch 6 in the above two
bullets.
v5:
- Added the patch to pci_regs.h based on David Miller's feedback to add
PCI defines for speed and width
- Updated patch 3 description to better explain the changes based on
feedback from David Miller
- Updated patch 4 to use the newly added defines to pci_regs.h instead
of local defines
- Updated patch 7 to use in the #include based on feedback
from David Miller
====================
Signed-off-by: David S. Miller
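The 05 Jan consolidation leaves a single per-CPU counter struct, pcpu_sw_netstats, used by tunnels and the bridge. The pattern is that each CPU updates only its own slot and readers sum across all CPUs. Below is a hedged sketch of that pattern. The field names follow struct pcpu_sw_netstats. The fixed NR_CPUS_SKETCH array and the missing u64_stats_sync protection are simplifications and assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */

/* Same counters as struct pcpu_sw_netstats, minus the u64_stats_sync
 * member the kernel uses for consistent 64-bit reads on 32-bit hosts. */
struct sw_netstats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Fast path: a CPU touches only its own slot, so no lock and no
 * shared cache line are needed. */
void stats_rx(struct sw_netstats *percpu, int cpu, uint64_t len)
{
	percpu[cpu].rx_packets++;
	percpu[cpu].rx_bytes += len;
}

/* Slow path (e.g. when reporting device stats): fold all slots. */
struct sw_netstats stats_sum(const struct sw_netstats *percpu, size_t ncpus)
{
	struct sw_netstats total = { 0 };
	size_t i;

	for (i = 0; i < ncpus; i++) {
		total.rx_packets += percpu[i].rx_packets;
		total.rx_bytes += percpu[i].rx_bytes;
		total.tx_packets += percpu[i].tx_packets;
		total.tx_bytes += percpu[i].tx_bytes;
	}
	return total;
}
```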
-
phy_scan_fixups() isn't and shouldn't be called by the drivers directly, so
unexport it. And since Florian Fainelli's recent patches, the function is only
called locally, so we can make it static as well.
Signed-off-by: Sergei Shtylyov
Signed-off-by: David S. Miller -
Remove the adjust_state() callback from 'struct phy_device' since it seems
never to have been used since its inception: phy_start_machine() has always
been called with its 2nd argument equal to NULL.
Signed-off-by: Sergei Shtylyov
Signed-off-by: David S. Miller -
Running 'checkpatch.pl' gives some errors and warnings:
- no spaces around =;
- * separated by space from the function name;
- { in function definition not on a separate line;
- line over 80 characters.
While fixing these, also fix the following style issues:
- file name in the heading comment;
- alignment not matching open paren.
Signed-off-by: Sergei Shtylyov
Signed-off-by: David S. Miller
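The 05 Jan bcma helper looks up a core by both its type and its unit index. This matters when a bus has several cores of the same type; for example, bgmac uses it to reset every GMAC core. Below is a hedged sketch over a plain array. The struct, the function name find_core_unit and the core IDs in the usage are hypothetical simplifications.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a core on the bus. */
struct core {
	uint16_t id;	/* core type */
	uint8_t unit;	/* index among cores of the same type */
};

/* Return the core with the given type and unit, or NULL if absent.
 * Matching on id alone would always return the first core of a type. */
const struct core *find_core_unit(const struct core *cores, size_t n,
				  uint16_t id, uint8_t unit)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cores[i].id == id && cores[i].unit == unit)
			return &cores[i];
	return NULL;
}
```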
04 Jan, 2014
10 commits
-
Add missing PCI bus link speed 8.0 GT/s and bus link widths of
x1, x2, x4 and x8.
CC:
CC: Bjorn Helgaas
Signed-off-by: Jeff Kirsher
Acked-by: Bjorn Helgaas -
Add nested IFLA_BOND_AD_INFO for bonding 802.3ad info.
Signed-off-by: Scott Feldman
Signed-off-by: David S. Miller -
Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.
Signed-off-by: Scott Feldman
Signed-off-by: David S. Miller -
Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
lacp_rate via netlink.
Signed-off-by: Scott Feldman
Signed-off-by: David S. Miller -
The llc_sap_list_lock does not need to be global; it is only acquired
in core.
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
Namespace-related cleanup:
* make cred_to_ucred static
* remove unused sock_rmalloc function
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
A per-CPU route cache eliminates sharing of the dst refcnt between CPUs.
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller -
Avoid doing a route lookup on every packet being tunneled.
In ip_tunnel.c, cache the route returned from ip_route_output if
the tunnel is "connected", so that all the routing parameters are
taken from the tunnel parms for a packet. Specifically, the tunnel is
not NBMA and the tos comes from the tunnel parms (not the inner packet).
Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller -
It would be useful, e.g. in a server or desktop environment, to have
a facility for fine-grained "per application" or "per application
group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, and the facility is not
limited to that: netfilter allows many more scenarios/policies than
just blocking, e.g. fine-grained settings for where applications are
allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.
Implementation of PID-based matching would not be appropriate
as PIDs frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.
As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.
In our case we're working on outgoing traffic based on which local
socket it originated from. Also, one doesn't even need to have
a priori knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremely*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we end up with a very minimal and efficient
module, and don't add anything except netfilter code.
One possible, minimal usage example (many other iptables options
can obviously be applied):
1) Configuring cgroups if not already done, e.g.:
mkdir /sys/fs/cgroup/net_cls
mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
mkdir /sys/fs/cgroup/net_cls/0
echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
(or a real flow handle id for tc)
2) Configuring netfilter (iptables-nftables), e.g.:
iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP
3) Running applications, e.g.:
ping 208.67.222.222
echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
[...]
ping 208.67.220.220
ping: sendmsg: Operation not permitted
[...]
echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
[...]
Of course, real-world deployments would make use of the cgroups user
space toolsuite, or their own custom policy daemons dynamically moving
applications from/to various cgroups.
[1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
Signed-off-by: Daniel Borkmann
Cc: Tejun Heo
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan
Signed-off-by: Pablo Neira Ayuso -
While we're at it, now that we have introduced CGROUP_NET_CLASSID, let's also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
to CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_.
Signed-off-by: Daniel Borkmann
Cc: Zefan Li
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan
Signed-off-by: Pablo Neira Ayuso
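The cgroup match from 04 Jan compares the net_cls class ID stamped on the originating socket (sk_classid) with the rule's value, optionally inverted. It matches nothing when a packet has no local socket. Below is a hedged sketch of that test. It uses simplified, hypothetical structs in place of the real skb, socket and match info.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel objects. */
struct sock_sketch {
	uint32_t sk_classid;		/* set from net_cls.classid */
};

struct skb_sketch {
	const struct sock_sketch *sk;	/* NULL if no local socket */
};

struct cgroup_match_info {
	uint32_t id;	/* value given to --cgroup */
	bool invert;	/* true for "! --cgroup" */
};

bool cgroup_match(const struct skb_sketch *skb,
		  const struct cgroup_match_info *info)
{
	if (skb->sk == NULL)
		return false;	/* no originating local socket */
	return (skb->sk->sk_classid == info->id) ^ info->invert;
}
```

For the rule "-m cgroup ! --cgroup 1 -j DROP", traffic from tasks placed in the classid-1 cgroup does not match and passes. Traffic from every other local socket matches and is dropped.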