Eric Lee / smarc-fsl-linux-kernel

13 Dec, 2011

1 commit

3dc43e3e4 per-netns ipv4 sysctl_tcp_mem

This patch allows each namespace to independently set up
its levels for tcp memory pressure thresholds. This patch
alone does not buy much: we need to make these values
per group of processes somehow. This is achieved in the
patches that follow in this patchset.
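A minimal userspace sketch of what per-namespace thresholds mean: each namespace carries its own min/pressure/max triple (the three values of the tcp_mem sysctl, in pages), and the pressure decision reads only the caller's triple. The struct and function names are hypothetical, and the "enter pressure above the middle value, leave it below the first" hysteresis is modeled on the documented tcp_mem semantics rather than copied from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-namespace state. Index order follows the tcp_mem
 * sysctl: [0] = min, [1] = pressure, [2] = max (all in pages). */
struct netns_tcp_sketch {
	long tcp_mem[3];
	bool under_pressure;
};

enum tcp_mem_verdict { TCP_MEM_OK, TCP_MEM_PRESSURE, TCP_MEM_OVER_LIMIT };

/* Classify 'allocated' pages using only this namespace's thresholds. */
static enum tcp_mem_verdict tcp_mem_check(struct netns_tcp_sketch *ns,
					  long allocated)
{
	assert(ns->tcp_mem[0] <= ns->tcp_mem[1] &&
	       ns->tcp_mem[1] <= ns->tcp_mem[2]);

	if (allocated > ns->tcp_mem[2])
		return TCP_MEM_OVER_LIMIT;
	if (allocated > ns->tcp_mem[1])
		ns->under_pressure = true;	/* enter pressure mode */
	else if (allocated < ns->tcp_mem[0])
		ns->under_pressure = false;	/* leave it only below "min" */
	return ns->under_pressure ? TCP_MEM_PRESSURE : TCP_MEM_OK;
}
```

The same allocation can therefore be "pressure" in one namespace and "ok" in another, which is the property later patches build on.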

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: David S. Miller
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:11 +0800

12 Dec, 2011

1 commit

dfd56b8b3 net: use IS_ENABLED(CONFIG_IPV6)

Use IS_ENABLED(CONFIG_IPV6) instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE).
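A standalone illustration of why one macro can replace the two defined() tests. The kernel's own IS_ENABLED() lives in include/linux/kconfig.h and its implementation has changed between releases; the MY_-prefixed macros below re-create the "placeholder" technique in userspace and are not a copy of any kernel header. Under the Kconfig convention assumed here, a built-in option defines CONFIG_FOO as 1 and a modular one defines CONFIG_FOO_MODULE as 1.

```c
#include <assert.h>

/* If 'val' is the literal 1, ARG_PLACEHOLDER_1 expands to "0," and shifts
 * the 1 into second position; any other token leaves 0 there. */
#define ARG_PLACEHOLDER_1 0,
#define take_second_arg(ignored, val, ...) val
#define is_defined(x)              is_defined_(x)
#define is_defined_(val)           is_defined__(ARG_PLACEHOLDER_##val)
#define is_defined__(arg_or_junk)  take_second_arg(arg_or_junk 1, 0, 0)

#define MY_IS_BUILTIN(option) is_defined(option)
#define MY_IS_MODULE(option)  is_defined(option##_MODULE)
#define MY_IS_ENABLED(option) (MY_IS_BUILTIN(option) || MY_IS_MODULE(option))

/* Pretend IPv6 was configured as a module (CONFIG_IPV6=m). */
#define CONFIG_IPV6_MODULE 1

/* Works in preprocessor conditionals ... */
#if MY_IS_ENABLED(CONFIG_IPV6)
static const int ipv6_support = 1;
#else
static const int ipv6_support = 0;
#endif

/* ... and in ordinary constant expressions. */
static_assert(MY_IS_ENABLED(CONFIG_IPV6), "IPv6 counts as enabled");
```

Unlike the defined() form, the result is an ordinary integer constant, so it can also appear in plain C conditions such as `if (MY_IS_ENABLED(CONFIG_IPV6))`.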

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-12 07:25:16 +0800

03 Dec, 2011

1 commit

b3613118e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2011-12-03 02:49:21 +0800

22 Nov, 2011

1 commit

70e9942f1 netfilter: nf_conntrack: make event callback registration per-netns

This patch fixes an oops that can be triggered by following this recipe:

0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
1) container is started.
2) connect to it via lxc-console.
3) generate some traffic with the container to create some conntrack
entries in its table.
4) stop the container: you hit an oops because the conntrack table
cleanup tries to report the destroy event to user-space but the
per-netns nfnetlink socket has already gone (the nfnetlink
socket is per-netns but event callback registration is global).

To fix this situation, we make the ctnl_notifier per-netns so the
callback is registered/unregistered when the container is
created/destroyed.
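A minimal sketch of the fix's shape, with hypothetical names: the event callback pointer lives in each namespace's own state, so tearing one namespace down clears only its callback, and reporting an event there becomes a no-op instead of a call into a socket that no longer exists. The real code must also cope with concurrent readers; this sketch ignores concurrency entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: one event callback per namespace instead of a
 * single global registration shared by every namespace. */
struct ct_event_notifier {
	int (*fcn)(int event, void *ctx);
	void *ctx;
};

struct netns_ct_sketch {
	struct ct_event_notifier *notifier;	/* NULL when nobody listens */
};

/* Called when the namespace's netlink side comes up. */
static void ct_register_notifier(struct netns_ct_sketch *net,
				 struct ct_event_notifier *nb)
{
	assert(nb != NULL && nb->fcn != NULL);
	net->notifier = nb;
}

/* Called when the namespace is destroyed: affects only this namespace. */
static void ct_unregister_notifier(struct netns_ct_sketch *net)
{
	net->notifier = NULL;
}

/* Deliver an event within 'net' only; returns 1 if a listener got it. */
static int ct_report_event(struct netns_ct_sketch *net, int event)
{
	struct ct_event_notifier *nb = net->notifier;

	if (nb == NULL)
		return 0;	/* listener already gone: nothing to dereference */
	return nb->fcn(event, nb->ctx) == 0;
}

/* Example listener: counts the events it receives. */
static int count_event(int event, void *ctx)
{
	(void)event;
	++*(int *)ctx;
	return 0;
}
```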

Alex Bligh and Alexey Dobriyan originally proposed a small patch to
check if the nfnetlink socket is gone in nfnetlink_has_listeners,
but this is a hot path for events, so it may reduce
performance, and it looks a bit hackish to check for the nfnetlink
socket only to work around this situation. As a result, I decided
to take the bigger path, which looks cleaner to me.

Cc: Alexey Dobriyan
Reported-by: Alex Bligh
Signed-off-by: Pablo Neira Ayuso

Pablo Neira Ayuso
2011-11-22 07:34:47 +0800

14 Nov, 2011

1 commit

2a24444f8 ipv6: reduce percpu needs for icmpv6msg mibs

Reading /proc/net/snmp6 on a machine with a lot of cpus is very
expensive (can be ~88000 us).

This is because the ICMPV6MSG MIB uses 4096 bytes per cpu, and folding
values for all possible cpus can read 16 Mbytes of memory (32 Mbytes on
non-x86 arches).

ICMP messages are not considered a fast path on a typical server, and
in practice only a few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.

This saves 4096 bytes per cpu and per network namespace.
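A sketch of the trade-off, assuming the MIB holds 512 eight-byte counters (256 message types, in and out), which would account for 512 × 8 = 4096 bytes per cpu on a 64-bit machine. Instead of one array per possible cpu that every reader must fold, each namespace keeps a single array of C11 atomics. The layout and names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed layout: 256 message types x 2 directions = 512 counters. */
#define ICMPMSG_TYPES    256
#define ICMPMSG_COUNTERS (2 * ICMPMSG_TYPES)

enum { ICMPMSG_IN = 0, ICMPMSG_OUT = 1 };

/* One shared array per namespace instead of one copy per possible cpu. */
struct icmpmsg_mib_sketch {
	atomic_ulong mibs[ICMPMSG_COUNTERS];
};

/* One atomic add per message: slower than a per-cpu increment, but ICMP
 * is assumed to be off the fast path. */
static void icmpmsg_count(struct icmpmsg_mib_sketch *m, int dir,
			  unsigned char type)
{
	assert(dir == ICMPMSG_IN || dir == ICMPMSG_OUT);
	atomic_fetch_add_explicit(&m->mibs[dir * ICMPMSG_TYPES + type], 1,
				  memory_order_relaxed);
}

/* Reading is a single load rather than a sum over every possible cpu. */
static unsigned long icmpmsg_read(struct icmpmsg_mib_sketch *m, int dir,
				  unsigned char type)
{
	return atomic_load_explicit(&m->mibs[dir * ICMPMSG_TYPES + type],
				    memory_order_relaxed);
}
```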

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-14 13:12:26 +0800

10 Nov, 2011

1 commit

acb32ba3d ipv4: reduce percpu needs for icmpmsg mibs

Reading /proc/net/snmp on a machine with a lot of cpus is very expensive
(can be ~88000 us).

This is because the ICMPMSG MIB uses 4096 bytes per cpu, and folding values
for all possible cpus can read 16 Mbytes of memory.

ICMP messages are not considered a fast path on a typical server, and
in practice only a few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.

This saves 4096 bytes per cpu and per network namespace.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-10 05:04:20 +0800

27 Jul, 2011

1 commit

60063497a atomic: use <linux/atomic.h>

This allows us to move duplicated code in
(atomic_inc_not_zero() for now) to

Signed-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arun Sharma
2011-07-27 07:49:47 +0800

14 May, 2011

1 commit

c319b4d76 net: ipv4: add IPPROTO_ICMP socket kind

This patch adds IPPROTO_ICMP socket kind. It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges. In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).

Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/

A new ping socket is created with

socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)

Message identifiers (octets 4-5 of ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes, port 0 is reserved for API ("let the
kernel pick a free number"). There is no notion of remote ports, remote
port numbers provided by the user (e.g. in connect()) are ignored.

Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport headers values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.

ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, the
checksum is always recomputed.
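From user space, a request that passes these checks might be built as follows (Linux/glibc headers assumed). Because the kernel overwrites the identifier with the socket's local port and recomputes the checksum, the sketch leaves both zero. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>

/* Unprivileged ping socket; fails unless the caller's group is allowed
 * by /proc/sys/net/ipv4/ping_group_range. */
static int open_ping_socket(void)
{
	return socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
}

/* Fill in an echo request header that satisfies the kernel's checks. */
static void build_echo_request(struct icmphdr *h, unsigned short seq)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h));	/* id and checksum stay 0: kernel sets them */
	h->type = ICMP_ECHO;		/* the only type accepted */
	h->code = 0;			/* code must be zero */
	h->un.echo.sequence = htons(seq);
}
```

The header would then be passed to sendto() with a struct sockaddr_in destination, and recv() returns the echo reply starting at its ICMP header.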

ICMP reply packets received from the network are demultiplexed according
to their ids, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).

socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets. Setting it to "100
100" would grant permissions to the single group (to either make
/sbin/ping g+s and owned by this group or to grant permissions to the
"netadmins" group), "0 4294967295" would enable it for the world, "100
4294967295" would enable it for the users, but not daemons.
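The range check can be sketched as a closed interval test on a single group ID. The parser and helper names are hypothetical, and the sketch does not model which of a process's groups the kernel actually considers. Note how "1 0" yields an empty interval, which is what makes it deny everyone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Parse the "lo hi" format of /proc/sys/net/ipv4/ping_group_range. */
static bool parse_ping_group_range(const char *s, unsigned long *lo,
				   unsigned long *hi)
{
	return sscanf(s, "%lu %lu", lo, hi) == 2;
}

/* Allowed iff lo <= gid <= hi; with lo > hi no gid can qualify. */
static bool gid_may_ping(const char *range, gid_t gid)
{
	unsigned long lo, hi;

	if (!parse_ping_group_range(range, &lo, &hi))
		return false;
	return lo <= (unsigned long)gid && (unsigned long)gid <= hi;
}
```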

The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).

Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping

For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/

Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.

All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) have been tested with
the patch.

PATCH v3:
- switched to flowi4.
- minor changes to be consistent with raw sockets code.

PATCH v2:
- changed ping_debug() to pr_debug().
- removed CONFIG_IP_PING.
- removed ping_seq_fops.owner field (unused for procfs).
- switched to proc_net_fops_create().
- switched to %pK in seq_printf().

PATCH v1:
- fixed checksumming bug.
- CAP_NET_RAW may not create icmp sockets anymore.

RFC v2:
- minor cleanups.
- introduced sysctl'able group range to restrict socket(2).

Signed-off-by: Vasiliy Kulikov
Signed-off-by: David S. Miller

Vasiliy Kulikov
2011-05-14 04:08:13 +0800

25 Mar, 2011

1 commit

436c3b66e ipv4: Invalidate nexthop cache nh_saddr more correctly.

Any operation that:

1) Brings up an interface
2) Adds an IP address to an interface
3) Deletes an IP address from an interface

can potentially invalidate the nh_saddr value, requiring
it to be recomputed.

Perform the recomputation lazily using a generation ID.
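The generation-ID idea, sketched with hypothetical structures: address events only bump a per-namespace counter, and each nexthop revalidates its cached source address the next time it is read. A plain assignment stands in for the real source-address selection.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-namespace state. Start the counter at 1 so that a
 * zero-initialised nexthop is treated as stale on first use. */
struct netns_addr_sketch {
	unsigned int dev_addr_genid;
	uint32_t best_saddr;	/* what a full recomputation would return */
};

/* Hypothetical nexthop with a cached source address. */
struct nexthop_sketch {
	uint32_t nh_saddr;		/* cached value */
	unsigned int nh_saddr_genid;	/* generation it was computed at */
	unsigned int recomputations;	/* bookkeeping for the example */
};

/* Interface up, address add or address delete: O(1), no nexthop walk. */
static void addr_changed(struct netns_addr_sketch *net, uint32_t new_best)
{
	net->best_saddr = new_best;
	net->dev_addr_genid++;
}

/* Recompute the cached address only if it predates the last change. */
static uint32_t nh_get_saddr(struct nexthop_sketch *nh,
			     const struct netns_addr_sketch *net)
{
	if (nh->nh_saddr_genid != net->dev_addr_genid) {
		nh->nh_saddr = net->best_saddr;	/* stand-in for real selection */
		nh->nh_saddr_genid = net->dev_addr_genid;
		nh->recomputations++;
	}
	return nh->nh_saddr;
}
```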

Reported-by: Julian Anastasov
Signed-off-by: David S. Miller

David S. Miller
2011-03-25 08:42:21 +0800

15 Mar, 2011

1 commit

2553d064f ipvs: move struct netns_ipvs

Remove include/net/netns/ip_vs.h because it depends on
structures from include/net/ip_vs.h. As ipvs is a pointer in
struct net it is better to move struct netns_ipvs into
include/net/ip_vs.h, so that we can easily use other structures
in struct netns_ipvs.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2011-03-15 08:36:50 +0800

19 Jan, 2011

1 commit

a992ca2a0 netfilter: nf_conntrack_tstamp: add flow-based timestamp extension

This patch adds flow-based timestamping for conntracks. This
conntrack extension is disabled by default. Basically, we use
two 64-bit variables: one stores the creation timestamp once the
conntrack has been confirmed, and the other stores the deletion
time. To enable the extension, you
have to:

echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp

This patch allows user-space flow-based loggers such as ulogd2 to
save memory. In short, ulogd2 does not need to
keep a hashtable with the conntracks in user-space to know
when they were created and destroyed; instead it uses the
kernel timestamp. If we want to have a sane IPFIX implementation
in user-space, these nanosecond-resolution timestamps are also
useful. Other custom user-space applications can benefit from
this via libnetfilter_conntrack.

This patch modifies the /proc output to display the delta time
in seconds since the flow start. You can also obtain the
flow-start date by means of the conntrack-tools.
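With two 64-bit nanosecond stamps, both the /proc delta and a flow's total lifetime reduce to subtractions. The struct and helpers below are a hypothetical sketch, not the extension's real layout.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical shape of the extension: two 64-bit nanosecond stamps. */
struct ct_tstamp_sketch {
	uint64_t start;	/* set when the conntrack is confirmed */
	uint64_t stop;	/* set when it is destroyed; 0 while alive */
};

/* Whole seconds since flow start, like the delta shown in /proc. */
static uint64_t ct_delta_seconds(const struct ct_tstamp_sketch *ts,
				 uint64_t now_ns)
{
	assert(ts->start != 0);
	return now_ns > ts->start ? (now_ns - ts->start) / NSEC_PER_SEC : 0;
}

/* Flow lifetime in nanoseconds once it has been destroyed. */
static uint64_t ct_duration_ns(const struct ct_tstamp_sketch *ts)
{
	return ts->stop > ts->start ? ts->stop - ts->start : 0;
}
```

A logger that receives the destroy event with both stamps can compute the duration directly, without keeping its own table of creation times.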

Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Patrick McHardy

Pablo Neira Ayuso
2011-01-19 23:00:07 +0800

14 Jan, 2011

1 commit

d862a6622 netfilter: nf_conntrack: use is_vmalloc_addr()

Use is_vmalloc_addr() in nf_ct_free_hashtable() and get rid of
the vmalloc flags to indicate that a hash table has been allocated
using vmalloc().

Signed-off-by: Patrick McHardy

Patrick McHardy
2011-01-14 22:45:56 +0800

13 Jan, 2011

17 commits

763f8d0ed IPVS: netns, svc counters moved in ip_vs_ctl.c

Last two global vars to be moved,
ip_vs_ftpsvc_counter and ip_vs_nullsvc_counter.

[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
f2431e6e9 IPVS: netns, trash handling

trash list per namespace,
and reordering of some params in dst struct.

[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
cancel_rearming_delayed_work(). Found during
merge conflict resolution ]
Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
f6340ee0c IPVS: netns, defense work timer.

This patch makes the defense work timer per name-space.
A net ptr had to be added to the ipvs struct,
since it's needed by defense_work_handler.

[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
cancel_rearming_delayed_work(). Found during
merge conflict resolution ]
Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
a0840e2e1 IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.

Moving global vars to ipvs struct, except for svc table lock.
Next patch for ctl will be drop-rate handling.

*v3
__ip_vs_mutex remains global
ip_vs_conntrack_enabled(struct netns_ipvs *ipvs)

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
6e67e586e IPVS: netns, connection hash got net as param.

Connection hash table is now name space aware.
i.e. net ptr >> 8 is xor:ed to the hash,
and this is the first param to be compared.
The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s)
and cache-line aligned, so a ptr >> 5 might be a more clever solution ?

All lookups where net is compared use net_eq(), which returns 1 when netns
is disabled, and the compiler seems to do something clever in that case.
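A toy version of a namespace-aware connection hash, with hypothetical names: the net pointer, shifted right by 8, is xor-ed into the bucket index, and lookups compare the tuple fields before the net pointer, following the v3 note. The multiplicative mixing is a stand-in, not the hash IPVS actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONN_TAB_BITS 12
#define CONN_TAB_MASK ((1u << CONN_TAB_BITS) - 1)

struct net_sketch { int id; };

struct conn_sketch {
	const struct net_sketch *net;
	uint16_t proto;
	uint32_t caddr;
	uint16_t cport;
	struct conn_sketch *next;	/* bucket chain */
};

/* Stand-in hash; only the "xor in net >> 8" step mirrors the text. */
static unsigned int conn_hash(const struct net_sketch *net, uint16_t proto,
			      uint32_t caddr, uint16_t cport)
{
	uint32_t n = (uint32_t)((uintptr_t)net >> 8);
	uint32_t h = caddr * 2654435761u ^ ((uint32_t)proto << 16 | cport);

	return (h ^ n) & CONN_TAB_MASK;
}

static void conn_insert(struct conn_sketch **tab, struct conn_sketch *c)
{
	unsigned int h = conn_hash(c->net, c->proto, c->caddr, c->cport);

	c->next = tab[h];
	tab[h] = c;
}

/* Tuple fields first, net last, so the common mismatch exits early. */
static struct conn_sketch *conn_lookup(struct conn_sketch **tab,
				       const struct net_sketch *net,
				       uint16_t proto, uint32_t caddr,
				       uint16_t cport)
{
	struct conn_sketch *c;

	for (c = tab[conn_hash(net, proto, caddr, cport)]; c; c = c->next)
		if (c->proto == proto && c->caddr == caddr &&
		    c->cport == cport && c->net == net)
			return c;
	return NULL;
}
```

Because the net pointer is compared explicitly, correctness does not depend on two namespaces hashing to different buckets; the xor only spreads identical tuples from different namespaces across the table.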

ip_vs_conn_fill_param() has *net as first param now.

Three new inlines added to keep conn struct smaller
when name space is disabled.
- ip_vs_conn_net()
- ip_vs_conn_net_set()
- ip_vs_conn_net_eq()

*v3
moved net compare to the end in "fast path"

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
b17fc9963 IPVS: netns, ip_vs_stats and its procfs

The statistic counter locks for every packet are now removed,
and the statistics are now per CPU, i.e. no locks needed.
However, summing is done in ip_vs_est into the ip_vs_stats struct,
which is moved to the ipvs struct.

procfs: ip_vs_stats now has a "per cpu" count and a grand total.
A new function seq_file_single_net() in ip_vs.h was created for handling of
single_open_net(), since it does not place the net ptr in a struct like the others.

/var/lib/lxc # cat /proc/net/ip_vs_stats_percpu
         Total   Incoming   Outgoing   Incoming   Outgoing
CPU      Conns    Packets    Packets      Bytes      Bytes
  0          0          3          1         9D         34
  1          0          1          2         49         70
  2          0          1          2         34         76
  3          1          2          2         70         74
  ~          1          7          7        18A        18E

       Conns/s     Pkts/s     Pkts/s    Bytes/s    Bytes/s
             0          0          0          0          0
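The "~" row in that output is a fold over the per-CPU rows; the byte columns appear to be hexadecimal (0x9D + 0x49 + 0x34 + 0x70 = 0x18A). A minimal sketch of the split between lock-free per-CPU writers and a summing reader, with hypothetical names; torn 64-bit reads on 32-bit machines, which the u64 seqlock mentioned in v3 addresses, are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4

/* One counter set per CPU; each CPU writes only its own slot, so the
 * packet path needs no shared lock. */
struct cpu_stats_sketch {
	uint64_t conns, inpkts, outpkts, inbytes, outbytes;
};

/* Reader side: fold all per-CPU slots into the grand total ("~" row). */
static struct cpu_stats_sketch stats_fold(const struct cpu_stats_sketch *pc,
					  int ncpus)
{
	struct cpu_stats_sketch t = { 0, 0, 0, 0, 0 };

	assert(ncpus > 0);
	for (int i = 0; i < ncpus; i++) {
		t.conns += pc[i].conns;
		t.inpkts += pc[i].inpkts;
		t.outpkts += pc[i].outpkts;
		t.inbytes += pc[i].inbytes;
		t.outbytes += pc[i].outbytes;
	}
	return t;
}
```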

*v3
ip_vs_stats remains as before; instead ip_vs_stats_percpu is added.
u64 seq lock added

*v4
Bug correction: inbytes and outbytes as own vars.
per_cpu counter for all stats now as suggested by Julian.

[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
f131315fa IPVS: netns awareness to ip_vs_sync

All global variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
in sync_buf create + 4 replaced by sizeof(struct..)

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
29c2026fd IPVS: netns awareness to ip_vs_est

All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)

*v3
timer per ns instead of a common timer in estimator.

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
ab8a5e840 IPVS: netns awareness to ip_vs_app

All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)

in ip_vs_protocol param struct net *net added to:
- register_app()
- unregister_app()
This affected almost all proto_xxx.c files

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:28 +0800
9d934878e IPVS: netns preparation for proto_sctp

In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that are common for all protos and use ip_vs_proto_data

*v3
Removed unused function set_state_timeout()

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:27 +0800
78b16bde1 IPVS: netns preparation for proto_udp

In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that are common for all protos and use ip_vs_proto_data

*v3
Removed unused function set_state_timeout()

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:27 +0800
4a85b96c0 IPVS: netns preparation for proto_tcp

In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that are common for all protos and use all
ip_vs_proto_data

*v3
Removed unused function as suggested by Simon

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:27 +0800
252c64103 IPVS: netns, prepare protocol

Add support for protocol data per name-space.
in struct ip_vs_protocol, appcnt will be removed when all protos
are modified for network name-space.

This patch causes unused-function warnings; the functions will be used
once the next patch is applied.

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:27 +0800
b6e885ddb IPVS: netns awareness to lblc scheduler

var sysctl_ip_vs_lblc_expiration moved to ipvs struct as
sysctl_lblc_expiration

procfs updated to handle this.

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:27 +0800
d0a1eef9c IPVS: netns awareness to lblcr scheduler

var sysctl_ip_vs_lblcr_expiration moved to ipvs struct as
sysctl_lblcr_expiration

procfs updated to handle this.

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:27 +0800
fc723250c IPVS: netns to services part 1

Service hash tables got the netns ptr as a hash arg,
while Real Servers (rs) have been moved to the ipvs struct.
Two new inline functions added to get the net ptr from an skb.

Since ip_vs is called from different contexts, there are two
places to dig for the net ptr, skb->dev or skb->sk;
this is handled in skb_net() and skb_sknet().

Global functions, ip_vs_service_get(), ip_vs_lookup_real_service()
etc. have got struct net *net as first param.
If possible, the net ptr is taken from the skb etc.;
if not, &init_net is used at this early stage of patching.
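The two places to look for the namespace can be sketched as a single fallback chain. The fake_* types are hypothetical stand-ins for sk_buff, net_device and sock, and this one function merges the lookup orders of skb_net() and skb_sknet() for brevity; it is not the commit's code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal hypothetical stand-ins for the kernel structures. */
struct fake_net  { int id; };
struct fake_dev  { struct fake_net *nd_net; };
struct fake_sock { struct fake_net *sk_net; };
struct fake_skb  { struct fake_dev *dev; struct fake_sock *sk; };

static struct fake_net init_net_sketch = { 0 };

/* Prefer the packet's device, then its owning socket, and fall back to
 * the initial namespace, as the early patches in this series do. */
static struct fake_net *skb_net_sketch(const struct fake_skb *skb)
{
	assert(skb != NULL);
	if (skb->dev && skb->dev->nd_net)
		return skb->dev->nd_net;
	if (skb->sk && skb->sk->sk_net)
		return skb->sk->sk_net;
	return &init_net_sketch;
}
```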

ip_vs_ctl.c procfs not ready for netns yet.

*v3
Comments by Julian
- __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path,
net_eq(svc->net, net) so the check is at the end now.
- net = skb_net(skb) in ip_vs_out moved after check for skb_dst.

Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:26 +0800
61b1ab458 IPVS: netns, add basic init per netns.

Preparation for network name-space init; at this stage
some empty functions exist.

In most files there is a check whether it is the root ns, i.e. init_net
if (!net_eq(net, &init_net))
return ...
this will be removed by the last patch, when enabling name-space.

*v3
ip_vs_conn.c merge error corrected.
net_ipvs #ifdef removed as suggested by Jan Engelhardt

[ horms@verge.net.au: Removed whitespace-change-only hunks ]
Signed-off-by: Hans Schillstrom
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Hans Schillstrom
2011-01-13 09:30:26 +0800

22 Nov, 2010

1 commit

20a95a216 netns: let net_generic take pointer-to-const args

This commit is same in nature as v2.6.37-rc1-755-g3654654; the network
namespace itself is not modified when calling net_generic, so the
parameter can be const.

Signed-off-by: Jan Engelhardt
Signed-off-by: David S. Miller

Jan Engelhardt
2010-11-22 02:05:10 +0800

18 Oct, 2010

1 commit

8e602ce29 netns: reorder fields in struct net

In a network bench, I noticed an unfortunate false sharing between
'loopback_dev' and 'count' fields in "struct net".

'count' is written each time a socket is created or destroyed, while
loopback_dev might be often read in routing code.

Move loopback_dev into a read-mostly section of "struct net".

Note: struct netns_xfrm is cache line aligned on SMP.
(It contains a "struct dst_ops")
Move it to the end to avoid holes, and reduce sizeof(struct net) by 128
bytes on ia32.
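A sketch of the layout principle, assuming a 64-byte cache line: the frequently written counter and the read-mostly pointer are forced onto different lines, so a write to one no longer invalidates the line holding the other in remote caches. The kernel uses its own cache-line annotations rather than a hard-coded constant; the struct here is hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache-line size for this illustration. */
#define CACHELINE 64

struct net_layout_sketch {
	/* written whenever a socket is created or destroyed */
	int count;
	/* read-mostly data begins on its own cache line */
	alignas(CACHELINE) void *loopback_dev;
	void *other_read_mostly;
};

static_assert(offsetof(struct net_layout_sketch, loopback_dev) % CACHELINE == 0,
	      "read-mostly section starts on a cache-line boundary");
```

The separation only holds if the object itself is allocated with that alignment (for example via aligned_alloc()), since malloc() only guarantees alignment suitable for max_align_t.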

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-10-18 04:49:14 +0800

11 May, 2010

4 commits

d1db275dd ipv6: ip6mr: support multiple tables

This patch adds support for multiple independent multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.
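The register-device naming rule can be written as a small helper. This is a hypothetical sketch in which the default table's ID is passed in rather than hard-coded; buf should hold at least IFNAMSIZ (16) bytes, and snprintf() truncates anything longer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "pim6reg" for the default table, "pim6reg<N>" for any other table N. */
static int pimreg_name(char *buf, size_t len, const char *base,
		       unsigned int table, unsigned int default_table)
{
	assert(buf != NULL && base != NULL);
	if (table == default_table)
		return snprintf(buf, len, "%s", base);
	return snprintf(buf, len, "%s%u", base, table);
}
```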

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:55 +0800
6bd521433 ipv6: ip6mr: move mroute data into separate structure

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:53 +0800
f30a77842 ipv6: ip6mr: convert struct mfc_cache to struct list_head

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:51 +0800
c476efbcd ipv6: ip6mr: move unres_queue and timer to per-namespace data

The unres_queue is currently shared between all namespaces. Following patches
will additionally allow creating multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems excessive,
so move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.

Signed-off-by: Patrick McHardy

Patrick McHardy
2010-05-11 20:40:48 +0800

08 May, 2010

1 commit

3ee943728 ipv4: remove ip_rt_secret timer (v4)

A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for a while
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant. This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.

Signed-off-by: Neil Horman
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Neil Horman
2010-05-08 16:57:52 +0800

28 Apr, 2010

1 commit

05fceb4ad net: disallow to use net_assign_generic externally

Now there's no need to use this function directly because it's handled by
register_pernet_device. So to make this simple and easy to understand,
make it static so as not to tempt potential users.

Signed-off-by: Jiri Pirko
Signed-off-by: David S. Miller

Jiri Pirko
2010-04-28 06:49:02 +0800

14 Apr, 2010

3 commits

f0ad0860d ipv4: ipmr: support multiple tables

This patch adds support for multiple independent multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT_TABLE. The table number is
stored in the raw socket data and affects all following ipmr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT)
is created with a default routing rule pointing to it. Newly created pimreg
devices have the table number appended ("pimregX"), with the exception of
devices created in the default table, which are named just "pimreg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip mrule add iif eth0 lookup 123
# ip mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2010-04-14 05:49:34 +0800
0c12295a7 ipv4: ipmr: move mroute data into separate structure

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2010-04-14 05:49:34 +0800
862465f2e ipv4: ipmr: convert struct mfc_cache to struct list_head

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2010-04-14 05:49:33 +0800