24 Sep, 2015
15 commits
-
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
In practice struct netns_ipvs is as meaningful as struct net and more
useful as it holds the ipvs specific data. So store a pointer to
struct netns_ipvs.

Update the accesses of tinfo->net to access tinfo->ipvs->net instead.
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
ipvs is what is actually desired so change the parameter and modify
the callers to pass struct netns_ipvs.
Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman
22 Aug, 2015
3 commits
-
- mcast_group: configure the multicast address, now IPv6
is supported too
- mcast_port: configure the multicast port
- mcast_ttl: configure the multicast TTL/HOP_LIMIT
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman -
Allow setups with large MTU to send large sync packets by
adding sync_maxlen parameter. The default value is now based
on MTU but no more than 1500 for compatibility reasons.

To avoid problems if MTU changes allow fragmentation by
sending packets with DF=0. Problem reported by Dan Carpenter.
Reported-by: Dan Carpenter
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman -
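The default described in the sync_maxlen commit above (based on MTU, but never above 1500) can be illustrated with a minimal user-space sketch. The function name, the constant names and the 28-byte IPv4+UDP header allowance are assumptions made for this illustration; the commit message does not state how the kernel accounts for header overhead.

```c
#include <assert.h>

/* Hypothetical constants for illustration; not taken from the patch. */
#define COMPAT_MTU_CAP   1500u  /* upper bound named in the commit message */
#define IPV4_UDP_HDR_LEN 28u    /* assumed: 20-byte IPv4 + 8-byte UDP header */

/* Default sync payload size: device MTU capped at 1500, minus headers. */
unsigned int default_sync_maxlen(unsigned int mtu)
{
    unsigned int capped = mtu < COMPAT_MTU_CAP ? mtu : COMPAT_MTU_CAP;

    assert(capped > IPV4_UDP_HDR_LEN);  /* MTU must leave room for payload */
    return capped - IPV4_UDP_HDR_LEN;
}
```

With these assumptions, a jumbo-frame device still gets the conservative default, and a larger value would have to be requested explicitly through sync_maxlen.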
When the sync daemon is started we need to hold rtnl
lock while calling ip_mc_join_group. Currently, we have
a wrong locking order because the correct one is
rtnl_lock->__ip_vs_mutex. It is implied from the usage
of __ip_vs_mutex in ip_vs_dst_event() which is called
under rtnl lock during NETDEV_* notifications.

Fix the problem by calling rtnl_lock early only for the
start_sync_thread call. As a bonus this fixes the usage
of __dev_get_by_name which was not called under rtnl lock.

This patch actually extends and depends on commit 54ff9ef36bdf
("ipv4, ipv6: kill ip_mc_{join, leave}_group and
ipv6_sock_mc_{join, drop}").
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
14 Jul, 2015
1 commit
-
Fix crash in 3.5+ if FTP is used after switching
sync_version to 0.
Fixes: 749c42b620a9 ("ipvs: reduce sync rate with time thresholds")
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
11 May, 2015
2 commits
-
Now that sk_alloc knows when a kernel socket is being allocated modify
it to not reference count the network namespace of kernel sockets.

Keep track of whether a socket needs reference counting by adding a flag to
struct sock called sk_net_refcnt.

Update all of the callers of sock_create_kern to stop using
sk_change_net and sk_release_kernel as those hacks are no longer
needed, to avoid reference counting a kernel socket.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller -
This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.
Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller
19 Mar, 2015
1 commit
-
in favor of their inner __ ones, which doesn't grab rtnl.
As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
__ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
__ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}
Signed-off-by: Marcelo Ricardo Leitner
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller
10 Mar, 2015
1 commit
-
Conflicts:
drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.
Signed-off-by: David S. Miller
25 Feb, 2015
1 commit
-
Currently, when TCP/SCTP port reuse happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to suboptimal load
balancing and might prevent a new node from getting a fair load.

This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.
Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
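The default conn_reuse_mode policy stated in the commit above can be sketched as a small predicate. The enum and function names are hypothetical, chosen only for this illustration; the kernel uses its own protocol and state constants, and the other values the sysctl accepts are not described in the commit message, so they are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only. */
enum proto      { PROTO_TCP, PROTO_SCTP };
enum conn_state { ST_ESTABLISHED, ST_TIME_WAIT, ST_CLOSED };

/*
 * Default policy as described in the commit message: when a port is
 * reused, reschedule the new connection only if the old entry was in
 * TIME_WAIT (TCP) or CLOSED (SCTP). Otherwise the old entry is reused.
 */
bool may_reschedule(enum proto p, enum conn_state old_state)
{
    switch (p) {
    case PROTO_TCP:
        return old_state == ST_TIME_WAIT;
    case PROTO_SCTP:
        return old_state == ST_CLOSED;
    }
    return false;
}
```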
23 Feb, 2015
1 commit
-
ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).

Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.
Fixes: fe5e7a1efb66 ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
20 Nov, 2014
1 commit
-
The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring
Acked-by: Julian Anastasov
Acked-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso
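The same cleanup pattern exists in user space, where the C standard defines free(NULL) as a no-op. The sketch below is only an analogy to the free_percpu()/module_put() change described above; the struct and function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical holder type for illustration. */
struct stats_holder {
    int *counters;
};

void holder_release(struct stats_holder *h)
{
    /* No "if (h->counters)" guard: free(NULL) is defined to do nothing. */
    free(h->counters);
    h->counters = NULL;
}
```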
16 Sep, 2014
2 commits
-
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or do an illegal read of 16 bytes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.
Signed-off-by: Alex Gartrell
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman -
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).
Signed-off-by: Alex Gartrell
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman
16 Jul, 2014
1 commit
-
Fix checkpatch warning:
WARNING: kfree(NULL) is safe this check is probably not required
Signed-off-by: Fabian Frederick
Signed-off-by: Simon Horman
27 Dec, 2013
1 commit
-
net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]

Commit 35a2af94c7ce7130ca292c68b1d27fcfdb648f6b ("sched/wait: Make the
__wait_event*() interface more friendly") changed how the interruption
state is returned. However, sync_thread_master() ignores this state,
now causing a compile warning.

According to Julian Anastasov, this behavior is OK:
"Yes, your patch looks ok to me. In the past we used ssleep() but IPVS
users were confused why IPVS threads increase the load average. So, we
switched to _interruptible calls and later the socket polling was
added."

Document this, as requested by Peter Zijlstra, to avoid precious developers
disappearing in this pitfall in the future.
Signed-off-by: Geert Uytterhoeven
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman
04 Oct, 2013
1 commit
-
Change all __wait_event*() implementations to match the corresponding
wait_event*() signature for convenience.

In particular this does away with the weird 'ret' logic. Since there
are __wait_event*() users this requires we update them too.
Reviewed-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20131002092529.042563462@infradead.org
Signed-off-by: Ingo Molnar
26 Jun, 2013
2 commits
-
Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.
Signed-off-by: Julian Anastasov
Tested-by: Aleksey Chudov
Signed-off-by: Simon Horman -
Convert the SCTP state table, so that it is more readable.
Change the states to be according to the diagram in RFC 2960
and add more states suitable for middle box. Still, such
change in states adds incompatibility if systems in sync
setup include this change and others do not include it.

With this change we also have proper transitions in INPUT-ONLY
mode (DR/TUN) where we see packets only from client. Now
we should not switch to 10-second CLOSED state at a time
when we should stay in ESTABLISHED state.

The short names for states are because we have 16-char space
in ipvsadm and 11-char limit for the connection list format.
It is a consequence of the TCP implementation where the longest
state name is ESTABLISHED.
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
23 Apr, 2013
1 commit
-
struct ip_vs_sync_mesg and ip_vs_sync_mesg_v0 are both sent across the wire
and used internally to store IPVS synchronisation messages.

Up until now the scheme used has been to convert the size field
to network byte order before sending a message on the wire and
convert it to host byte order when receiving a message.

This patch changes that scheme to always treat the field
as being network byte order. This seems appropriate as
the structure is sent across the wire. And by consistently
treating the field as network byte order it is now possible
to take advantage of sparse to flag any future misuse.
Acked-by: Julian Anastasov
Acked-by: Hans Schillstrom
Signed-off-by: Simon Horman
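The "always network byte order" scheme from the commit above can be sketched in user space with htons() and ntohs(). The struct layout and accessor names are hypothetical and much simpler than the real struct ip_vs_sync_mesg; the point is only that the stored field is never in host order, so conversions happen at every read and write instead of once at send or receive time.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical, simplified header for illustration. */
struct sync_mesg_hdr {
    uint8_t  reserved;
    uint8_t  syncid;
    uint16_t size;      /* stored in network byte order at all times */
};

/* Writers convert from host order when storing. */
void mesg_set_size(struct sync_mesg_hdr *m, uint16_t host_len)
{
    m->size = htons(host_len);
}

/* Readers convert back to host order when loading. */
uint16_t mesg_get_size(const struct sync_mesg_hdr *m)
{
    return ntohs(m->size);
}
```

In the kernel the field would carry a byte-order annotated type so that sparse can flag a direct assignment without conversion; plain uint16_t is used here because that annotation has no user-space equivalent.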
02 Apr, 2013
3 commits
-
We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman -
In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman -
ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman
28 Jan, 2013
1 commit
-
If state != IP_VS_STATE_BACKUP then tinfo->buf is uninitialized. If
kthread_run() fails then it means we free random memory resulting in an
oops.
Signed-off-by: Dan Carpenter
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman
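The bug class described above (an error path freeing a pointer that was only set on one branch) can be reproduced and avoided in a few lines of user-space C. This is a sketch of one common remedy, initialising the pointer unconditionally; the commit message does not say which remedy the patch used. All names are hypothetical, and the start_ok parameter merely stands in for kthread_run() succeeding or failing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread info structure. */
struct thread_info {
    char *buf;
};

/* Returns NULL on failure; the caller frees buf and the struct on success. */
struct thread_info *tinfo_create(bool is_backup, size_t buf_len, bool start_ok)
{
    struct thread_info *t = malloc(sizeof(*t));

    if (!t)
        return NULL;

    t->buf = NULL;  /* always initialise, so the error path frees a known value */
    if (is_backup) {
        t->buf = malloc(buf_len);
        if (!t->buf)
            goto err;
    }

    if (!start_ok)  /* stands in for the thread failing to start */
        goto err;
    return t;

err:
    free(t->buf);   /* safe for the non-backup case: free(NULL) is a no-op */
    free(t);
    return NULL;
}
```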
09 May, 2012
2 commits
-
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.

The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.

Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

Make sure the backup socks do not block on reading.
Special thanks to Aleksey Chudov for helping in all tests.
Signed-off-by: Julian Anastasov
Tested-by: Aleksey Chudov
Signed-off-by: Simon Horman -
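The port assignment and thread selection described in the sync_ports commit above can be sketched as follows. The port arithmetic follows the commit message directly (thread 0 on 8848, the last thread on 8848+sync_ports-1). The selection function is an assumption: the message only says the choice is based on the cp address and is stable per connection, so the shift and modulo used here are illustrative, not the kernel's actual computation. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SYNC_BASE_PORT 8848u  /* default port named in the commit message */

/* UDP port used by a given sync thread. */
unsigned int sync_thread_port(unsigned int thread_id, unsigned int sync_ports)
{
    assert(thread_id < sync_ports);
    return SYNC_BASE_PORT + thread_id;
}

/*
 * Pick a thread from the address of the connection object. The same
 * pointer always maps to the same thread, so the sync messages of one
 * connection are never reordered across threads.
 */
unsigned int sync_thread_for_conn(const void *cp, unsigned int thread_count)
{
    uintptr_t addr = (uintptr_t)cp;

    assert(thread_count > 0);
    /* Illustrative shift: skip low bits that alignment keeps constant. */
    return (unsigned int)((addr >> 6) % thread_count);
}
```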
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
timer that triggers new sync message. It can be used to
avoid sync messages for the specified period (or half of
the connection timeout if it is lower) if connection state
is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
sync_refresh_period/8. Useful to protect against loss of
sync messages.

Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.
Signed-off-by: Julian Anastasov
Tested-by: Aleksey Chudov
Signed-off-by: Simon Horman
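The sync_endtime encoding described above, where the two lowest bits of the stored expiry time hold the retry count, can be sketched with plain bit masking. The helper names are hypothetical, and the sketch assumes the expiry time may be rounded down to a multiple of four to make room for the counter; the commit message does not spell out the rounding.

```c
#include <assert.h>

#define RETRY_MASK 3UL  /* two lowest bits, enough for sync_retries 0..3 */

/* Combine an expiry time and a retry count into one word. */
unsigned long endtime_pack(unsigned long expires, unsigned int retries)
{
    assert(retries <= RETRY_MASK);
    return (expires & ~RETRY_MASK) | retries;
}

/* Expiry time with the retry bits cleared. */
unsigned long endtime_expires(unsigned long v)
{
    return v & ~RETRY_MASK;
}

/* Retry count stored in the two lowest bits. */
unsigned int endtime_retries(unsigned long v)
{
    return (unsigned int)(v & RETRY_MASK);
}

/* Retry period from the commit message: sync_refresh_period / 8. */
unsigned long retry_interval(unsigned long sync_refresh_period)
{
    return sync_refresh_period / 8;
}
```

Packing both values into one field keeps the connection structure compact, at the cost of up to three time units of precision in the stored expiry.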