Eric Lee / smarc-fsl-linux-kernel

24 Sep, 2015

15 commits

ebea1f7c0 ipvs: Pass ipvs not net to ip_vs_sync_net_cleanup

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
802cb4370 ipvs: Pass ipvs not net to ip_vs_sync_net_init

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
1fc12004d ipvs: Pass ipvs not net to ip_vs_proc_sync_conn

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
4f30665ba ipvs: Pass ipvs not net to ip_vs_proc_conn

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
b61a8c1a4 ipvs: Pass ipvs not net to ip_vs_sync_conn

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
72e9481e2 ipvs: Pass ipvs not net to ip_vs_sync_conn_v0

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
7d537f3ab ipvs: Pass ipvs not net to ip_vs_process_message

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
37b68e6de ipvs: Store ipvs not net in struct ip_vs_sync_thread_data

In practice struct netns_ipvs is as meaningful as struct net and more
useful as it holds the ipvs specific data. So store a pointer to
struct netns_ipvs.

Update the accesses of tinfo->net to access tinfo->ipvs->net instead.

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:38 +0800
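
As a rough illustration of the 37b68e6de change, the sketch below stores a pointer to the ipvs data and reaches the namespace through it. The struct layouts, the `sync_thread_data_sketch` name and the `tinfo_net` helper are simplified assumptions made for this example and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; the real ones are far larger. */
struct net { int id; };

struct netns_ipvs {
    struct net *net;   /* back-pointer to the owning namespace */
    int sync_state;    /* placeholder for ipvs-specific data */
};

/* Hypothetical reduced form of the per-thread data. */
struct sync_thread_data_sketch {
    struct netns_ipvs *ipvs;   /* previously a struct net pointer */
    int id;
};

/* What used to be tinfo->net is now reached as tinfo->ipvs->net. */
static struct net *tinfo_net(const struct sync_thread_data_sketch *tinfo)
{
    assert(tinfo != NULL && tinfo->ipvs != NULL);
    return tinfo->ipvs->net;
}
```

Keeping the ipvs pointer means the thread gets the ipvs-specific state directly and only pays one extra dereference when it needs the namespace.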
fd124e2f8 ipvs: Pass ipvs not net to make_receive_sock

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:37 +0800
68c76b6aa ipvs: Pass ipvs not net to make_send_sock

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:37 +0800
b3cf3cbfb ipvs: Pass ipvs not net to stop_sync_thread

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:37 +0800
6ac121d71 ipvs: Pass ipvs not net to start_sync_thread

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:37 +0800
18d6ade63 ipvs: Pass ipvs not net to ip_vs_proto_data_get

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:35 +0800
dc2add6f2 ipvs: Pass ipvs not net to ip_vs_find_dest

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:34 +0800
19913dec1 ipvs: Pass ipvs not net to ip_vs_fill_conn

ipvs is what is actually desired, so change the parameter and modify
the callers to pass struct netns_ipvs.

Signed-off-by: "Eric W. Biederman"
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Eric W. Biederman
2015-09-24 08:34:33 +0800

22 Aug, 2015

3 commits

d33288172 ipvs: add more mcast parameters for the sync daemon

- mcast_group: configure the multicast address, now IPv6
is supported too

- mcast_port: configure the multicast port

- mcast_ttl: configure the multicast TTL/HOP_LIMIT

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2015-08-22 00:10:11 +0800
e4ff67513 ipvs: add sync_maxlen parameter for the sync daemon

Allow setups with large MTU to send large sync packets by
adding sync_maxlen parameter. The default value is now based
on MTU but no more than 1500 for compatibility reasons.

To avoid problems if the MTU changes, allow fragmentation by
sending packets with DF=0. Problem reported by Dan Carpenter.

Reported-by: Dan Carpenter
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2015-08-22 00:10:03 +0800
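
The default described in e4ff67513 (based on MTU, capped at 1500) can be pictured with the minimal sketch below. The helper name `default_sync_maxlen` and the macro are hypothetical, and the sketch deliberately ignores any header overhead the real code may account for.

```c
#include <assert.h>

#define SYNC_MAXLEN_COMPAT_CAP 1500u   /* cap taken from the commit message */

/* Hypothetical helper: default used when sync_maxlen is not configured. */
static unsigned int default_sync_maxlen(unsigned int mtu)
{
    assert(mtu > 0);
    /* Follow the MTU, but never exceed the compatibility cap. */
    return mtu < SYNC_MAXLEN_COMPAT_CAP ? mtu : SYNC_MAXLEN_COMPAT_CAP;
}
```

Under this assumption a jumbo-frame link still defaults to 1500, and only an explicit sync_maxlen setting would raise the sync packet size.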
e0b26cc99 ipvs: call rtnl_lock early

When the sync daemon is started we need to hold rtnl
lock while calling ip_mc_join_group. Currently, we have
a wrong locking order because the correct one is
rtnl_lock->__ip_vs_mutex. It is implied from the usage
of __ip_vs_mutex in ip_vs_dst_event() which is called
under rtnl lock during NETDEV_* notifications.

Fix the problem by calling rtnl_lock early only for the
start_sync_thread call. As a bonus this fixes the usage of
__dev_get_by_name, which was not called under rtnl lock.

This patch actually extends and depends on commit 54ff9ef36bdf
("ipv4, ipv6: kill ip_mc_{join, leave}_group and
ipv6_sock_mc_{join, drop}").

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2015-08-22 00:09:09 +0800

14 Jul, 2015

1 commit

56184858d ipvs: fix crash with sync protocol v0 and FTP

Fix crash in 3.5+ if FTP is used after switching
sync_version to 0.

Fixes: 749c42b620a9 ("ipvs: reduce sync rate with time thresholds")
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2015-07-14 15:41:27 +0800

11 May, 2015

2 commits

26abe1437 net: Modify sk_alloc to not reference count the netns of kernel sockets.

Now that sk_alloc knows when a kernel socket is being allocated modify
it to not reference count the network namespace of kernel sockets.

Keep track of whether a socket needs reference counting by adding a flag to
struct sock called sk_net_refcnt.

Update all of the callers of sock_create_kern to stop using
sk_change_net and sk_release_kernel as those hacks are no longer
needed, to avoid reference counting a kernel socket.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-05-11 22:50:18 +0800
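
The idea behind 26abe1437 can be sketched in userspace C as follows: a per-socket flag records whether the socket holds a reference on its namespace, and kernel sockets skip the count. The types and the `sock_init_sketch` / `sock_release_sketch` helpers are assumptions for illustration only; the real sk_alloc path is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy namespace with a plain reference counter. */
struct net { int count; };

struct sock {
    struct net *net;
    bool net_refcnt;   /* modelled on the sk_net_refcnt flag */
};

/* Hypothetical allocation step: only non-kernel sockets pin the namespace. */
static void sock_init_sketch(struct sock *sk, struct net *net, bool kern)
{
    assert(sk != NULL && net != NULL);
    sk->net = net;
    sk->net_refcnt = !kern;
    if (sk->net_refcnt)
        net->count++;
}

/* Hypothetical release step: drop the reference only if one was taken. */
static void sock_release_sketch(struct sock *sk)
{
    assert(sk != NULL && sk->net != NULL);
    if (sk->net_refcnt)
        sk->net->count--;
    sk->net = NULL;
}
```

Because the flag travels with the socket, the release path needs no special-case helpers for kernel sockets.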
eeb1bd5c4 net: Add a struct net parameter to sock_create_kern

This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-05-11 22:50:17 +0800

19 Mar, 2015

1 commit

54ff9ef36 ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}

in favor of their inner __ ones, which don't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- moves rtnl handling to callers instead while already fixing some
reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
__ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
__ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}

Signed-off-by: Marcelo Ricardo Leitner
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Marcelo Ricardo Leitner
2015-03-19 10:05:09 +0800

10 Mar, 2015

1 commit

3cef5c5b0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/cadence/macb.c

Overlapping changes in macb driver, mostly fixes and cleanups
in 'net' overlapping with the integration of at91_ether into
macb in 'net-next'.

Signed-off-by: David S. Miller

David S. Miller
2015-03-10 11:38:02 +0800

25 Feb, 2015

1 commit

d752c3645 ipvs: allow rescheduling of new connections when port reuse is detected

Currently, when TCP/SCTP port reusing happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to suboptimal load
balancing and might prevent a new node from getting a fair load.

This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.

Signed-off-by: Marcelo Ricardo Leitner
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Marcelo Ricardo Leitner
2015-02-25 12:46:35 +0800

23 Feb, 2015

1 commit

528c943f3 ipvs: add missing ip_vs_pe_put in sync code

ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).

Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.

Fixes: fe5e7a1efb66 ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2015-02-23 05:16:36 +0800

20 Nov, 2014

1 commit

982f40513 netfilter: Deletion of unnecessary checks before two function calls

The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
Acked-by: Julian Anastasov
Acked-by: Simon Horman
Signed-off-by: Pablo Neira Ayuso

Markus Elfring
2014-11-20 20:08:43 +0800

16 Sep, 2014

2 commits

ba38528aa ipvs: Supply destination address family to ip_vs_conn_new

The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or do an illegal read of 16 bytes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.

Signed-off-by: Alex Gartrell
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Alex Gartrell
2014-09-16 08:03:34 +0800
655eef103 ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}

We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Alex Gartrell
2014-09-16 08:03:33 +0800

16 Jul, 2014

1 commit

b734427a4 ipvs: remove null test before kfree

Fix checkpatch warning:
WARNING: kfree(NULL) is safe this check is probably not required

Signed-off-by: Fabian Frederick
Signed-off-by: Simon Horman

Fabian Frederick
2014-07-16 09:07:01 +0800
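
A userspace analogue of the b734427a4 cleanup, offered only as a sketch: the C standard defines `free(NULL)` as a no-op, just as the checkpatch warning says of `kfree(NULL)`, so the guard before the call can go. The `sync_buf` type and `sync_buf_release` helper are hypothetical names for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for an optionally allocated buffer. */
struct sync_buf { char *data; };

static void sync_buf_release(struct sync_buf *sb)
{
    assert(sb != NULL);
    free(sb->data);    /* no "if (sb->data)" guard needed: free(NULL) is a no-op */
    sb->data = NULL;   /* avoid leaving a dangling pointer behind */
}
```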

27 Dec, 2013

1 commit

9dcbe1b87 ipvs: Remove unused variable ret from sync_thread_master()

net/netfilter/ipvs/ip_vs_sync.c: In function 'sync_thread_master':
net/netfilter/ipvs/ip_vs_sync.c:1640:8: warning: unused variable 'ret' [-Wunused-variable]

Commit 35a2af94c7ce7130ca292c68b1d27fcfdb648f6b ("sched/wait: Make the
__wait_event*() interface more friendly") changed how the interruption
state is returned. However, sync_thread_master() ignores this state,
now causing a compile warning.

According to Julian Anastasov, this behavior is OK:

"Yes, your patch looks ok to me. In the past we used ssleep() but IPVS
users were confused why IPVS threads increase the load average. So, we
switched to _interruptible calls and later the socket polling was
added."

Document this, as requested by Peter Zijlstra, to avoid precious developers
disappearing in this pitfall in the future.

Signed-off-by: Geert Uytterhoeven
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Geert Uytterhoeven
2013-12-27 11:19:32 +0800

04 Oct, 2013

1 commit

35a2af94c sched/wait: Make the __wait_event*() interface more friendly

Change all __wait_event*() implementations to match the corresponding
wait_event*() signature for convenience.

In particular this does away with the weird 'ret' logic. Since there
are __wait_event*() users this requires we update them too.

Reviewed-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20131002092529.042563462@infradead.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2013-10-04 16:16:25 +0800

26 Jun, 2013

2 commits

4d0c875dc ipvs: add sync_persist_mode flag

Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.

Signed-off-by: Julian Anastasov
Tested-by: Aleksey Chudov
Signed-off-by: Simon Horman

Julian Anastasov
2013-06-26 17:01:46 +0800
61e7c420b ipvs: replace the SCTP state machine

Convert the SCTP state table, so that it is more readable.
Change the states to be according to the diagram in RFC 2960
and add more states suitable for middle box. Still, such
change in states adds incompatibility if systems in sync
setup include this change and others do not include it.

With this change we also have proper transitions in INPUT-ONLY
mode (DR/TUN) where we see packets only from client. Now
we should not switch to 10-second CLOSED state at a time
when we should stay in ESTABLISHED state.

The short names for states are because we have 16-char space
in ipvsadm and 11-char limit for the connection list format.
It is a consequence of the TCP implementation where the longest
state name is ESTABLISHED.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2013-06-26 17:01:46 +0800

23 Apr, 2013

1 commit

38561437d ipvs: Use network byte order for sync message size

struct ip_vs_sync_mesg and ip_vs_sync_mesg_v0 are both sent across the wire
and used internally to store IPVS synchronisation messages.

Up until now the scheme used has been to convert the size field
to network byte order before sending a message on the wire and
convert it to host byte order when receiving a message.

This patch changes that scheme to always treat the field
as being network byte order. This seems appropriate as
the structure is sent across the wire. And by consistently
treating the field as network byte order it is now possible
to take advantage of sparse to flag any future misuse.

Acked-by: Julian Anastasov
Acked-by: Hans Schillstrom
Signed-off-by: Simon Horman

Simon Horman
2013-04-23 10:43:06 +0800

02 Apr, 2013

3 commits

ac69269a4 ipvs: do not disable bh for long time

We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2013-04-02 06:23:58 +0800
413c2d04e ipvs: convert dests to rcu

In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2013-04-02 06:23:57 +0800
fca9c20ae ipvs: add ip_vs_dest_hold and ip_vs_dest_put

ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.

Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2013-04-02 06:23:48 +0800

28 Jan, 2013

1 commit

b425df4cd ipvs: freeing uninitialized pointer on error

If state != IP_VS_STATE_BACKUP then tinfo->buf is uninitialized. If
kthread_run() fails then it means we free random memory resulting in an
oops.

Signed-off-by: Dan Carpenter
Acked-by: Julian Anastasov
Signed-off-by: Simon Horman

Dan Carpenter
2013-01-28 09:14:37 +0800

09 May, 2012

2 commits

f73181c82 ipvs: add support for sync threads

Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use a single UDP port:
thread 0 will use the default port 8848 while the last thread
will use port 8848+sync_ports-1.

The sync traffic for connections is scheduled to many
master threads based on the cp address, but one connection is
always assigned to the same thread to avoid reordering of the
sync messages.

Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

Make sure the backup socks do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov
Tested-by: Aleksey Chudov
Signed-off-by: Simon Horman

Pablo Neira Ayuso
2012-05-09 01:40:33 +0800
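
The port layout described in f73181c82 (thread 0 on 8848, the last thread on 8848+sync_ports-1) reduces to a one-line mapping. The sketch below uses a hypothetical helper name, `sync_thread_port`, and covers only the port arithmetic, not how connections are assigned to threads.

```c
#include <assert.h>

#define SYNC_BASE_PORT 8848   /* default port named in the commit message */

/* Hypothetical helper: UDP port used by sync thread `id` out of `sync_ports`. */
static int sync_thread_port(int id, int sync_ports)
{
    assert(sync_ports >= 1 && id >= 0 && id < sync_ports);
    return SYNC_BASE_PORT + id;
}
```

With sync_ports set to 4, for example, the threads would occupy four consecutive ports starting at the default one.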
749c42b62 ipvs: reduce sync rate with time thresholds

Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
timer that triggers new sync message. It can be used to
avoid sync messages for the specified period (or half of
the connection timeout if it is lower) if connection state
is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
sync_refresh_period/8. Useful to protect against loss of
sync messages.

Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only a single sync message is sent
if sync_refresh_period is also 0.

Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov
Tested-by: Aleksey Chudov
Signed-off-by: Simon Horman

Julian Anastasov
2012-05-09 01:40:10 +0800
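
The "sync_endtime" field from 749c42b62 keeps the reported expiry time and uses its 2 lowest bits for the retry count. One way to picture that packing is sketched below; the helper names and the mask macro are hypothetical, and the sketch assumes the time value simply gives up its two low bits.

```c
#include <assert.h>

#define SYNC_RETRY_MASK 3UL   /* the two lowest bits hold the retry count */

/* Combine an expiry time and a retry count (0..3) into one word. */
static unsigned long endtime_pack(unsigned long expires, unsigned int retries)
{
    assert(retries <= 3);
    return (expires & ~SYNC_RETRY_MASK) | retries;
}

/* Extract the retry count from the packed word. */
static unsigned int endtime_retries(unsigned long v)
{
    return (unsigned int)(v & SYNC_RETRY_MASK);
}

/* Extract the expiry time, rounded down to a multiple of 4. */
static unsigned long endtime_time(unsigned long v)
{
    return v & ~SYNC_RETRY_MASK;
}
```

The trade-off is a loss of two bits of time resolution in exchange for not adding a separate counter to the connection structure, which fits the 0..3 range given for sync_retries.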