Eric Lee / smarc-fsl-linux-kernel

18 Oct, 2018

1 commit

49984ca4e tipc: fix flow control accounting for implicit connect

[ Upstream commit 92ef12b32feab8f277b69e9fb89ede2796777f4d ]

In the case of an implicit connect message with data > 1K, the flow
control accounting is incorrect. In this state, the socket does not
know the peer node's capability and falls back to legacy flow control
by returning 1, whereas the receiver of this message performs the
new block accounting. This leads to a slack and eventually to traffic
disturbance.

In this commit, we perform tipc_node_get_capabilities() at implicit
connect and perform accounting based on the peer's capability.

Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Parthasarathy Bhuvaragan
2018-10-18 15:16:19 +0800

15 Sep, 2018

1 commit

8fed734df tipc: fix a missing rhashtable_walk_exit()

[ Upstream commit bd583fe30427500a2d0abe25724025b1cb5e2636 ]

rhashtable_walk_exit() must be paired with rhashtable_walk_enter().

Fixes: 40f9f4397060 ("tipc: Fix tipc_sk_reinit race conditions")
Cc: Herbert Xu
Cc: Ying Xue
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Cong Wang
2018-09-15 15:45:25 +0800

03 Jan, 2018

1 commit

92ae82334 tipc: fix hanging poll() for stream sockets

[ Upstream commit 517d7c79bdb39864e617960504bdc1aa560c75c6 ]

In commit 42b531de17d2f6 ("tipc: Fix missing connection request
handling"), we replaced unconditional wakeup() with conditional
wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND.

This breaks the applications which do a connect followed by poll
with POLLOUT flag. These applications are not woken when the
connection is ESTABLISHED and hence sleep forever.

In this commit, we fix it by including the POLLOUT event for
sockets in TIPC_CONNECTING state.
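
Seen from an application, the affected pattern is a non-blocking connect
followed by a poll() for POLLOUT. The sketch below is an illustration only:
the helper name wait_connected() is hypothetical, and it uses nothing beyond
the generic POSIX poll() and getsockopt(SO_ERROR) calls, so it is not
specific to TIPC.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: wait until a socket with a connect() in
 * progress becomes writable, then read back the connect result.
 * Returns 0 on success, -1 on timeout or error (errno is set).
 */
int wait_connected(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int err = 0;
	socklen_t len = sizeof(err);
	int n;

	assert(fd >= 0);

	/* Restart the wait if a signal interrupts it. */
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);

	if (n == 0) {
		/* Never woken: what the bug looked like to the caller. */
		errno = ETIMEDOUT;
		return -1;
	}
	if (n < 0)
		return -1;

	/* Writable: check whether the connect actually succeeded. */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;
	if (err) {
		errno = err;
		return -1;
	}
	return 0;
}
```

With the bug described above, a caller passing an infinite timeout (-1)
would never return from poll(), because no POLLOUT wakeup was issued when
the connection became established.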

Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling")
Acked-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Parthasarathy Bhuvaragan
2018-01-03 03:31:11 +0800

25 Aug, 2017

1 commit

6c7e983b2 tipc: Fix tipc_sk_reinit handling of -EAGAIN

In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit
had additional logic added to loop in the event that function
rhashtable_walk_next() returned -EAGAIN. No worries.

However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
and therefore skips the call to rhashtable_walk_stop(). That has
the effect of calling rcu_read_lock() without its paired call to
rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
may not be apparent for a while, especially since resize events may
be rare. But the comments to rhashtable_walk_start() state:

* ...Note that we take the RCU lock in all
* cases including when we return an error. So you must always call
* rhashtable_walk_stop to clean up.

This patch replaces the continue with a goto and label to ensure a
matching call to rhashtable_walk_stop().
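
The control flow can be sketched in user space as follows. This is not the
kernel code: walk_start(), walk_stop(), walk_next() and the counters are
hypothetical stand-ins that only mimic the contract quoted above, namely
that the start function takes the lock even when it returns an error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins, NOT the rhashtable API. */
int lock_depth;      /* models the RCU read-side nesting depth */
int start_failures;  /* how many times walk_start() reports -EAGAIN */
int items_left;      /* entries the walker still has to return */

int walk_start(void)
{
	lock_depth++;                 /* lock is taken in all cases */
	if (start_failures > 0) {
		start_failures--;
		return -EAGAIN;
	}
	return 0;
}

void walk_stop(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Returns non-zero while entries remain, 0 when the walk is done. */
int walk_next(void)
{
	return items_left > 0 ? items_left-- : 0;
}

/* Visit every entry, retrying the whole walk on -EAGAIN. */
int reinit_all(void)
{
	int visited = 0;
	int err;

	do {
		err = walk_start();
		if (err)
			goto stop;    /* a "continue" here would skip walk_stop() */

		while (walk_next())
			visited++;
stop:
		walk_stop();          /* always paired with walk_start() */
	} while (err == -EAGAIN);

	return visited;
}
```

After reinit_all() returns, lock_depth is back to zero no matter how many
times the start call failed, which is the property the patch restores.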

Signed-off-by: Bob Peterson
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

Bob Peterson
2017-08-25 05:02:26 +0800

01 Jul, 2017

1 commit

41c6d650f net: convert sock.sk_refcnt from atomic_t to refcount_t

The refcount_t type and its corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to the absence of a _hint()
version of the refcount API. If the _hint() version must
be used, we might need to revisit the API.
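
The idea of an "increment unless zero" operation that refuses to overflow
can be illustrated with C11 atomics. This is a user-space analogue under
stated assumptions, not the kernel implementation: struct ref and
ref_inc_not_zero() are hypothetical names, and saturating at UINT_MAX is
this sketch's choice for showing how wrap-around is avoided.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space reference counter. */
struct ref {
	atomic_uint count;
};

/*
 * Take a reference unless the object is already dead (count == 0).
 * The counter saturates at UINT_MAX instead of wrapping to 0, so an
 * overflow can leak the object but cannot free it while in use.
 */
bool ref_inc_not_zero(struct ref *r)
{
	unsigned int old;

	assert(r);
	old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;   /* object is being torn down */
		if (old == UINT_MAX)
			return true;    /* saturated: leave the count pinned */
		/* On failure, 'old' is reloaded and the checks run again. */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

	return true;
}
```

A plain atomic increment would turn UINT_MAX into 0 and make a live object
look free, which is the use-after-free scenario described above.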

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
2017-07-01 22:39:08 +0800

12 May, 2017

1 commit

844cf763f tipc: make macro tipc_wait_for_cond() smp safe

The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
to fulfil its task. The latter, in turn, is evaluating the stated
condition outside the socket lock context. This is problematic if
the condition is accessing non-trivial data structures which may be
altered by incoming interrupts, as is the case with the cong_links()
linked list, used by socket to keep track of the current set of
congested links. We sometimes see crashes when this list is accessed
by a condition function at the same time as a SOCK_WAKEUP interrupt
is removing an element from the list.

We fix this by expanding selected parts of sk_wait_event() into the
outer macro, while ensuring that all evaluations of a given condition
are performed under socket lock protection.

Fixes: commit 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Reviewed-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-05-12 10:19:30 +0800

03 May, 2017

3 commits

8d65b08de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:

1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).

3) Sparc64 now has an eBPF JIT too (me)

4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)

5) Make netfilter network namespace teardown less expensive (Florian
Westphal)

6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

9) Multiqueue support in stmmac driver (Joao Pinto)

10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)

11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)

12) Add socket busy poll support to epoll (Sridhar Samudrala)

13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)

14) IPSEC hw offload infrastructure (Steffen Klassert)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...

Linus Torvalds
2017-05-03 07:40:27 +0800

ec8a09fbb tipc: refactor function tipc_sk_recv_stream()

We try to make this function more readable by improving variable names
and comments, using more stack variables, and making some smaller changes
to the logic. We also rename the function to make it consistent with
naming conventions used elsewhere in the code.

Reviewed-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-05-03 03:56:54 +0800

e9f8b1010 tipc: refactor function tipc_sk_recvmsg()

We try to make this function more readable by improving variable names
and comments, plus some minor changes to the logic.

Reviewed-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-05-03 03:56:54 +0800

29 Apr, 2017

3 commits

c1be77562 tipc: close the connection if protocol messages contain errors

When a socket is shutting down, we notify the peer node about the
connection termination by reusing an incoming message if possible.
If the last received message was a connection acknowledgment
message, we reverse this message and set the error code to
TIPC_ERR_NO_PORT and send it to peer.

In tipc_sk_proto_rcv(), we never check for message errors while
processing the connection acknowledgment or probe messages. Thus
this message performs the usual flow control accounting and leaves
the session hanging.

In this commit, we terminate the connection when we receive such
error messages.

Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2017-04-29 00:20:42 +0800

4e0df4951 tipc: improve error validations for sockets in CONNECTING state

Until now, the checks for sockets in CONNECTING state were based on
the assumption that the incoming message was always from the
peer's accepted data socket.

However, when an application using a non-blocking socket sends an
implicit connect, this socket, which is in CONNECTING state, can
receive error messages from the peer's listening socket. As we
discard these messages, the application socket hangs due to
inactivity. In addition to this, there are other places where we
process errors but do not notify the user.

In this commit, we process such incoming error messages and notify
our users about them using sk_state_change().

Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2017-04-29 00:20:42 +0800

42b531de1 tipc: Fix missing connection request handling

In filter_connect, we use waitqueue_active() to check for any
connections to wakeup. But waitqueue_active() is missing memory
barriers while accessing the critical sections, leading to
inconsistent results.

In this commit, we replace this with an SMP safe wq_has_sleeper()
using the generic socket callback sk_data_ready().

Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2017-04-29 00:20:42 +0800

27 Apr, 2017

1 commit

b1513c353 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Signed-off-by: David S. Miller

David S. Miller
2017-04-27 10:39:08 +0800

25 Apr, 2017

2 commits

05ff83789 tipc: fix socket flow control accounting error at tipc_recv_stream

Until now in tipc_recv_stream(), we update the received
unacknowledged bytes based on a stack variable and not based on the
actual message size.
If the user buffer passed at tipc_recv_stream() is smaller than the
received skb, the size variable in stack differs from the actual
message size in the skb. This leads to a flow control accounting
error causing permanent congestion.

In this commit, we fix this accounting error by always using the
size of the incoming message.

Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2017-04-25 23:45:38 +0800

3364d61c9 tipc: fix socket flow control accounting error at tipc_send_stream

Until now in tipc_send_stream(), we return -1 when the socket
encounters link congestion even if the socket had successfully
sent partial data. This is incorrect, as the application then resends
the same partial data, leading to data corruption at the
receiver's end.

In this commit, we return the partially sent bytes as the return
value at link congestion.
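
The return-value convention can be sketched as below. This is a user-space
illustration, not tipc_send_stream() itself: link_xmit(), link_budget and
stream_send() are hypothetical names, and -EAGAIN merely stands in for
whatever error the congested link reports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the link: bytes it accepts before congesting. */
size_t link_budget;

/* Accept one chunk, or report congestion once the budget is used up. */
int link_xmit(size_t len)
{
	if (len > link_budget)
		return -EAGAIN;
	link_budget -= len;
	return 0;
}

/*
 * Send 'total' bytes in pieces of at most 'chunk' bytes.
 * If congestion hits after some data went out, report the number of
 * bytes already sent rather than an error, so the caller resumes
 * from the right offset instead of resending data.
 */
ssize_t stream_send(size_t total, size_t chunk)
{
	size_t sent = 0;

	assert(chunk > 0);

	while (sent < total) {
		size_t n = total - sent < chunk ? total - sent : chunk;
		int rc = link_xmit(n);

		if (rc)
			return sent ? (ssize_t)sent : rc;
		sent += n;
	}
	return (ssize_t)sent;
}
```

A caller that receives a short count advances its buffer pointer by that
amount and retries the remainder, which is the usual contract for stream
send calls.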

Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2017-04-25 23:45:37 +0800

14 Apr, 2017

1 commit

fceb6435e netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller

Johannes Berg
2017-04-14 01:58:22 +0800

30 Mar, 2017

2 commits

66bc1e8d5 tipc: allow rdm/dgram socketpairs

For socketpairs using connectionless transport, we cache
the respective node local TIPC portid to use in subsequent
calls to send() in the socket's private data.

Signed-off-by: Erik Hugne
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Erik Hugne
2017-03-30 05:10:11 +0800

70b03759e tipc: add support for stream/seqpacket socketpairs

Sockets A and B are connected back-to-back, similar to what
AF_UNIX does.

Signed-off-by: Erik Hugne
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Erik Hugne
2017-03-30 05:10:11 +0800

10 Mar, 2017

1 commit

cdfbabfb2 net: Work around lockdep limitation in sockets that use sockets

Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

(1) If the pagefault handler decides it needs to read pages from AFS, it
calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
creating a call requires the socket lock:

mmap_sem must be taken before sk_lock-AF_RXRPC

(2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
binds the underlying UDP socket whilst holding its socket lock.
inet_bind() takes its own socket lock:

sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

(3) Reading from a TCP socket into a userspace buffer might cause a fault
and thus cause the kernel to take the mmap_sem, but the TCP socket is
locked whilst doing this:

sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks. The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace. This is
a limitation in the design of lockdep.

Fix the general case by:

(1) Double up all the locking keys used in sockets so that one set are
used if the socket is created by userspace and the other set is used
if the socket is created by the kernel.

(2) Store the kern parameter passed to sk_alloc() in a variable in the
sock struct (sk_kern_sock). This informs sock_lock_init(),
sock_init_data() and sk_clone_lock() as to the lock keys to be used.

Note that the child created by sk_clone_lock() inherits the parent's
kern setting.

(3) Add a 'kern' parameter to ->accept() that is analogous to the one
passed in to ->create() that distinguishes whether kernel_accept() or
sys_accept4() was the caller and can be passed to sk_alloc().

Note that a lot of accept functions merely dequeue an already
allocated socket. I haven't touched these as the new socket already
exists before we get the parameter.

Note also that there are a couple of places where I've made the accepted
socket unconditionally kernel-based:

irda_accept()
rds_tcp_accept_one()
tcp_accept_from_sock()

because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal. I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells
Signed-off-by: David S. Miller

David Howells
2017-03-10 10:23:27 +0800

02 Mar, 2017

1 commit

174cd4b1e sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2017-03-02 15:42:32 +0800

18 Feb, 2017

1 commit

40f9f4397 tipc: Fix tipc_sk_reinit race conditions

There are two problems with the function tipc_sk_reinit. Firstly
it's doing a manual walk over an rhashtable. This is broken as
an rhashtable can be resized and if you manually walk over it
during a resize then you may miss entries.

Secondly it's missing memory barriers as previously the code used
spinlocks which provide the barriers implicitly.

This patch fixes both problems.

Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to...")
Signed-off-by: Herbert Xu
Acked-by: Ying Xue
Signed-off-by: David S. Miller

Herbert Xu
2017-02-18 01:28:35 +0800

26 Jan, 2017

1 commit

a08ef4768 tipc: uninitialized return code in tipc_setsockopt()

We shuffled some code around and added some new case statements here and
now "res" isn't initialized on all paths.

Fixes: 01fd12bb189a ("tipc: make replicast a user selectable option")
Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller

Dan Carpenter
2017-01-26 01:41:34 +0800

21 Jan, 2017

2 commits

01fd12bb1 tipc: make replicast a user selectable option

If the bearer carrying multicast messages supports broadcast, those
messages will be sent to all cluster nodes, irrespective of whether
these nodes host any actual destination sockets or not. This is clearly
wasteful if the cluster is large and there are only a few real
destinations for the message being sent.

In this commit we extend the eligibility of the newly introduced
"replicast" transmit option. We now make it possible for a user to
select which method he wants to be used, either as a mandatory setting
via setsockopt(), or as a relative setting where we let the broadcast
layer decide which method to use based on the ratio between cluster
size and the message's actual number of destination nodes.

In the latter case, a sending socket must stick to a previously
selected method until it enters an idle period of at least 5 seconds.
This eliminates the risk of message reordering caused by method change,
i.e., when changes to cluster size or number of destinations would
otherwise mandate a new method to be used.

Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-21 01:10:17 +0800

a853e4c6d tipc: introduce replicast as transport option for multicast

TIPC multicast messages are currently carried over a reliable
'broadcast link', making use of the underlying media's ability to
transport packets as L2 broadcast or IP multicast to all nodes in
the cluster.

When the used bearer is lacking that ability, we can instead emulate
the broadcast service by replicating and sending the packets over as
many unicast links as needed to reach all identified destinations.
We now introduce a new TIPC link-level 'replicast' service that does
this.

Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-21 01:10:17 +0800

04 Jan, 2017

3 commits

365ad353c tipc: reduce risk of user starvation during link congestion

The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and letting him retry later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. During the time the link issues a wakeup signal, until the
socket wakes up and re-attempts sending, other senders may have come
in between and occupied the free buffer space in the link. This in turn
may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When an attempt is made to send a message
via a congested link, we now let it be added to the link's backlog queue
anyway, thus permitting an oversubscription of one message per source
socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-04 00:13:05 +0800

4d8642d89 tipc: modify struct tipc_plist to be more versatile

During multicast reception we currently use a simple linked list with
push/pop semantics to store port numbers.

We now see a need for a more generic list for storing values of type
u32. We therefore make some modifications to this list, while replacing
the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
functions which will come to use in the next commits.

Acked-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-04 00:13:05 +0800

8c44e1af1 tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions

The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, which will all need to test on
different conditions.

Instead of making yet another duplicate of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.

Acked-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2017-01-04 00:13:05 +0800

24 Dec, 2016

1 commit

693c56491 tipc: don't send FIN message from connectionless socket

In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the
check for socket type is in the wrong place, causing a closing socket
to always send out a FIN message even when the socket was never
connected. This is normally harmless, since the destination node for
such messages most often is zero, and the message will be dropped, but
it is still a wrong and confusing behavior.

We fix this in this commit.

Reviewed-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-12-24 06:53:47 +0800

27 Nov, 2016

1 commit

0b42f25d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next', so
simple overlapping changes.

Signed-off-by: David S. Miller

David S. Miller
2016-11-27 12:42:21 +0800

26 Nov, 2016

1 commit

6998cc6ec tipc: resolve connection flow control compatibility problem

In commit 10724cc7bb78 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using message as base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement) is 512. This was tested against
the previous version, which uses an acknowledge frequency of one ack per
256 received messages, and found to work fine.

However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not triggered to send an acknowledge, with a permanently
blocked sender as a result.

We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by making a minimal change to the
condition used for determining connection congestion.
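
The off-by-one can be reproduced with a small counting model. This is a
sketch under stated assumptions rather than the TIPC code: the function and
variable names are hypothetical, the window and ack frequency are the 512
values quoted above, and the two forms of the congestion test (">=" before,
">" after) are this sketch's reading of the "minimal change".

```c
#include <assert.h>

#define SND_WINDOW      512u  /* units the sender may have unacknowledged */
#define LEGACY_ACK_FREQ 512u  /* legacy receiver acks every 512 messages */

/*
 * Count how many data messages reach a legacy receiver before the
 * sender blocks for good, giving up once 'goal' have been delivered.
 * 'allow_one_more' selects the relaxed congestion test.
 */
unsigned int simulate_legacy_peer(unsigned int goal, int allow_one_more)
{
	unsigned int snt_unacked = 1;  /* empty SYNACK: counted by sender only */
	unsigned int rcv_unacked = 0;
	unsigned int delivered = 0;

	assert(goal > 0);

	while (delivered < goal) {
		int congested = allow_one_more ? snt_unacked > SND_WINDOW
					       : snt_unacked >= SND_WINDOW;

		if (congested)
			break;  /* receiver is idle, so no ack will ever come */

		snt_unacked++;
		rcv_unacked++;
		delivered++;

		if (rcv_unacked >= LEGACY_ACK_FREQ) {
			snt_unacked -= rcv_unacked;  /* acknowledge arrives */
			rcv_unacked = 0;
		}
	}
	return delivered;
}
```

With the strict test the model stalls after 511 delivered messages, one
short of the receiver's ack threshold. With the relaxed test the 512th
message goes out, the ack comes back, and the transfer continues.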

Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-11-26 10:38:16 +0800

23 Nov, 2016

1 commit

f9aa9dc7d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

All conflicts were simple overlapping changes except perhaps
for the Thunder driver.

That driver has a change_mtu method explicitly for sending
a message to the hardware. If that fails it returns an
error.

Normally a driver doesn't need an ndo_change_mtu method because those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.

However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failure.

Signed-off-by: David S. Miller

David S. Miller
2016-11-23 02:27:16 +0800

20 Nov, 2016

1 commit

51b9a31c4 tipc: eliminate obsolete socket locking policy description

The comment block in socket.c describing the locking policy is
obsolete, and does not reflect current reality. We remove it in this
commit.

Since the current locking policy is much simpler and follows a
mainstream approach, we see no need to add a new description.

Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2016-11-20 11:15:41 +0800

15 Nov, 2016

1 commit

d9dc8b0f8 net: fix sleeping for sk_wait_event()

Similar to commit 14135f30e33c ("inet: fix sleeping inside inet_wait_for_connect()"),
sk_wait_event() needs fixing too: because release_sock() is blocking,
it changes the process state back to running after sleep, which breaks
the previous prepare_to_wait().

Switch to the new wait API.

Cc: Eric Dumazet
Cc: Peter Zijlstra
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

WANG Cong
2016-11-15 02:17:21 +0800

01 Nov, 2016

7 commits

f40acbaf4 tipc: remove SS_CONNECTED sock state

In this commit, we replace references to sock->state SS_CONNECTED
with sk_state TIPC_ESTABLISHED.

Finally, the sock->state is no longer explicitly used by tipc.
The FSM below is for various types of connection oriented sockets.

Stream Server Listening Socket:
+-----------+       +-------------+
| TIPC_OPEN |------>| TIPC_LISTEN |
+-----------+       +-------------+

Stream Server Data Socket:
+-----------+       +------------------+
| TIPC_OPEN |------>| TIPC_ESTABLISHED |
+-----------+       +------------------+
                          ^   |
                          |   |
                          |   v
                    +--------------------+
                    | TIPC_DISCONNECTING |
                    +--------------------+

Stream Socket Client:
+-----------+       +-----------------+
| TIPC_OPEN |------>| TIPC_CONNECTING |------+
+-----------+       +-----------------+      |
                            |                |
                            |                |
                            v                |
                    +------------------+     |
                    | TIPC_ESTABLISHED |     |
                    +------------------+     |
                          ^   |              |
                          |   |              |
                          |   v              |
                    +--------------------+   |
                    | TIPC_DISCONNECTING |<--+
                    +--------------------+

Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-11-01 23:53:26 +0800

99a208898 tipc: create TIPC_CONNECTING as a new sk_state

In this commit, we create a new tipc socket state TIPC_CONNECTING
by primarily replacing the SS_CONNECTING with TIPC_CONNECTING.

There is no functional change in this commit.

Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-11-01 23:53:26 +0800

6f00089c7 tipc: remove SS_DISCONNECTING state

In this commit, we replace the references to SS_DISCONNECTING with
the combination of sk_state TIPC_DISCONNECTING and flags set in
sk_shutdown.
We introduce a new function _tipc_shutdown(), which provides
the common code required by tipc_release() and tipc_shutdown().

Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-11-01 23:53:25 +0800

9fd4b070f tipc: create TIPC_DISCONNECTING as a new sk_state

In this commit, we create a new tipc socket state TIPC_DISCONNECTING in
sk_state. TIPC_DISCONNECTING is replacing the socket connection status
update using SS_DISCONNECTING.
TIPC_DISCONNECTING is set for connection oriented sockets at:
- tipc_shutdown()
- connection probe timeout
- when we receive an error message on the connection.

There is no functional change in this commit.

Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-11-01 23:53:25 +0800

438adcaf0 tipc: create TIPC_OPEN as a new sk_state

In this commit, we create a new tipc socket state TIPC_OPEN in
sk_state. We primarily replace the SS_UNCONNECTED sock->state with
TIPC_OPEN.

Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-11-01 23:53:25 +0800

8ea642ee9 tipc: create TIPC_ESTABLISHED as a new sk_state

Until now, tipc maintains probing state for connected sockets in
tsk->probing_state variable.

In this commit, we express this information as socket states and
thus remove the variable. We set probe_unacked flag when a probe
is sent out and reset it if we receive a reply. Instead of the
probing state TIPC_CONN_OK, we create a new state TIPC_ESTABLISHED.

There is no functional change in this commit.

Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-11-01 23:53:25 +0800

0c288c869 tipc: create TIPC_LISTEN as a new sk_state

Until now, tipc maintains the socket state in sock->state variable.
This is used to maintain generic socket states, but in tipc
we overload it and save tipc socket states like TIPC_LISTEN.
Other protocols like TCP, UDP store protocol specific states
in sk->sk_state instead.

In this commit, we:
- declare a new tipc state TIPC_LISTEN, that replaces SS_LISTEN
- create a new function tipc_set_state(), to update sk->sk_state
- maintain the TIPC_LISTEN state in sk->sk_state
- replace references to SS_LISTEN with TIPC_LISTEN

There is no functional change in this commit.

Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller

Parthasarathy Bhuvaragan
2016-11-01 23:53:25 +0800