03 May, 2017
3 commits
-
Pull networking updates from David Miller:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfilter network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling; it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
-
We try to make this function more readable by improving variable names
and comments, using more stack variables, and making some smaller changes
to the logic. We also rename the function to make it consistent with
naming conventions used elsewhere in the code.
Reviewed-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
-
We try to make this function more readable by improving variable names
and comments, plus some minor changes to the logic.
Reviewed-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
29 Apr, 2017
3 commits
-
When a socket is shutting down, we notify the peer node about the
connection termination by reusing an incoming message if possible.
If the last received message was a connection acknowledgment
message, we reverse this message, set the error code to
TIPC_ERR_NO_PORT and send it to the peer.
In tipc_sk_proto_rcv(), we never check for message errors while
processing the connection acknowledgment or probe messages. Thus
such an error message goes through the usual flow control accounting
and the session is left hanging.
In this commit, we terminate the connection when we receive such
error messages.
Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
-
Until now, the checks for sockets in CONNECTING state were based on
the assumption that the incoming message was always from the
peer's accepted data socket.
However, when an application using a non-blocking socket sends an
implicit connect, this socket, which is in CONNECTING state, can
receive error messages from the peer's listening socket. As we discard
these messages, the application socket hangs due to inactivity.
In addition to this, there are other places where we process errors
but do not notify the user.
In this commit, we process such incoming error messages and notify
our users about them using sk_state_change().
Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
-
In filter_connect, we use waitqueue_active() to check for any
connections to wake up. But waitqueue_active() lacks memory
barriers while accessing the critical sections, leading to
inconsistent results.
In this commit, we replace this with an SMP-safe wq_has_sleeper()
using the generic socket callback sk_data_ready().
Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
27 Apr, 2017
1 commit
-
Signed-off-by: David S. Miller
25 Apr, 2017
3 commits
-
Until now in tipc_recv_stream(), we have updated the received
unacknowledged bytes based on a stack variable and not based on the
actual message size.
If the user buffer passed to tipc_recv_stream() is smaller than the
received skb, the size variable on the stack differs from the actual
message size in the skb. This leads to a flow control accounting
error causing permanent congestion.
In this commit, we fix this accounting error by always using the
size of the incoming message.
Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
-
Until now in tipc_send_stream(), we return -1 when the socket
encounters link congestion, even if the socket had successfully
sent partial data. This is incorrect, as the application resends
the same partial data, leading to data corruption at the
receiver's end.
In this commit, we return the number of partially sent bytes as the
return value on link congestion.
Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
-
Function nlmsg_new() will return a NULL pointer if there is not enough
memory, and its return value should be checked before it is used.
However, in function tipc_nl_node_get_monitor(), the validation of the
return value of function nlmsg_new() is missing. This patch fixes the
bug.
Signed-off-by: Pan Bian
Signed-off-by: David S. Miller
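The tipc_send_stream() fix in this group follows the usual stream-socket convention: when congestion hits after some bytes have already gone out, report that count so the caller resends only the remainder, and return an error only if nothing was sent. A minimal userspace sketch of that rule, assuming a hypothetical link model (`struct fake_link`, `fake_link_send()` and `send_stream_sketch()` are illustrative names, not TIPC code):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical link model: accepts at most `room` more bytes before it
 * reports congestion. This is not the TIPC link API. */
struct fake_link {
	size_t room;
};

/* Try to push one chunk; returns 0 on success, -EAGAIN when congested. */
static int fake_link_send(struct fake_link *l, size_t chunk)
{
	if (chunk > l->room)
		return -EAGAIN;
	l->room -= chunk;
	return 0;
}

/* Send `len` bytes in chunks of at most `mtu`. On congestion, return the
 * number of bytes already accepted if there are any, so the caller does
 * not resend them; return the error code only if nothing was sent. */
static long send_stream_sketch(struct fake_link *l, size_t len, size_t mtu)
{
	size_t sent = 0;

	while (sent < len) {
		size_t chunk = len - sent < mtu ? len - sent : mtu;
		int rc = fake_link_send(l, chunk);

		if (rc)
			return sent ? (long)sent : rc;
		sent += chunk;
	}
	return (long)sent;
}
```

For example, with 250 bytes of link room and a 100-byte chunk size, a 400-byte send reports 200 rather than an error, so the caller resends only the last 200 bytes.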
14 Apr, 2017
2 commits
-
This is an add-on to the previous patch that passes the extended ACK
structure where it's already available via existing genl_info or extack
function arguments.
This was done with this spatch (with some manual adjustment of
indentation):

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, info->extack)
...
}

@@
expression A, B, C, D, E;
identifier fn, info;
@@
fn(..., struct genl_info *info, ...) {
extack)
...>
}

@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D, E;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {
...
-nlmsg_parse(A, B, C, D, E, NULL)
+nlmsg_parse(A, B, C, D, E, extack)
...
}

@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C, D;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
@@
expression A, B, C;
identifier fn, extack;
@@
fn(..., struct netlink_ext_ack *extack, ...) {}
Signed-off-by: Johannes Berg
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller
-
Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core).
Signed-off-by: Johannes Berg
Signed-off-by: David S. Miller
30 Mar, 2017
2 commits
-
For socketpairs using connectionless transport, we cache
the respective node-local TIPC portid in the socket's private data,
for use in subsequent calls to send().
Signed-off-by: Erik Hugne
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
-
Sockets A and B are connected back-to-back, similar to what
AF_UNIX does.
Signed-off-by: Erik Hugne
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
29 Mar, 2017
2 commits
-
When a new subscription object is inserted into the name_seq->subscriptions
list, this happens under name_seq->lock protection; when a subscription is
deleted from the list, it is also under the same lock protection;
similarly, when accessing a subscription by going through the subscriptions
list, the entire process is also protected by name_seq->lock.
Therefore, if the subscription refcount is increased before the subscription
is inserted into the subscriptions list, and decreased after it is
deleted from the list, there is no need to hold a reference
before accessing a subscription object obtained by going through the
subscriptions list under name_seq->lock protection.
Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller
-
After a subscription object is created, it is inserted into its
subscriber's subscrp_list under subscriber lock protection;
similarly, before it is destroyed, it must first be removed from
its subscriber->subscrp_list. Since the subscription list is
accessed with the subscriber lock held, all the subscriptions are valid
for the duration of the lock. Hence in tipc_subscrb_subscrp_delete(), we
remove the subscription get/put and the extra subscriber unlock/lock.
After this change, the subscriptions refcount cleanup is very simple
and does not take any lock.
Acked-by: Jon Maloy
Signed-off-by: Ying Xue
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
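The two commits above rest on one invariant: the list itself owns a reference, taken before an object is inserted and dropped only after it is removed, and every list walk runs under the same lock, so a walker needs no get/put of its own. A single-threaded sketch of that invariant follows; `struct sub`, `struct seq` and the helpers are hypothetical, a plain int stands in for a kref, and locking is not modelled (each helper assumes the caller already holds the lock).

```c
#include <stdlib.h>

/* Hypothetical subscription object; `refs` plays the role of a kref. */
struct sub {
	int refs;
	struct sub *next;
};

/* Hypothetical stand-in for name_seq: just a list head. In TIPC this list
 * is guarded by name_seq->lock; every helper here assumes it is held. */
struct seq {
	struct sub *head;
};

static int released;	/* counts freed objects, for the demo only */

static void sub_put(struct sub *s)
{
	if (--s->refs == 0) {
		released++;
		free(s);
	}
}

/* The list takes its own reference before the object becomes visible. */
static void seq_insert(struct seq *q, struct sub *s)
{
	s->refs++;
	s->next = q->head;
	q->head = s;
}

/* The list's reference is dropped only after the object is unlinked. */
static void seq_remove(struct seq *q, struct sub *s)
{
	struct sub **pp;

	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == s) {
			*pp = s->next;
			sub_put(s);
			return;
		}
	}
}

/* A walker under the lock needs no get/put: the list's reference keeps
 * every entry alive for the duration of the walk. */
static int seq_count(const struct seq *q)
{
	int n = 0;
	const struct sub *s;

	for (s = q->head; s; s = s->next)
		n++;
	return n;
}
```

Dropping the creator's reference while the object is still listed does not free it; only removal from the list releases the last reference.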
23 Mar, 2017
1 commit
-
Until now, tipc_nametbl_unsubscribe() has been called at subscription
reference count cleanup. Usually the subscription cleanup is
called at subscription timeout, at subscription cancel or at
subscriber delete.
We have ignored the possibility of this being called from other
locations, which causes a deadlock as we try to grab
tn->nametbl_lock while already holding it.
CPU1:                             CPU2:
----------                        ----------------
tipc_nametbl_publish
spin_lock_bh(&tn->nametbl_lock)
tipc_nametbl_insert_publ
tipc_nameseq_insert_publ
tipc_subscrp_report_overlap
tipc_subscrp_get
tipc_subscrp_send_event
                                  tipc_close_conn
                                  tipc_subscrb_release_cb
                                  tipc_subscrb_delete
                                  tipc_subscrp_put
tipc_subscrp_put
tipc_subscrp_kref_release
tipc_nametbl_unsubscribe
spin_lock_bh(&tn->nametbl_lock)
<< deadlock: nametbl_lock is already held >>
CPU1:                             CPU2:
----------                        ----------------
tipc_nametbl_stop
spin_lock_bh(&tn->nametbl_lock)
tipc_purge_publications
tipc_nameseq_remove_publ
tipc_subscrp_report_overlap
tipc_subscrp_get
tipc_subscrp_send_event
                                  tipc_close_conn
                                  tipc_subscrb_release_cb
                                  tipc_subscrb_delete
                                  tipc_subscrp_put
tipc_subscrp_put
tipc_subscrp_kref_release
tipc_nametbl_unsubscribe
spin_lock_bh(&tn->nametbl_lock)
<< deadlock: nametbl_lock is already held >>
In this commit, we move the call to tipc_nametbl_unsubscribe()
from the refcount cleanup to the intended callers.
Fixes: d094c4d5f5c7 ("tipc: add subscription refcount to avoid invalid delete")
Reported-by: John Thompson
Acked-by: Jon Maloy
Signed-off-by: Ying Xue
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
10 Mar, 2017
1 commit
-
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
The theory lockdep comes up with is as follows:
(1) If the pagefault handler decides it needs to read pages from AFS, it
calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
creating a call requires the socket lock:
    mmap_sem must be taken before sk_lock-AF_RXRPC
(2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind()
binds the underlying UDP socket whilst holding its socket lock.
inet_bind() takes its own socket lock:
    sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
(3) Reading from a TCP socket into a userspace buffer might cause a fault
and thus cause the kernel to take the mmap_sem, but the TCP socket is
locked whilst doing this:
    sk_lock-AF_INET must be taken before mmap_sem
However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks. The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace. This is
a limitation in the design of lockdep.
Fix the general case by:
(1) Double up all the locking keys used in sockets so that one set is
used if the socket is created by userspace and the other set is used
if the socket is created by the kernel.
(2) Store the kern parameter passed to sk_alloc() in a variable in the
sock struct (sk_kern_sock). This informs sock_lock_init(),
sock_init_data() and sk_clone_lock() as to the lock keys to be used.
Note that the child created by sk_clone_lock() inherits the parent's
kern setting.
(3) Add a 'kern' parameter to ->accept() that is analogous to the one
passed in to ->create() that distinguishes whether kernel_accept() or
sys_accept4() was the caller and can be passed to sk_alloc().
Note that a lot of accept functions merely dequeue an already
allocated socket. I haven't touched these as the new socket already
exists before we get the parameter.
Note also that there are a couple of places where I've made the accepted
socket unconditionally kernel-based:
irda_accept()
rds_tcp_accept_one()
tcp_accept_from_sock()
because they follow a sock_create_kern() and accept off of that.
Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal. I wonder if these should do that so
that they use the new set of lock keys.
Signed-off-by: David Howells
Signed-off-by: David S. Miller
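Steps (1) and (2) above amount to picking a lock class key from one of two tables, indexed by address family and by the kern flag recorded when the socket was allocated. A userspace sketch under that reading; `struct lock_key`, `FAMILY_MAX`, `struct sock_sketch` and the helpers are hypothetical stand-ins for lockdep's class keys and struct sock, not the kernel's code:

```c
#include <stdbool.h>

#define FAMILY_MAX 4	/* hypothetical; the kernel sizes its tables by AF_MAX */

/* Hypothetical stand-in for a lockdep class key: only its address matters. */
struct lock_key {
	char dummy;
};

static struct lock_key user_keys[FAMILY_MAX];	/* sockets created by userspace */
static struct lock_key kern_keys[FAMILY_MAX];	/* sockets created by the kernel */

/* Hypothetical socket that records the kern flag at allocation time,
 * mirroring the sk_kern_sock field described above. */
struct sock_sketch {
	int family;
	bool kern_sock;
};

static void sock_alloc_sketch(struct sock_sketch *sk, int family, bool kern)
{
	sk->family = family;
	sk->kern_sock = kern;
}

/* Same family, different origin: different class, so lockdep no longer
 * merges kernel-internal and user-visible sockets into one chain. */
static struct lock_key *sock_lock_key(const struct sock_sketch *sk)
{
	return sk->kern_sock ? &kern_keys[sk->family] : &user_keys[sk->family];
}

/* A clone inherits the parent's kern setting, as noted for sk_clone_lock(). */
static void sock_clone_sketch(struct sock_sketch *child,
			      const struct sock_sketch *parent)
{
	sock_alloc_sketch(child, parent->family, parent->kern_sock);
}
```

Because user and kernel sockets of the same family map to different keys, lockdep tracks their dependencies separately, which is what separates the AF_INET lock in (2) from the one in (3).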
02 Mar, 2017
1 commit
-
…hed.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
25 Feb, 2017
1 commit
-
In the function tipc_rcv() we initialize a couple of stack variables
from the message header before that same header has been validated.
In rare cases when the arriving header is non-linear, the validation
function itself may linearize the buffer by calling pskb_may_pull(),
while the wrongly initialized stack fields are not updated accordingly.
We fix this in this commit.
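The tipc_rcv() fix is an ordering rule: read header fields into locals only after the call that may linearize the header. A userspace sketch, assuming a hypothetical buffer in which only the first `headlen` bytes of `head[]` are valid and `may_pull()` plays the role the text gives to the validation step; `HDR_LEN` and `HDR_USER` are illustrative values, not TIPC's header layout:

```c
#include <stddef.h>

/* Hypothetical buffer: only the first `headlen` bytes of head[] are valid;
 * the rest of the packet lives in `frags` until it is pulled in. */
struct buf {
	unsigned char head[8];
	size_t headlen;
	const unsigned char *frags;	/* full packet bytes, standing in for paged data */
};

/* Hypothetical stand-in for the linearizing step: make the first n bytes
 * valid in head[]. Returns 0 if that is impossible. */
static int may_pull(struct buf *b, size_t n)
{
	if (n > sizeof(b->head))
		return 0;
	for (; b->headlen < n; b->headlen++)
		b->head[b->headlen] = b->frags[b->headlen];
	return 1;
}

#define HDR_LEN 4	/* hypothetical header length */
#define HDR_USER 3	/* hypothetical offset of one header field */

/* Buggy ordering: the field is read before the header is linear, so the
 * local holds whatever stale byte happened to be in head[]. */
static int rcv_buggy(struct buf *b)
{
	int user = b->head[HDR_USER];

	if (!may_pull(b, HDR_LEN))
		return -1;
	return user;
}

/* Fixed ordering: validate (and possibly linearize) first, then read. */
static int rcv_fixed(struct buf *b)
{
	int user;

	if (!may_pull(b, HDR_LEN))
		return -1;
	user = b->head[HDR_USER];
	return user;
}
```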
Reported-by: Matthew Wong
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
18 Feb, 2017
1 commit
-
There are two problems with the function tipc_sk_reinit. Firstly,
it's doing a manual walk over an rhashtable. This is broken, as
an rhashtable can be resized, and if you manually walk over it
during a resize then you may miss entries.
Secondly, it's missing memory barriers, as previously the code used
spinlocks, which provide the barriers implicitly.
This patch fixes both problems.
Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to...")
Signed-off-by: Herbert Xu
Acked-by: Ying Xue
Signed-off-by: David S. Miller
28 Jan, 2017
1 commit
-
Two trivial conflicts, due to overlapping changes, in MPLS and mlx5.
Signed-off-by: David S. Miller
26 Jan, 2017
1 commit
-
We shuffled some code around and added some new case statements here, and
now "res" isn't initialized on all paths.
Fixes: 01fd12bb189a ("tipc: make replicast a user selectable option")
Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller
25 Jan, 2017
6 commits
-
In tipc_server_stop(), we iterate over the connections using the
server's idr_in_use as the limiting factor. We ignore the fact that this
variable is decremented in tipc_close_conn(), leading to a premature exit.
In this commit, we iterate until there are no connections left.
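The premature exit comes from using, as the loop bound, a counter that the loop body itself decrements. A userspace sketch of both versions; `struct server`, `MAX_CONNS` and `close_conn()` are hypothetical stand-ins for the server's conn_idr, idr_in_use and tipc_close_conn():

```c
#include <stdbool.h>

#define MAX_CONNS 8	/* hypothetical table size */

/* Hypothetical server: a slot table plus an in-use counter. */
struct server {
	bool used[MAX_CONNS];
	int in_use;
};

/* Closing a connection frees its slot and decrements the counter. */
static void close_conn(struct server *s, int id)
{
	s->used[id] = false;
	s->in_use--;
}

/* Buggy: the bound shrinks while we walk, so higher slots are skipped. */
static void stop_buggy(struct server *s)
{
	for (int id = 0; id < s->in_use; id++)
		if (s->used[id])
			close_conn(s, id);
}

/* Fixed: keep scanning slots until no connections are left. */
static void stop_fixed(struct server *s)
{
	for (int id = 0; s->in_use > 0 && id < MAX_CONNS; id++)
		if (s->used[id])
			close_conn(s, id);
}
```

With three connections in slots 0 to 2, the buggy loop closes slots 0 and 1, by which point the bound has dropped to 1, so slot 2 is never visited.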
Acked-by: Ying Xue
Acked-by: Jon Maloy
Tested-by: John Thompson
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
-
In tipc_conn_sendmsg(), we first queue the request to the outqueue
and only then check the connection state. If the connection is not
connected, we should not queue this message.
In this commit, we reject the messages if the connection state is
not CF_CONNECTED.
Acked-by: Ying Xue
Acked-by: Jon Maloy
Tested-by: John Thompson
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
-
Commit 333f796235a527 ("tipc: fix a race condition leading to
subscriber refcnt bug") reveals a soft lockup while acquiring
nametbl_lock.
Before commit 333f796235a527, we called tipc_conn_shutdown() from
tipc_close_conn() in the context of tipc_topsrv_stop(). In that
context, we are allowed to grab the nametbl_lock.
Commit 333f796235a527 moved tipc_conn_release (renamed from
tipc_conn_shutdown) to the connection refcount cleanup. This allows
either tipc_nametbl_withdraw() or tipc_topsrv_stop() to perform the cleanup.
Since tipc_exit_net() first calls tipc_topsrv_stop() and then
tipc_nametbl_withdraw(), the chances increase for the latter to
perform the connection cleanup.
The soft lockup occurs in the call chain of tipc_nametbl_withdraw(),
when it performs tipc_conn_kref_release(), as it tries to grab
nametbl_lock again while already holding it.
tipc_nametbl_withdraw() grabs nametbl_lock
tipc_nametbl_remove_publ()
tipc_subscrp_report_overlap()
tipc_subscrp_send_event()
tipc_conn_sendmsg()
<< if (con->flags != CF_CONNECTED) we do conn_put(),
triggering the cleanup as refcount=0. >>
tipc_conn_kref_release
tipc_sock_release
tipc_conn_release
tipc_subscrb_delete
tipc_subscrp_delete
tipc_nametbl_unsubscribe << Soft Lockup >>
The previous changes in this series fix the race conditions fixed
by commit 333f796235a527. Hence we can now revert the commit.
Fixes: 333f796235a52727 ("tipc: fix a race condition leading to subscriber refcnt bug")
Reported-and-Tested-by: John Thompson
Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
-
Until now, the generic server framework has maintained the connection
IDs per subscriber in the server's conn_idr. At tipc_close_conn, we
remove the connection ID from the server list, but the connection is
valid until we call the refcount cleanup. Hence there is a window
where the server allocates the same connection to a new subscriber,
leading to an inconsistent reference count. We have another refcount
warning: we grab the refcount in tipc_conn_lookup() for connections
without the CF_CONNECTED flag set. This usually occurs at shutdown,
when we stop the topology server and withdraw the TIPC_CFG_SRV
publication, thereby triggering a withdraw message to subscribers.
In this commit, we:
1. remove the connection from the server list at refcount cleanup.
2. grab the refcount for a connection only if CF_CONNECTED is set.
Tested-by: John Thompson
Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
-
Until now, the subscribers have kept track of the subscriptions using a
reference count at the subscriber level. At subscription cancel or
subscriber delete, we delete the subscription only if the timer
was pending for the subscription. This approach is incorrect as:
1. del_timer() is not SMP safe: the check for a pending timer on CPU0
may return true while CPU1 concurrently runs the timer callback,
thereby deleting the subscription. Thus when CPU0 is scheduled,
it deletes an invalid subscription.
2. We export tipc_subscrp_report_overlap(), which accesses the
subscription pointer multiple times. Meanwhile the subscription
timer can expire, thereby freeing the subscription, and we might
continue to access the subscription pointer, leading to memory
violations.
In this commit, we introduce a subscription refcount to avoid deleting
an invalid subscription.
Reported-and-Tested-by: John Thompson
Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
-
We trigger a soft lockup as we grab nametbl_lock twice if the node
has a pending node up/down or link up/down event while:
- we process an incoming named message in tipc_named_rcv() and
perform a tipc_update_nametbl().
- we have pending backlog items in the name distributor queue
during a nametable update using tipc_nametbl_publish() or
tipc_nametbl_withdraw().
The associated call chains are as follows:
tipc_named_rcv() Grabs nametbl_lock
tipc_update_nametbl() (publish/withdraw)
tipc_node_subscribe()/unsubscribe()
tipc_node_write_unlock()
<< lockup occurs if an outstanding node/link event
exists, as we grab nametbl_lock again >>
tipc_nametbl_withdraw() Grabs nametbl_lock
tipc_named_process_backlog()
tipc_update_nametbl()
<< rest as above >>
The function tipc_node_write_unlock(), in addition to releasing the
lock, processes the outstanding node/link up/down events. To do this,
we need to grab the nametbl_lock again, leading to the lockup.
In this commit we fix the soft lockup by introducing a fast variant of
node_unlock(), where we just release the lock. We adapt
node_subscribe()/node_unsubscribe() to use the fast variants.
Reported-and-Tested-by: John Thompson
Acked-by: Ying Xue
Acked-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
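The fast unlock variant from the last commit of this group can be illustrated with a lock model that records a self-deadlock instead of hanging. All names below are hypothetical: the boolean `held` flags only stand in for spinlocks, and in this sketch pending events simply stay queued after a fast unlock, for a later full unlock to deliver.

```c
#include <stdbool.h>

/* Hypothetical lock model: taking a lock that is already held is recorded
 * as a self-deadlock instead of spinning forever. */
struct lock_model {
	bool held;
	bool deadlocked;
};

static void lock_take(struct lock_model *l)
{
	if (l->held)
		l->deadlocked = true;	/* a real spinlock would spin forever here */
	l->held = true;
}

static void lock_release(struct lock_model *l)
{
	l->held = false;
}

struct node_model {
	struct lock_model lock;
	int pending_events;	/* queued node/link up/down events */
};

static struct lock_model nametbl_lock;

/* Full unlock: release the node lock, then deliver pending events, which
 * needs nametbl_lock. */
static void node_write_unlock(struct node_model *n)
{
	lock_release(&n->lock);
	while (n->pending_events > 0) {
		lock_take(&nametbl_lock);
		n->pending_events--;
		lock_release(&nametbl_lock);
	}
}

/* Fast variant: only release the node lock and leave events queued, so it
 * is safe to call while nametbl_lock is already held. */
static void node_write_unlock_fast(struct node_model *n)
{
	lock_release(&n->lock);
}
```

Called with nametbl_lock held, the fast variant leaves the event queued and takes no second lock, while the full variant hits the double acquisition described in the commit.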
21 Jan, 2017
4 commits
-
If the bearer carrying multicast messages supports broadcast, those
messages will be sent to all cluster nodes, irrespective of whether
these nodes host any actual destination sockets or not. This is clearly
wasteful if the cluster is large and there are only a few real
destinations for the message being sent.
In this commit we extend the eligibility of the newly introduced
"replicast" transmit option. We now make it possible for a user to
select which method should be used, either as a mandatory setting
via setsockopt(), or as a relative setting where we let the broadcast
layer decide which method to use based on the ratio between cluster
size and the message's actual number of destination nodes.
In the latter case, a sending socket must stick to a previously
selected method until it enters an idle period of at least 5 seconds.
This eliminates the risk of message reordering caused by a method change,
i.e., when changes to cluster size or number of destinations would
otherwise mandate a new method to be used.
Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
-
TIPC multicast messages are currently carried over a reliable
'broadcast link', making use of the underlying media's ability to
transport packets as L2 broadcast or IP multicast to all nodes in
the cluster.
When the bearer in use lacks that ability, we can instead emulate
the broadcast service by replicating and sending the packets over as
many unicast links as needed to reach all identified destinations.
We now introduce a new TIPC link-level 'replicast' service that does
this.
Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
-
As a further preparation for the upcoming 'replicast' functionality,
we add some necessary structs and functions for looking up and returning
a list of all nodes that host destinations for a given multicast message.
Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
-
As a preparation for the 'replicast' functionality we are going to
introduce in the next commits, we need the broadcast base structure to
store whether bearer broadcast is available at all from the currently
used bearer or bearers.
We do this by adding a new function tipc_bearer_bcast_support() to
the bearer layer, and letting the bearer selection function in
bcast.c use this to give a new boolean field, 'bcast_support', the
appropriate value.
Reviewed-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
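The method selection described in the first commit of this group (a mandatory or relative choice, with a relative choice held until the socket has been idle for at least 5 seconds) can be sketched as follows. The struct, its field names, the millisecond clock and the percentage threshold in `prefer_rcast()` are assumptions for illustration; the commit text does not state the actual ratio.

```c
#include <stdbool.h>

#define METHOD_EXPIRE_MS 5000	/* the 5 second idle period from the text */

/* Hypothetical per-socket state; names are illustrative, not TIPC's. */
struct mcast_method {
	bool rcast;		/* true: replicast, false: broadcast */
	bool mandatory;		/* user forced the method via setsockopt() */
	long expires_ms;	/* a relative choice may change only after this */
};

/* Hypothetical policy: prefer replicast when the destination nodes are at
 * most `ratio_pct` percent of the cluster. */
static bool prefer_rcast(int dests, int cluster_size, int ratio_pct)
{
	return dests * 100 <= cluster_size * ratio_pct;
}

/* Choose the method for one send at time now_ms. A relative choice is
 * re-evaluated only after METHOD_EXPIRE_MS without sends, so a change in
 * cluster size or destination count cannot reorder in-flight messages. */
static bool select_rcast(struct mcast_method *m, long now_ms, int dests,
			 int cluster_size, int ratio_pct)
{
	if (!m->mandatory && now_ms >= m->expires_ms)
		m->rcast = prefer_rcast(dests, cluster_size, ratio_pct);
	m->expires_ms = now_ms + METHOD_EXPIRE_MS;
	return m->rcast;
}
```

Every send pushes the expiry forward, so the method can change only after a quiet period, which matches the stickiness rule in the text.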
18 Jan, 2017
1 commit
17 Jan, 2017
1 commit
-
Until now, we have always allocated memory with the GFP_ATOMIC flag.
When the system is under memory pressure and a user tries to send,
the send fails due to low memory. However, the user application
can wait for free memory if we allocate it using the GFP_KERNEL flag.
In this commit, we allocate memory with GFP_KERNEL for all user
allocations.
Reported-by: Rune Torgersen
Acked-by: Jon Maloy
Signed-off-by: Parthasarathy Bhuvaragan
Signed-off-by: David S. Miller
04 Jan, 2017
3 commits
-
The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and letting the user retry later.
This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. Between the time the link issues a wakeup signal and the
time the socket wakes up and re-attempts sending, other senders may
have come in and occupied the free buffer space in the link. This in
turn may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.
In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a message is sent via a
congested link, we now let it be added to the link's backlog queue
anyway, thus permitting an oversubscription of one message per source
socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.
The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.
Signed-off-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
-
During multicast reception we currently use a simple linked list with
push/pop semantics to store port numbers.
We now see a need for a more generic list for storing values of type
u32. We therefore make some modifications to this list, while replacing
the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
functions which will come into use in the next commits.
Acked-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
-
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, which will all need to test on
different conditions.
Instead of making yet another duplicate of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.
Acked-by: Parthasarathy Bhuvaragan
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
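The point of tipc_wait_for_cond() is that the wakeup condition is an expression re-evaluated on every iteration, so one macro serves callers that wait for different things. A userspace sketch of that shape; `struct wsock`, `fake_wait()` and `wait_for_cond()` are hypothetical, the timeout is counted in abstract ticks, and the kernel macro, which sleeps on the socket's wait queue, differs in detail:

```c
#include <errno.h>

/* Hypothetical socket state for the sketch. */
struct wsock {
	int err;		/* pending socket error, 0 if none */
	int ticks_to_ready;	/* congestion clears after this many waits */
	int congested;
};

/* Hypothetical stand-in for sleeping on the socket's wait queue: each call
 * consumes one tick of timeout and lets "time pass". */
static void fake_wait(struct wsock *s, long *timeo)
{
	(*timeo)--;
	if (s->ticks_to_ready > 0 && --s->ticks_to_ready == 0)
		s->congested = 0;
}

/* The condition is passed as an expression and re-evaluated on each loop.
 * Sets rc_ to 0 once it holds, to -err on a socket error, or to -EAGAIN
 * when the timeout runs out. */
#define wait_for_cond(sock_, timeo_, condition_, rc_)		\
	do {							\
		(rc_) = 0;					\
		while (!(condition_)) {				\
			if ((sock_)->err) {			\
				(rc_) = -(sock_)->err;		\
				break;				\
			}					\
			if (*(timeo_) <= 0) {			\
				(rc_) = -EAGAIN;		\
				break;				\
			}					\
			fake_wait((sock_), (timeo_));		\
		}						\
	} while (0)

/* One caller; others can pass a different condition to the same macro. */
static int wait_uncongested(struct wsock *s, long *timeo)
{
	int rc;

	wait_for_cond(s, timeo, !s->congested, rc);
	return rc;
}
```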
24 Dec, 2016
1 commit
-
In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the
check for socket type was put in the wrong place, causing a closing socket
to always send out a FIN message even when the socket was never
connected. This is normally harmless, since the destination node for
such messages is most often zero, and the message will be dropped, but
it is still wrong and confusing behavior.
We fix this in this commit.
Reviewed-by: Parthasarathy Bhuvaragan
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller
17 Dec, 2016
1 commit
-
Pull vfs updates from Al Viro:
- more ->d_init() stuff (work.dcache)
- pathname resolution cleanups (work.namei)
- a few missing iov_iter primitives - copy_from_iter_full() and
friends. Either copy the full requested amount, advance the iterator
and return true, or fail, return false and do _not_ advance the
iterator. Quite a few open-coded callers converted (and became more
readable and harder to fuck up that way) (work.iov_iter)
- several assorted patches, the big one being logfs removal
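The all-or-nothing contract described for copy_from_iter_full() can be illustrated with a flat-buffer iterator. `struct flat_iter` and `copy_full_sketch()` are hypothetical and much simpler than struct iov_iter; this sketch also copies nothing on failure, whereas the text only promises that the iterator is not advanced.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flat iterator over one buffer; struct iov_iter also covers
 * iovecs, kvecs and bvecs. */
struct flat_iter {
	const char *base;
	size_t count;	/* bytes remaining */
};

/* All-or-nothing: either copy all `bytes`, advance and return true, or
 * leave the iterator untouched and return false. */
static bool copy_full_sketch(void *dst, size_t bytes, struct flat_iter *it)
{
	if (bytes > it->count)
		return false;
	memcpy(dst, it->base, bytes);
	it->base += bytes;
	it->count -= bytes;
	return true;
}
```

A caller can therefore treat a false return as "nothing consumed" and retry or bail out without having to rewind the iterator itself.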
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
logfs: remove from tree
vfs: fix put_compat_statfs64() does not handle errors
namei: fold should_follow_link() with the step into not-followed link
namei: pass both WALK_GET and WALK_MORE to should_follow_link()
namei: invert WALK_PUT logics
namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
namei: saner calling conventions for mountpoint_last()
namei.c: get rid of user_path_parent()
switch getfrag callbacks to ..._full() primitives
make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
[iov_iter] new primitives - copy_from_iter_full() and friends
don't open-code file_inode()
ceph: switch to use of ->d_init()
ceph: unify dentry_operations instances
lustre: switch to use of ->d_init()