Eric Lee / smarc-fsl-linux-kernel

09 Jul, 2015

1 commit

fdd75ea8d net/tipc: initialize security state for new connection socket ... Browse Code »

Calling connect() with an AF_TIPC socket would trigger a series
of error messages from SELinux along the lines of:
SELinux: Invalid class 0
type=AVC msg=audit(1434126658.487:34500): avc: denied { }
for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0
tcontext=system_u:object_r:unlabeled_t:s0 tclass=
permissive=0

This was due to a failure to initialize the security state of the new
connection sock by the tipc code, leaving it with junk in the security
class field and an unlabeled secid. Add a call to security_sk_clone()
to inherit the security state from the parent socket.

Reported-by: Tim Shearer
Signed-off-by: Stephen Smalley
Acked-by: Paul Moore
Acked-by: Ying Xue
Signed-off-by: David S. Miller

Stephen Smalley
2015-07-09 07:08:23 +0800

29 Jun, 2015

1 commit

7d967b673 tipc: purge backlog queue counters when broadcast link is reset ... Browse Code »

In commit 1f66d161ab3d8b518903fa6c3f9c1f48d6919e74
("tipc: introduce starvation free send algorithm")
we introduced a counter per priority level for buffers
in the link backlog queue. We also introduced a new
function tipc_link_purge_backlog(), to reset these
counters to zero when the link is reset.

Unfortunately, we missed to call this function when
the broadcast link is reset, with the result that the
values of these counters might be permanently skewed
when new nodes are attached. This may in the worst case
lead to permananent, but spurious, broadcast link
congestion, where no broadcast packets can be sent at
all.

We fix this bug with this commit.

Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-06-29 07:43:02 +0800

14 Jun, 2015

1 commit

25c43bf13 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2015-06-14 14:56:52 +0800

11 Jun, 2015

1 commit

b3be5e3e7 tipc: disconnect socket directly after probe failure ... Browse Code »

If the TIPC connection timer expires in a probing state, a
self abort message is supposed to be generated and delivered
to the local socket. This is currently broken, and the abort
message is actually sent out to the peer node with invalid
addressing information. This will cause the link to enter
a constant retransmission state and eventually reset.
We fix this by removing the self-abort message creation and
tear down connection immediately instead.

Signed-off-by: Erik Hugne
Reviewed-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2015-06-11 13:05:20 +0800

31 May, 2015

1 commit

1ea23a211 tipc: unconditionally put sock refcnt when sock timer to be deleted is pending ... Browse Code »

As sock refcnt is taken when sock timer is started in
sk_reset_timer(), the sock refcnt should be put when sock timer
to be deleted is in pending state no matter what "probing_state"
value of tipc sock is.

Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-05-31 09:08:37 +0800

27 May, 2015

1 commit

f3903bcc0 tipc: fix bug in link protocol message create function ... Browse Code »

In commit dd3f9e70f59f43a5712eba9cf3ee4f1e6999540c
("tipc: add packet sequence number at instant of transmission") we
made a change with the consequence that packets in the link backlog
queue don't contain valid sequence numbers.

However, when we create a link protocol message, we still use the
sequence number of the first packet in the backlog, if there is any,
as "next_sent" indicator in the message. This may entail unnecessary
retransissions or stale packet transmission when there is very low
traffic on the link.

This commit fixes this issue by only using the current value of
tipc_link::snd_nxt as indicator.

Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-27 07:43:03 +0800

15 May, 2015

9 commits

fa787ae06 tipc: use sock_create_kern interface to create kernel socket ... Browse Code »

After commit eeb1bd5c40ed ("net: Add a struct net parameter to
sock_create_kern"), we should use sock_create_kern() to create kernel
socket as the interface doesn't reference count struct net any more.

Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-05-15 01:39:33 +0800
dd3f9e70f tipc: add packet sequence number at instant of transmission ... Browse Code »

Currently, the packet sequence number is updated and added to each
packet at the moment a packet is added to the link backlog queue.
This is wasteful, since it forces the code to traverse the send
packet list packet by packet when adding them to the backlog queue.
It would be better to just splice the whole packet list into the
backlog queue when that is the right action to do.

In this commit, we do this change. Also, since the sequence numbers
cannot now be assigned to the packets at the moment they are added
the backlog queue, we do instead calculate and add them at the moment
of transmission, when the backlog queue has to be traversed anyway.
We do this in the function tipc_link_push_packet().

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:46 +0800
f21e897ec tipc: improve link congestion algorithm ... Browse Code »

The link congestion algorithm used until now implies two problems.

- It is too generous towards lower-level messages in situations of high
load by giving "absolute" bandwidth guarantees to the different
priority levels. LOW traffic is guaranteed 10%, MEDIUM is guaranted
20%, HIGH is guaranteed 30%, and CRITICAL is guaranteed 40% of the
available bandwidth. But, in the absence of higher level traffic, the
ratio between two distinct levels becomes unreasonable. E.g. if there
is only LOW and MEDIUM traffic on a system, the former is guaranteed
1/3 of the bandwidth, and the latter 2/3. This again means that if
there is e.g. one LOW user and 10 MEDIUM users, the former will have
33.3% of the bandwidth, and the others will have to compete for the
remainder, i.e. each will end up with 6.7% of the capacity.

- Packets of type MSG_BUNDLER are created at SYSTEM importance level,
but only after the packets bundled into it have passed the congestion
test for their own respective levels. Since bundled packets don't
result in incrementing the level counter for their own importance,
only occasionally for the SYSTEM level counter, they do in practice
obtain SYSTEM level importance. Hence, the current implementation
provides a gap in the congestion algorithm that in the worst case
may lead to a link reset.

We now refine the congestion algorithm as follows:

- A message is accepted to the link backlog only if its own level
counter, and all superior level counters, permit it.

- The importance of a created bundle packet is set according to its
contents. A bundle packet created from messges at levels LOW to
CRITICAL is given importance level CRITICAL, while a bundle created
from a SYSTEM level message is given importance SYSTEM. In the latter
case only subsequent SYSTEM level messages are allowed to be bundled
into it.

This solves the first problem described above, by making the bandwidth
guarantee relative to the total number of users at all levels; only
the upper limit for each level remains absolute. In the example
described above, the single LOW user would use 1/11th of the bandwidth,
the same as each of the ten MEDIUM users, but he still has the same
guarantee against starvation as the latter ones.

The fix also solves the second problem. If the CRITICAL level is filled
up by bundle packets of that level, no lower level packets will be
accepted any more.

Suggested-by: Gergely Kiss
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:46 +0800
cd4eee3c2 tipc: simplify link supervision checkpointing ... Browse Code »

We change the sequence number checkpointing that is performed
by the timer in order to discover if the peer is active. Currently,
we store a checkpoint of the next expected sequence number "rcv_nxt"
at each timer expiration, and compare it to the current expected
number at next timeout expiration. Instead, we now use the already
existing field "silent_intv_cnt" for this task. We step the counter
at each timeout expiration, and zero it at each valid received packet.
If no valid packet has been received from the peer after "abort_limit"
number of silent timer intervals, the link is declared faulty and reset.

We also remove the multiple instances of timer activation from inside
the FSM function "link_state_event()", and now do it at only one place;
at the end of the timer function itself.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:46 +0800
a97b9d3fa tipc: rename fields in struct tipc_link ... Browse Code »

We rename some fields in struct tipc_link, in order to give them more
descriptive names:

next_in_no -> rcv_nxt
next_out_no-> snd_nxt
fsm_msg_cnt-> silent_intv_cnt
cont_intv -> keepalive_intv
last_retransmitted -> last_retransm

There are no functional changes in this commit.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:46 +0800
e4bf4f769 tipc: simplify packet sequence number handling ... Browse Code »

Although the sequence number in the TIPC protocol is 16 bits, we have
until now stored it internally as an unsigned 32 bits integer.
We got around this by always doing explicit modulo-65535 operations
whenever we need to access a sequence number.

We now make the incoming and outgoing sequence numbers to unsigned
16-bit integers, and remove the modulo operations where applicable.

We also move the arithmetic inline functions for 16 bit integers
to core.h, and the function buf_seqno() to msg.h, so they can easily
be accessed from anywhere in the code.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:46 +0800
a6bf70f79 tipc: simplify include dependencies ... Browse Code »

When we try to add new inline functions in the code, we sometimes
run into circular include dependencies.

The main problem is that the file core.h, which really should be at
the root of the dependency chain, instead is a leaf. I.e., core.h
includes a number of header files that themselves should be allowed
to include core.h. In reality this is unnecessary, because core.h does
not need to know the full signature of any of the structs it refers to,
only their type declaration.

In this commit, we remove all dependencies from core.h towards any
other tipc header file.

As a consequence of this change, we can now move the function
tipc_own_addr(net) from addr.c to addr.h, and make it inline.

There are no functional changes in this commit.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:45 +0800
75b44b018 tipc: simplify link timer handling ... Browse Code »

Prior to this commit, the link timer has been running at a "continuity
interval" of configured link tolerance/4. When a timer wakes up and
discovers that there has been no sign of life from the peer during the
previous interval, it divides its own timer interval by another factor
four, and starts sending one probe per new interval. When the configured
link tolerance time has passed without answer, i.e. after 16 unacked
probes, the link is declared faulty and reset.

This is unnecessary complex. It is sufficient to continue with the
original continuity interval, and instead reset the link after four
missed probe responses. This makes the timer handling in the link
simpler, and opens up for some planned later changes in this area.
This commit implements this change.

Reviewed-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:45 +0800
b1c29f6b1 tipc: simplify resetting and disabling of bearers ... Browse Code »

Since commit 4b475e3f2f8e4e241de101c8240f1d74d0470494
("tipc: eliminate delayed link deletion at link failover") the extra
boolean parameter "shutting_down" is not any longer needed for the
functions bearer_disable() and tipc_link_delete_list().

Furhermore, the function tipc_link_reset_links(), called from
bearer_reset() is now unnecessary. We can just as well delete
all the links, as we do in bearer_disable(), and start over with
creating new links.

This commit introduces those changes.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-05-15 00:24:45 +0800

11 May, 2015

1 commit

11aa9c28b net: Pass kern from net_proto_family.create to sk_alloc ... Browse Code »

In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2015-05-11 22:50:17 +0800

10 May, 2015

3 commits

b063bc5ea tipc: send explicit not supported error in nl compat ... Browse Code »

The legacy netlink API treated EPERM (permission denied) as
"operation not supported".

Reported-by: Tomi Ollila
Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Richard Alpe
2015-05-10 04:40:03 +0800
670f4f881 tipc: add broadcast link window set/get to nl api ... Browse Code »

Add the ability to get or set the broadcast link window through the
new netlink API. The functionality was unintentionally missing from
the new netlink API. Adding this means that we also fix the breakage
in the old API when coming through the compat layer.

Fixes: 37e2d4843f9e (tipc: convert legacy nl link prop set to nl compat)
Reported-by: Tomi Ollila
Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Richard Alpe
2015-05-10 04:40:02 +0800
c3d6fb85b tipc: fix default link prop regression in nl compat ... Browse Code »

Default link properties can be set for media or bearer. This
functionality was missed when introducing the NL compatibility layer.

This patch implements this functionality in the compat netlink
layer. It works the same way as it did in the old API. We search for
media and bearers matching the "link name". If we find a matching
media or bearer the link tolerance, priority or window is used as
default for new links on that media or bearer.

Fixes: 37e2d4843f9e (tipc: convert legacy nl link prop set to nl compat)
Reported-by: Tomi Ollila
Signed-off-by: Richard Alpe
Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: David S. Miller

Richard Alpe
2015-05-10 04:40:02 +0800

05 May, 2015

5 commits

90bdfcb76 tipc: deal with return value of tipc_conn_new callback ... Browse Code »

Once tipc_conn_new() returns NULL, the connection should be shut
down immediately, otherwise, oops may happen due to the NULL pointer.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-05-05 03:04:01 +0800
a13683f29 tipc: adjust locking policy of subscription ... Browse Code »

Currently subscriber's lock protects not only subscriber's subscription
list but also all subscriptions linked into the list. However, as all
members of subscription are never changed after they are initialized,
it's unnecessary for subscription to be protected under subscriber's
lock. If the lock is used to only protect subscriber's subscription
list, the adjustment not only makes the locking policy simpler, but
also helps to avoid a deadlock which may happen once creating a
subscription is failed.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-05-05 03:04:01 +0800
00bc00a93 tipc: involve reference counter for subscriber ... Browse Code »

At present subscriber's lock is used to protect the subscription list
of subscriber as well as subscriptions linked into the list. While one
or all subscriptions are deleted through iterating the list, the
subscriber's lock must be held. Meanwhile, as deletion of subscription
may happen in subscription timer's handler, the lock must be grabbed
in the function as well. When subscription's timer is terminated with
del_timer_sync() during above iteration, subscriber's lock has to be
temporarily released, otherwise, deadlock may occur. However, the
temporary release may cause the double free of a subscription as the
subscription is not disconnected from the subscription list.

Now if a reference counter is introduced to subscriber, subscription's
timer can be asynchronously stopped with del_timer(). As a result, the
issue is not only able to be fixed, but also relevant code is pretty
readable and understandable.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-05-05 03:04:01 +0800
1b764828a tipc: introduce tipc_subscrb_create routine ... Browse Code »

Introducing a new function makes the purpose of tipc_subscrb_connect_cb
callback routine more clear.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-05-05 03:04:01 +0800
57f1d1868 tipc: rename functions defined in subscr.c ... Browse Code »

When a topology server accepts a connection request from its client,
it allocates a connection instance and a tipc_subscriber structure
object. The former is used to communicate with client, and the latter
is often treated as a subscriber which manages all subscription events
requested from a same client. When a topology server receives a request
of subscribing name services from a client through the connection, it
creates a tipc_subscription structure instance which is seen as a
subscription recording what name services are subscribed. In order to
manage all subscriptions from a same client, topology server links
them into the subscrp_list of the subscriber. So subscriber and
subscription completely represents different meanings respectively,
but function names associated with them make us so confused that we
are unable to easily tell which function is against subscriber and
which is to subscription. So we want to eliminate the confusion by
renaming them.

Signed-off-by: Ying Xue
Reviewed-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-05-05 03:04:00 +0800

30 Apr, 2015

2 commits

0d699f28e tipc: fix problem with parallel link synchronization mechanism ... Browse Code »

Currently, we try to accumulate arrived packets in the links's
'deferred' queue during the parallel link syncronization phase.

This entails two problems:

- With an unlucky combination of arriving packets the algorithm
may go into a lockstep with the out-of-sequence handling function,
where the synch mechanism is adding a packet to the deferred queue,
while the out-of-sequence handling is retrieving it again, thus
ending up in a loop inside the node_lock scope.

- Even if this is avoided, the link will very often send out
unnecessary protocol messages, in the worst case leading to
redundant retransmissions.

We fix this by just dropping arriving packets on the upcoming link
during the synchronization phase, thus relying on the retransmission
protocol to resolve the situation once the two links have arrived to
a synchronized state.

Reviewed-by: Erik Hugne
Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-04-30 03:08:59 +0800
f2f67390a tipc: remove wrong use of NLM_F_MULTI ... Browse Code »

NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.

Libraries like libnl will wait forever for NLMSG_DONE.

Fixes: 35b9dd7607f0 ("tipc: add bearer get/dump to new netlink api")
Fixes: 7be57fc69184 ("tipc: add link get/dump to new netlink api")
Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api")
CC: Richard Alpe
CC: Jon Maloy
CC: Ying Xue
CC: tipc-discussion@lists.sourceforge.net
Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2015-04-30 02:59:17 +0800

23 Apr, 2015

3 commits

73a317377 tipc: fix node refcount issue ... Browse Code »

When link statistics is dumped over netlink, we iterate over
the list of peer nodes and append each links statistics to
the netlink msg. In the case where the dump is resumed after
filling up a nlmsg, the node refcnt is decremented without
having been incremented previously which may cause the node
reference to be freed. When this happens, the following
info/stacktrace will be generated, followed by a crash or
undefined behavior.
We fix this by removing the erroneous call to tipc_node_put
inside the loop that iterates over nodes.

[ 384.312303] INFO: trying to register non-static key.
[ 384.313110] the code is fine but needs lockdep annotation.
[ 384.313290] turning off the locking correctness validator.
[ 384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
[ 384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 384.313290] ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007
[ 384.313290] ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200
[ 384.313290] ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001
[ 384.313290] Call Trace:
[ 384.313290] [] dump_stack+0x4c/0x65
[ 384.313290] [] __lock_acquire+0xf3d/0xf50
[ 384.313290] [] lock_acquire+0xd5/0x290
[ 384.313290] [] ? link_timeout+0x1c/0x170 [tipc]
[ 384.313290] [] ? link_state_event+0x4e0/0x4e0 [tipc]
[ 384.313290] [] _raw_spin_lock_bh+0x40/0x80
[ 384.313290] [] ? link_timeout+0x1c/0x170 [tipc]
[ 384.313290] [] link_timeout+0x1c/0x170 [tipc]
[ 384.313290] [] call_timer_fn+0xb8/0x490
[ 384.313290] [] ? process_timeout+0x10/0x10
[ 384.313290] [] run_timer_softirq+0x21c/0x420
[ 384.313290] [] ? link_state_event+0x4e0/0x4e0 [tipc]
[ 384.313290] [] __do_softirq+0xf4/0x630
[ 384.313290] [] irq_exit+0x5d/0x60
[ 384.313290] [] smp_apic_timer_interrupt+0x41/0x50
[ 384.313290] [] apic_timer_interrupt+0x70/0x80
[ 384.313290] [] ? default_idle+0x20/0x210
[ 384.313290] [] ? default_idle+0x1e/0x210
[ 384.313290] [] arch_cpu_idle+0xa/0x10
[ 384.313290] [] cpu_startup_entry+0x2c3/0x530
[ 384.313290] [] ? clockevents_register_device+0x113/0x200
[ 384.313290] [] start_secondary+0x13f/0x170

Fixes: 8a0f6ebe8494 ("tipc: involve reference counter for node structure")
Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2015-04-23 23:50:34 +0800
9871b27f6 tipc: fix random link reset problem ... Browse Code »

In the function tipc_sk_rcv(), the stack variable 'err'
is only initialized to TIPC_ERR_NO_PORT for the first
iteration over the link input queue. If a chain of messages
are received from a link, failure to lookup the socket for
any but the first message will cause the message to bounce back
out on a random link.
We fix this by properly initializing err.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Erik Hugne
2015-04-23 23:50:34 +0800
def81f69b tipc: fix topology server broken issue ... Browse Code »

When a new topology server is launched in a new namespace, its
listening socket is inserted into the "init ns" namespace's socket
hash table rather than the one owned by the new namespace. Although
the socket's namespace is forcedly changed to the new namespace later,
the socket is still stored in the socket hash table of "init ns"
namespace. When a client created in the new namespace connects
its own topology server, the connection is failed as its server's
socket could not be found from its own namespace's socket table.

If __sock_create() instead of original sock_create_kern() is used
to create the server's socket through specifying an expected namesapce,
the socket will be inserted into the specified namespace's socket
table, thereby avoiding to the topology server broken issue.

Fixes: 76100a8a64bc ("tipc: fix netns refcnt leak")

Reported-by: Erik Hugne
Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Ying Xue
2015-04-23 23:50:34 +0800

08 Apr, 2015

1 commit

79b16aade udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). ... Browse Code »

That was we can make sure the output path of ipv4/ipv6 operate on
the UDP socket rather than whatever random thing happens to be in
skb->sk.

Based upon a patch by Jiri Pirko.

Signed-off-by: David S. Miller
Acked-by: Hannes Frederic Sowa

David Miller
2015-04-08 03:29:08 +0800

03 Apr, 2015

4 commits

ed193ece2 tipc: simplify link mtu negotiation ... Browse Code »

When a link is being established, the two endpoints advertise their
respective interface MTU in the transmitted RESET and ACTIVATE messages.
If there is any difference, the lower of the two MTUs will be selected
for use by both endpoints.

However, as a remnant of earlier attempts to introduce TIPC level
routing. there also exists an MTU discovery mechanism. If an intermediate
node has a lower MTU than the two endpoints, they will discover this
through a bisectional approach, and finally adopt this MTU for common use.

Since there is no TIPC level routing, and probably never will be,
this mechanism doesn't make any sense, and only serves to make the
link level protocol unecessarily complex.

In this commit, we eliminate the MTU discovery algorithm,and fall back
to the simple MTU advertising approach. This change is fully backwards
compatible.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-04-03 04:27:12 +0800
dff29b1a8 tipc: eliminate delayed link deletion at link failover ... Browse Code »

When a bearer is disabled manually, all its links have to be reset
and deleted. However, if there is a remaining, parallel link ready
to take over a deleted link's traffic, we currently delay the delete
of the removed link until the failover procedure is finished. This
is because the remaining link needs to access state from the reset
link, such as the last received packet number, and any partially
reassembled buffer, in order to perform a successful failover.

In this commit, we do instead move the state data over to the new
link, so that it can fulfill the procedure autonomously, without
accessing any data on the old link. This means that we can now
proceed and delete all pertaining links immediately when a bearer
is disabled. This saves us from some unnecessary complexity in such
situations.

We also choose to change the confusing definitions CHANGEOVER_PROTOCOL,
ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL,
FAILOVER_MSG and SYNCH_MSG respectively.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-04-03 04:27:12 +0800
2da714251 tipc: drop tunneled packet duplicates at reception ... Browse Code »

In commit 8b4ed8634f8b3f9aacfc42b4a872d30c36b9e255
("tipc: eliminate race condition at dual link establishment")
we introduced a parallel link synchronization mechanism that
guarentees sequential delivery even for users switching from
an old to a newly established link. The new mechanism makes it
unnecessary to deliver the tunneled duplicate packets back to
the old link, as we are currently doing. It is now sufficient
to use the last tunneled packet's inner sequence number as
synchronization point between the two parallel links, whereafter
it can be dropped.

In this commit, we drop the duplicate packets arriving on the new
link, after updating the synchronization point at each new arrival.

Although it would now have been sufficient for the other endpoint
to only tunnel the last packet in its send queue, and not the
entire queue, we must still do this to maintain compatibility
with older nodes.

This commit makes it possible to get rid if some complex
interaction between the two parallel links.

Reviewed-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-04-03 04:27:12 +0800
9f0d34bc3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/usb/asix_common.c
drivers/net/usb/sr9800.c
drivers/net/usb/usbnet.c
include/linux/usb/usbnet.h
net/ipv4/tcp_ipv4.c
net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes. In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller

David S. Miller
2015-04-03 04:16:53 +0800

01 Apr, 2015

1 commit

7e4369057 tipc: fix a slab object leak ... Browse Code »

When remove TIPC module, there is a warning to remind us that a slab
object is leaked like:

root@localhost:~# rmmod tipc
[ 19.056226] =============================================================================
[ 19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close()
[ 19.058736] -----------------------------------------------------------------------------
[ 19.058736]
[ 19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080
[ 19.061915] INFO: Object 0xffff880014668000 @offset=0
[ 19.062717] kmem_cache_destroy TIPC: Slab cache still has objects

This is because the listening socket of TIPC topology server is not
closed before TIPC proto handler is unregistered with proto_unregister().
However, as the socket is closed in tipc_exit_net() which is called by
unregister_pernet_subsys() during unregistering TIPC namespace operation,
the warning can be eliminated if calling unregister_pernet_subsys() is
moved before calling proto_unregister().

Fixes: e05b31f4bf89 ("tipc: make tipc socket support net namespace")
Reviewed-by: Erik Hugne
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-04-01 11:10:08 +0800

30 Mar, 2015

3 commits

d482994fc tipc: fix two bugs in secondary destination lookup ... Browse Code »

A message sent to a node after a successful name table lookup may still
find that the destination socket has disappeared, because distribution
of name table updates is non-atomic. If so, the message will be rejected
back to the sender with error code TIPC_ERR_NO_PORT. If the source
socket of the message has disappeared in the meantime, the message
should be dropped.

However, in the currrent code, the message will instead be subject to an
unwanted tertiary lookup, because the function tipc_msg_lookup_dest()
doesn't check if there is an error code present in the message before
performing the lookup. In the worst case, the message may now find the
old destination again, and be redirected once more, instead of being
dropped directly as it should be.

A second bug in this function is that the "prev_node" field in the message
is not updated after successful lookup, something that may have
unpredictable consequences.

The problems arising from those bugs occur very infrequently.

The third change in this function; the test on msg_reroute_msg_cnt() is
purely cosmetic, reflecting that the returned value never can be negative.

This commit corrects the two bugs described above.

Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-03-30 04:47:36 +0800
8a0f6ebe8 tipc: involve reference counter for node structure ... Browse Code »

TIPC node hash node table is protected with rcu lock on read side.
tipc_node_find() is used to look for a node object with node address
through iterating the hash node table. As the entire process of what
tipc_node_find() traverses the table is guarded with rcu read lock,
it's safe for us. However, when callers use the node object returned
by tipc_node_find(), there is no rcu read lock applied. Therefore,
this is absolutely unsafe for callers of tipc_node_find().

Now we introduce a reference counter for node structure. Before
tipc_node_find() returns node object to its caller, it first increases
the reference counter. Accordingly, after its caller used it up,
it decreases the counter again. This can prevent a node being used by
one thread from being freed by another thread.

Reviewed-by: Erik Hugne
Reviewed-by: Jon Maloy
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-03-30 03:40:28 +0800
b952b2bef tipc: fix potential deadlock when all links are reset ... Browse Code »

[ 60.988363] ======================================================
[ 60.988754] [ INFO: possible circular locking dependency detected ]
[ 60.989152] 3.19.0+ #194 Not tainted
[ 60.989377] -------------------------------------------------------
[ 60.989781] swapper/3/0 is trying to acquire lock:
[ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [] tipc_link_retransmit+0x1aa/0x240 [tipc]
[ 60.990743]
[ 60.990743] but task is already holding lock:
[ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.991738]
[ 60.991738] which lock already depends on the new lock.
[ 60.991738]
[ 60.992174]
[ 60.992174] the existing dependency chain (in reverse order) is:
[ 60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[ 60.992174] [] lock_acquire+0x9c/0x140
[ 60.992174] [] _raw_spin_lock_bh+0x3f/0x50
[ 60.992174] [] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.992174] [] tipc_bclink_add_node+0x97/0xf0 [tipc]
[ 60.992174] [] tipc_node_link_up+0xf5/0x110 [tipc]
[ 60.992174] [] link_state_event+0x2b3/0x4f0 [tipc]
[ 60.992174] [] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[ 60.992174] [] tipc_rcv+0x827/0xac0 [tipc]
[ 60.992174] [] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[ 60.992174] [] __netif_receive_skb_core+0x746/0x980
[ 60.992174] [] __netif_receive_skb+0x21/0x70
[ 60.992174] [] netif_receive_skb_internal+0x35/0x130
[ 60.992174] [] napi_gro_receive+0x158/0x1d0
[ 60.992174] [] e1000_clean_rx_irq+0x155/0x490
[ 60.992174] [] e1000_clean+0x267/0x990
[ 60.992174] [] net_rx_action+0x150/0x360
[ 60.992174] [] __do_softirq+0x123/0x360
[ 60.992174] [] irq_exit+0x8e/0xb0
[ 60.992174] [] do_IRQ+0x65/0x110
[ 60.992174] [] ret_from_intr+0x0/0x13
[ 60.992174] [] arch_cpu_idle+0xf/0x20
[ 60.992174] [] cpu_startup_entry+0x2f6/0x3f0
[ 60.992174] [] start_secondary+0x13a/0x150
[ 60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[ 60.992174] [] __lock_acquire+0x163d/0x1ca0
[ 60.992174] [] lock_acquire+0x9c/0x140
[ 60.992174] [] _raw_spin_lock_bh+0x3f/0x50
[ 60.992174] [] tipc_link_retransmit+0x1aa/0x240 [tipc]
[ 60.992174] [] tipc_bclink_rcv+0x611/0x640 [tipc]
[ 60.992174] [] tipc_rcv+0x616/0xac0 [tipc]
[ 60.992174] [] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[ 60.992174] [] __netif_receive_skb_core+0x746/0x980
[ 60.992174] [] __netif_receive_skb+0x21/0x70
[ 60.992174] [] netif_receive_skb_internal+0x35/0x130
[ 60.992174] [] napi_gro_receive+0x158/0x1d0
[ 60.992174] [] e1000_clean_rx_irq+0x155/0x490
[ 60.992174] [] e1000_clean+0x267/0x990
[ 60.992174] [] net_rx_action+0x150/0x360
[ 60.992174] [] __do_softirq+0x123/0x360
[ 60.992174] [] irq_exit+0x8e/0xb0
[ 60.992174] [] do_IRQ+0x65/0x110
[ 60.992174] [] ret_from_intr+0x0/0x13
[ 60.992174] [] arch_cpu_idle+0xf/0x20
[ 60.992174] [] cpu_startup_entry+0x2f6/0x3f0
[ 60.992174] [] start_secondary+0x13a/0x150
[ 60.992174]
[ 60.992174] other info that might help us debug this:
[ 60.992174]
[ 60.992174] Possible unsafe locking scenario:
[ 60.992174]
[ 60.992174] CPU0 CPU1
[ 60.992174] ---- ----
[ 60.992174] lock(&(&bclink->lock)->rlock);
[ 60.992174] lock(&(&n_ptr->lock)->rlock);
[ 60.992174] lock(&(&bclink->lock)->rlock);
[ 60.992174] lock(&(&n_ptr->lock)->rlock);
[ 60.992174]
[ 60.992174] *** DEADLOCK ***
[ 60.992174]
[ 60.992174] 3 locks held by swapper/3/0:
[ 60.992174] #0: (rcu_read_lock){......}, at: [] __netif_receive_skb_core+0x71/0x980
[ 60.992174] #1: (rcu_read_lock){......}, at: [] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.992174]

The correct the sequence of grabbing n_ptr->lock and bclink->lock
should be that the former is first held and the latter is then taken,
which exactly happened on CPU1. But especially when the retransmission
of broadcast link is failed, bclink->lock is first held in
tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
called by tipc_link_retransmit() subsequently, which is demonstrated on
CPU0. As a result, deadlock occurs.

If the order of holding the two locks happening on CPU0 is reversed, the
deadlock risk will be relieved. Therefore, the node lock taken in
link_retransmit_failure() originally is moved to tipc_bclink_rcv()
so that it's obtained before bclink lock. But the precondition of
the adjustment of node lock is that responding to bclink reset event
must be moved from tipc_bclink_unlock() to tipc_node_unlock().

Reviewed-by: Erik Hugne
Signed-off-by: Ying Xue
Signed-off-by: David S. Miller

Ying Xue
2015-03-30 03:40:27 +0800

26 Mar, 2015

2 commits

8b4ed8634 tipc: eliminate race condition at dual link establishment ... Browse Code »

Despite recent improvements, the establishment of dual parallel
links still has a small glitch where messages can bypass each
other. When the second link in a dual-link configuration is
established, part of the first link's traffic will be steered over
to the new link. Although we do have a mechanism to ensure that
packets sent before and after the establishment of the new link
arrive in sequence to the destination node, this is not enough.
The arriving messages will still be delivered upwards in different
threads, something entailing a risk of message disordering during
the transition phase.

To fix this, we introduce a synchronization mechanism between the
two parallel links, so that traffic arriving on the new link cannot
be added to its input queue until we are guaranteed that all
pre-establishment messages have been delivered on the old, parallel
link.

This problem seems to always have been around, but its occurrence is
so rare that it has not been noticed until recent intensive testing.

Reviewed-by: Ying Xue
Reviewed-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-03-26 02:05:56 +0800
3127a0200 tipc: clean up handling of link congestion ... Browse Code »

After the recent changes in message importance handling it becomes
possible to simplify handling of messages and sockets when we
encounter link congestion.

We merge the function tipc_link_cong() into link_schedule_user(),
and simplify the code of the latter. The code should now be
easier to follow, especially regarding return codes and handling
of the message that caused the situation.

In case the scheduling function is unable to pre-allocate a wakeup
message buffer, it now returns -ENOBUFS, which is a more correct
code than the previously used -EHOSTUNREACH.

Reviewed-by: Ying Xue
Reviewed-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: David S. Miller

Jon Paul Maloy
2015-03-26 02:05:56 +0800