Doug / smarc-fsl-linux-kernel | Embedian Git Server

08 Dec, 2012

8 commits

0fef8f205 tipc: refactor accept() code for improved readability ... Browse Code »

In TIPC's accept() routine, there is a large block of code relating
to initialization of a new socket, all within an if condition checking
if the allocation succeeded.

Here, we simply flip the check of the if, so that the main execution
path stays at the same indentation level, which improves readability.
If the allocation fails, we jump to an already existing exit label.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-12-08 06:23:24 +0800
258f8667a tipc: add lock nesting notation to quiet lockdep warning ... Browse Code »

TIPC accept() call grabs the socket lock on a newly allocated
socket while holding the socket lock on an old socket. But lockdep
worries that this might be a recursive lock attempt:

[ INFO: possible recursive locking detected ]
---------------------------------------------
kworker/u:0/6 is trying to acquire lock:
(sk_lock-AF_TIPC){+.+.+.}, at: [] accept+0x15c/0x310 [tipc]

but task is already holding lock:
(sk_lock-AF_TIPC){+.+.+.}, at: [] accept+0x28/0x310 [tipc]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(sk_lock-AF_TIPC);
lock(sk_lock-AF_TIPC);

*** DEADLOCK ***

May be due to missing lock nesting notation
[...]

Tell lockdep that this locking is safe by using lock_sock_nested().
This is similar to what was done in commit 5131a184a3458d9 for
SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").

Also note that this is isn't something that is seen normally,
as it was uncovered with some experimental work-in-progress
code not yet ready for mainline. So no need for stable
backports or similar of this commit.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:23 +0800
cbab36879 tipc: eliminate connection setup for implied connect in recv_msg() ... Browse Code »

As connection setup is now completed asynchronously in BH context,
in the function filter_connect(), the corresponding code in recv_msg()
becomes redundant.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:22 +0800
584d24b39 tipc: introduce non-blocking socket connect ... Browse Code »

TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.

With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.

The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.

It is also now possible to call select() or poll() to wait for the
completion of a connection.

An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:21 +0800
7e6c131e1 tipc: consolidate connection-oriented message reception in one function ... Browse Code »

Handling of connection-related message reception is currently scattered
around at different places in the code. This makes it harder to verify
that things are handled correctly in all possible scenarios.
So we consolidate the existing processing of connection-oriented
message reception in a single routine. In the process, we convert the
chain of if/else into a switch/case for improved readability.

A cast on the socket_state in the switch is needed to avoid compile
warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
‘4294967295’ not in enumerated type". This happens because existing
tipc code pseudo extends the default linux socket state values with:

#define SS_LISTENING -1 /* socket is listening */
#define SS_READY -2 /* socket is connectionless */

It may make sense to add these as _positive_ values to the existing
socket state enum list someday, vs. these already existing defines.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
[PG: add cast to fix warning; remove returns from middle of switch]
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:20 +0800
bc879117d tipc: standardize across connect/disconnect function naming ... Browse Code »

Currently we have tipc_disconnect and tipc_disconnect_port. It is
not clear from the names alone, what they do or how they differ.
It turns out that tipc_disconnect just deals with the port locking
and then calls tipc_disconnect_port which does all the work.

If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
then we will be following typical linux convention, where:

__tipc_disconnect: "raw" function that does all the work.

tipc_disconnect: wrapper that deals with locking and then calls
the real core __tipc_disconnect function

With this, the difference is immediately evident, and locking
violations are more apt to be spotted by chance while working on,
or even just while reading the code.

On the connect side of things, we currently only have the single
"tipc_connect2port" function. It does both the locking at enter/exit,
and the core of the work. Pending changes will make it desireable to
have the connect be a two part locking wrapper + worker function,
just like the disconnect is already.

Here, we make the connect look just like the updated disconnect case,
for the above reason, and for consistency. In the process, we also
get rid of the "2port" suffix that was on the original name, since
it adds no descriptive value.

On close examination, one might notice that the above connect
changes implicitly move the call to tipc_link_get_max_pkt() to be
within the scope of tipc_port_lock() protected region; when it was
not previously. We don't see any issues with this, and it is in
keeping with __tipc_connect doing the work and tipc_connect just
handling the locking.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-12-08 06:23:19 +0800
e643df156 tipc: change sk_receive_queue upper limit ... Browse Code »

The sk_recv_queue upper limit for connectionless sockets has empirically
turned out to be too low. When we double the current limit we get much
fewer rejected messages and no noticable negative side-effects.

Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Jon Maloy
2012-12-08 06:23:18 +0800
9da3d4758 tipc: eliminate aggregate sk_receive_queue limit ... Browse Code »

As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.

This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.

Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.

As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted. There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Cc: Neil Horman
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 03:19:52 +0800

07 Dec, 2012

1 commit

c00841385 tipc: remove obsolete flush of stale reassembly buffer ... Browse Code »

Each link instance has a periodic job checking if there is a stale
ongoing message reassembly associated to the link. If no new
fragment has been received during the last 4*[link_tolerance] period,
it is assumed the missing fragment will never arrive. As a consequence,
the reassembly buffer is discarded, and a gap in the message sequence
occurs.

This assumption is wrong. After we abandoned our ambition to develop
packet routing for multi-cluster networks, only single-hop packet
transfer remains as an option. For those, all packets are guaranteed
to be delivered in sequence to the defragmentation layer. Any failure
to achieve sequenced delivery will eventually lead to link reset, and
the reassembly buffer will be flushed anyway.

So we just remove this periodic check, which is now obsolete.

Signed-off-by: Erik Hugne
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
[PG: also delete get/inc_timer count, since they are now unused]
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-12-07 06:20:19 +0800

23 Nov, 2012

3 commits

94fc9c471 tipc: delete TIPC_ADVANCED Kconfig variable ... Browse Code »

There used to be a time when TIPC had lots of Kconfig knobs the
end user could alter, but they have all been made automatic or
obsolete, with the exception of CONFIG_TIPC_PORTS. This
previously existing set of options was all hidden under the
TIPC_ADVANCED setting, which does not exist in any code, but
only in Kconfig scope.

Having this now, just to hide the one remaining "advanced"
option no longer makes sense. Remove it. Also get rid of the
ifdeffery in the TIPC code that allowed for TIPC_PORTS to be
possibly undefined.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-11-23 03:33:29 +0800
4cb7d55ab tipc: eliminate an unnecessary cast of node variable ... Browse Code »

As the variable:node is currently defined to u32 type, it is
unnecessary to cast its type to u32 again when using it.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-23 03:33:28 +0800
c64f7a6a1 tipc: introduce message to synchronize broadcast link ... Browse Code »

Upon establishing a first link between two nodes, there is
currently a risk that the two endpoints will disagree on exactly
which sequence number reception and acknowleding of broadcast
packets should start.

The following scenarios may happen:

1: Node A sends an ACTIVATE message to B, telling it to start acking
packets from sequence number N.
2: Node A sends out broadcast N, but does not expect an acknowledge
from B, since B is not yet in its broadcast receiver's list.
3: Node A receives ACK for N from all nodes except B, and releases
packet N.
4: Node B receives the ACTIVATE, activates its link endpoint, and
stores the value N as sequence number of first expected packet.
5: Node B sends a NAME_DISTR message to A.
6: Node A receives the NAME_DISTR message, and activates its endpoint.
At this moment B is added to A's broadcast receiver's set.
Node A also sets sequence number 0 as the first broadcast packet
to be received from B.
7: Node A sends broadcast N+1.
8: B receives N+1, determines there is a gap in the sequence, since
it is expecting N, and sends a NACK for N back to A.
9: Node A has already released N, so no retransmission is possible.
The broadcast link in direction A->B is stale.

In addition to, or instead of, 7-9 above, the following may happen:

10: Node B sends broadcast M > 0 to A.
11: Node A receives M, falsely decides there must be a gap, since
it is expecting packet 0, and asks for retransmission of packets
[0,M-1].
12: Node B has already released these packets, so the broadcast
link is stale in direction B->A.

We solve this problem by introducing a new unicast message type,
BCAST_PROTOCOL/STATE, to convey the sequence number of the next
sent broadcast packet to the other endpoint, at exactly the moment
that endpoint is added to the own node's broadcast receivers list,
and before any other unicast messages are permitted to be sent.

Furthermore, we don't allow any node to start receiving and
processing broadcast packets until this new synchronization
message has been received.

To maintain backwards compatibility, we still open up for
broadcast reception if we receive a NAME_DISTR message without
any preceding broadcast sync message. In this case, we must
assume that the other end has an older code version, and will
never send out the new synchronization message. Hence, for mixed
old and new nodes, the issue arising in 7-12 of the above may
happen with the same probability as before.

Signed-off-by: Jon Maloy
Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Jon Maloy
2012-11-23 03:33:21 +0800

22 Nov, 2012

6 commits

389dd9bcf tipc: rename supported flag to recv_permitted ... Browse Code »

Rename the "supported" flag in bclink structure to "recv_permitted"
to better reflect what it is used for. When this flag is set for a
given node, we are permitted to receive and acknowledge broadcast
messages from that node. Convert it to a bool at the same time,
since it is not used to store any numerical values.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 20:50:51 +0800
818f4da52 tipc: remove supportable flag from bclink structure ... Browse Code »

The "supportable" flag in bclink structure is a compatibility flag
indicating whether a peer node is capable of receiving TIPC broadcast
messages. However, all TIPC versions since tipc-1.5, and after the
inclusion in the upstream Linux kernel in 2006, support this capability.
It is highly unlikely that anybody is still using such an old
version of TIPC, let alone that they want to mix it with TIPC-2.0
nodes. Therefore, we now remove the "supportable" flag.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 20:50:50 +0800
3c294cb37 tipc: remove the bearer congestion mechanism ... Browse Code »

Currently at the TIPC bearer layer there is the following congestion
mechanism:

Once sending packets has failed via that bearer, the bearer will be
flagged as being in congested state at once. During bearer congestion,
all packets arriving at link will be queued on the link's outgoing
buffer. When we detect that the state of bearer congestion has
relaxed (e.g. some packets are received from the bearer) we will try
our best to push all packets in the link's outgoing buffer until the
buffer is empty, or until the bearer is congested again.

However, in fact the TIPC bearer never receives any feedback from the
device layer whether a send was successful or not, so it must always
assume it was successful. Therefore, the bearer congestion mechanism
as it exists currently is of no value.

But the bearer blocking state is still useful for us. For example,
when the physical media goes down/up, we need to change the state of
the links bound to the bearer. So the code maintaing the state
information is not removed.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 09:07:25 +0800
750311510 tipc: wake up all waiting threads at socket shutdown ... Browse Code »

When a socket is shut down, we should wake up all thread sleeping on
it, instead of just one of them. Otherwise, when several threads are
polling the same socket, and one of them does shutdown(), the
remaining threads may end up sleeping forever.

Also, to align socket usage with common practice in other stacks, we
use one of the common socket callback handlers, sk_state_change(),
to wake up pending users. This is similar to the usage in e.g.
inet_shutdown(). [net/ipv4/af_inet.c].

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 09:06:29 +0800
c4fc298ab tipc: return POLLOUT for sockets in an unconnected state ... Browse Code »

If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
socket during link congestion, the connect message will be discarded
and sendmsg will return EAGAIN. This is normal behavior, and the
application is expected to poll the socket until POLLOUT is set,
after which the connection attempt can be retried.
However, the POLLOUT flag is never set for unconnected sockets and
poll() always returns a zero mask. The application is then left without
a trigger for when it can make another attempt at sending the message.

The solution is to check if we're polling on an unconnected socket
and set the POLLOUT flag if the TIPC port owned by this socket
is not congested. The TIPC ports waiting on a specific link will be
marked as 'not congested' when the link congestion have abated.

Signed-off-by: Erik Hugne
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-11-22 03:54:32 +0800
f288bef46 tipc: fix race/inefficiencies in poll/wait behaviour ... Browse Code »

When an application blocks at poll/select on a TIPC socket
while requesting a specific event mask, both the filter_rcv() and
wakeupdispatch() case will wake it up unconditionally whenever
the state changes (i.e an incoming message arrives, or congestion
has subsided). No mask is used.

To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
with tipc_data_ready and tipc_write_space respectively, which makes
tipc more in alignment with the rest of the networking code. These
pass the exact set of possible events to the waker in fs/select.c
hence avoiding waking up blocked processes unnecessarily.

In doing so, we uncover another issue -- that there needs to be a
memory barrier in these poll/receive callbacks, otherwise we are
subject to the the same race as documented above wq_has_sleeper()
[in commit a57de0b4 "net: adding memory barrier to the poll and
receive callbacks"]. So we need to replace poll_wait() with
sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
in these two new functions.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 03:54:31 +0800

04 Nov, 2012

1 commit

d145f7ec2 tipc: do not use tasklet_disable before tasklet_kill ... Browse Code »

If tasklet_disable() is called before related tasklet handled,
tasklet_kill will never be finished. tasklet_kill is enough.

Signed-off-by: Xiaotian Feng
Cc: Jon Maloy
Cc: Allan Stephens
Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net
Signed-off-by: David S. Miller

Xiaotian Feng
2012-11-04 03:10:14 +0800

05 Oct, 2012

1 commit

e57edf6b6 tipc: prevent dropped connections due to rcvbuf overflow ... Browse Code »

When large buffers are sent over connected TIPC sockets, it
is likely that the sk_backlog will be filled up on the
receiver side, but the TIPC flow control mechanism is happily
unaware of this since that is based on message count.

The sender will receive a TIPC_ERR_OVERLOAD message when this occurs
and drop it's side of the connection, leaving it stale on
the receiver end.

By increasing the sk_rcvbuf to a 'worst case' value, we avoid the
overload caused by a full backlog queue and the flow control
will work properly.

This worst case value is the max TIPC message size times
the flow control window, multiplied by two because a sender
will transmit up to double the window size before a port is marked
congested.
We multiply this by 2 to account for the sk_buff and other overheads.

Signed-off-by: Erik Hugne
Signed-off-by: David S. Miller

Erik Hugne
2012-10-05 03:53:48 +0800

19 Sep, 2012

1 commit

adccff34d net/tipc/name_table.c: Remove unecessary semicolon ... Browse Code »

Found by http://coccinelle.lip6.fr/

Signed-off-by: Peter Senna Tschudin
Signed-off-by: David S. Miller

Peter Senna Tschudin
2012-09-19 04:08:19 +0800

11 Sep, 2012

1 commit

15e473046 netlink: Rename pid to portid to avoid confusion ... Browse Code »

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman"
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller

Eric W. Biederman
2012-09-11 03:30:41 +0800

20 Aug, 2012

9 commits

e6a04b1d3 tipc: eliminate configuration for maximum number of name publications ... Browse Code »

Gets rid of the need for users to specify the maximum number of
name publications supported by TIPC. TIPC now automatically provides
support for the maximum number of name publications to 65535.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:31 +0800
34f256cc7 tipc: eliminate configuration for maximum number of name subscriptions ... Browse Code »

Gets rid of the need for users to specify the maximum number of
name subscriptions supported by TIPC. TIPC now automatically provides
support for the maximum number of name subscriptions to 65535.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:31 +0800
61cdd4d80 tipc: add __read_mostly annotations to several global variables ... Browse Code »

Added to the following:

- tipc_random
- tipc_own_addr
- tipc_max_ports
- tipc_net_id
- tipc_remote_management
- handler_enabled

The above global variables are read often, but written rarely. Use
__read_mostly to prevent them being on the same cacheline as another
variable which is written to often, which would cause cacheline
bouncing.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:31 +0800
f046e7d9b tipc: convert tipc_nametbl_size type from variable to macro ... Browse Code »

There is nothing changing this variable dynamically, so change
it to a macro to make that more obvious when reading the code.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:31 +0800
379c0456a tipc: change tipc_net_start routine return value type ... Browse Code »

Since now tipc_net_start() always returns a success code - 0, its
return value type should be changed from integer to void, which can
avoid unnecessary check for its return value.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:30 +0800
381294331 tipc: manually inline single use media_name_valid routine ... Browse Code »

After eliminating the mechanism which checks whether all letters
in media name string are within a given character set, the
media_name_valid routine becomes trivial. It is also only
used once, so it is unnecessary to keep it as a separate function.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:30 +0800
fc0739385 tipc: remove pointless name sanity check and tipc_alphabet array ... Browse Code »

There is no real reason to check whether all letters in the given
media name and network interface name are within the character set
defined in tipc_alphabet array. Even if we eliminate the checking,
the rest of checking conditions in tipc_enable_bearer() can ensure
we do not enable an invalid or illegal bearer.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:30 +0800
4225a398c tipc: fix lockdep warning during bearer initialization ... Browse Code »

When the lockdep validator is enabled, it will report the below
warning when we enable a TIPC bearer:

[ INFO: possible irq lock inversion dependency detected ]
---------------------------------------------------------
Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(ptype_lock);
local_irq_disable();
lock(tipc_net_lock);
lock(ptype_lock);

lock(tipc_net_lock);

*** DEADLOCK ***

the shortest dependencies between 2nd lock and 1st lock:
-> (ptype_lock){+.+...} ops: 10 {
[...]
SOFTIRQ-ON-W at:
[] __lock_acquire+0x528/0x13e0
[] lock_acquire+0x90/0x100
[] _raw_spin_lock+0x38/0x50
[] dev_add_pack+0x3a/0x60
[] arp_init+0x1a/0x48
[] inet_init+0x181/0x27e
[] do_one_initcall+0x34/0x170
[] kernel_init+0x110/0x1b2
[] kernel_thread_helper+0x6/0x10
[...]
... key at: [] ptype_lock+0x10/0x20
... acquired at:
[] lock_acquire+0x90/0x100
[] _raw_spin_lock+0x38/0x50
[] dev_add_pack+0x3a/0x60
[] enable_bearer+0xf2/0x140 [tipc]
[] tipc_enable_bearer+0x1ba/0x450 [tipc]
[] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
[] handle_cmd+0x42/0xd0 [tipc]
[] genl_rcv_msg+0x232/0x280
[] netlink_rcv_skb+0x86/0xb0
[] genl_rcv+0x1c/0x30
[] netlink_unicast+0x174/0x1f0
[] netlink_sendmsg+0x1eb/0x2d0
[] sock_aio_write+0x161/0x170
[] do_sync_write+0xac/0xf0
[] vfs_write+0x156/0x170
[] sys_write+0x42/0x70
[] sysenter_do_call+0x12/0x38
[...]
}
-> (tipc_net_lock){+..-..} ops: 4 {
[...]
IN-SOFTIRQ-R at:
[] __lock_acquire+0x64a/0x13e0
[] lock_acquire+0x90/0x100
[] _raw_read_lock_bh+0x3d/0x50
[] tipc_recv_msg+0x1d/0x830 [tipc]
[] recv_msg+0x3f/0x50 [tipc]
[] __netif_receive_skb+0x22a/0x590
[] netif_receive_skb+0x2b/0xf0
[] pcnet32_poll+0x292/0x780
[] net_rx_action+0xfa/0x1e0
[] __do_softirq+0xae/0x1e0
[...]
}

>From the log, we can see three different call chains between
CPU0 and CPU1:

Time 0 on CPU0:

kernel_init()->inet_init()->dev_add_pack()

At time 0, the ptype_lock is held by CPU0 in dev_add_pack();

Time 1 on CPU1:

tipc_enable_bearer()->enable_bearer()->dev_add_pack()

At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack. But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.

Time 2 on CPU0:

netif_receive_skb()->recv_msg()->tipc_recv_msg()

At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock. At the moment, below scenario happens:

On CPU0, below is our sequence of taking locks:

lock(ptype_lock)->lock(tipc_net_lock)

On CPU1, our sequence of taking locks looks like:

lock(tipc_net_lock)->lock(ptype_lock)

Obviously deadlock may happen in this case.

But please note the deadlock possibly doesn't occur at all when the
first TIPC bearer is enabled. Before enable_bearer() -- running on
CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e.
recv_msg()) is not registered successfully via dev_add_pack(), so
the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
message comes to CPU0. But when the second TIPC bearer is
registered, the deadlock can perhaps really happen.

To fix it, we will push the work of registering TIPC protocol
handler into workqueue context. After the change, both paths taking
ptype_lock are always in process contexts, thus, the deadlock should
never occur.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:30 +0800
fa7f86f1b tipc: optimize the initialization of network device notifier ... Browse Code »

Ethernet media initialization is only done when TIPC is started or
switched to network mode. So the initialization of the network device
notifier structure can be moved out of this function and done
statically instead.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Ying Xue
2012-08-20 17:26:30 +0800

14 Jul, 2012

6 commits

869dd4662 tipc: remove print_buf and deprecated log buffer code ... Browse Code »

The internal log buffer handling functions can now safely be
removed since there is no code using it anymore. Requests to
interact with the internal tipc log buffer over netlink (in
config.c) will report 'obsolete command'.

This represents the final removal of any references to a
struct print_buf, and the removal of the struct itself.
We also get rid of a TIPC specific Kconfig in the process.

Finally, log.h is removed since it is not needed anymore.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-07-14 07:34:43 +0800
dc1aed37d tipc: phase out most of the struct print_buf usage ... Browse Code »

The tipc_printf is renamed to tipc_snprintf, as the new name
describes more what the function actually does. It is also
changed to take a buffer and length parameter and return
number of characters written to the buffer. All callers of
this function that used to pass a print_buf are updated.

Final removal of the struct print_buf itself will be done
synchronously with the pending removal of the deprecated
logging code that also was using it.

Functions that build up a response message with a list of
ports, nametable contents etc. are changed to return the number
of characters written to the output buffer. This information
was previously hidden in a field of the print_buf struct, and
the number of chars written was fetched with a call to
tipc_printbuf_validate. This function is removed since it
is no longer referenced nor needed.

A generic max size ULTRA_STRING_MAX_LEN is defined, named
in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
various definitions in port, link and nametable code that
largely duplicated this information are removed. This means
that amount of link statistics that can be returned is now
increased from 2k to 32k.

The buffer overflow check is now done just before the reply
message is passed over netlink or TIPC to a remote node and
the message indicating a truncated buffer is changed to a less
dramatic one (less CAPS), placed at the end of the message.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-07-14 07:33:28 +0800
e2dbd6013 tipc: simplify print buffer handling in tipc_printf ... Browse Code »

tipc_printf was previously used both to construct debug traces
and to append data to buffers that should be sent over netlink
to the tipc-config application. A global print_buffer was
used to format the string before it was copied to the actual
output buffer. This could lead to concurrent access of the
global print_buffer, which then had to be lock protected.
This is simplified by changing tipc_printf to append data
directly to the output buffer using vscnprintf.

With the new implementation of tipc_printf, there is no longer
any risk of concurrent access to the internal log buffer, so
the lock (and the comments describing it) are no longer
strictly necessary. However, there are still a few functions
that do grab this lock before resizing/dumping the log
buffer. We leave the lock, and these functions untouched since
they will be removed with a subsequent commit that drops the
deprecated log buffer handling code

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-07-14 07:28:28 +0800
5deedde9f tipc: simplify link_print by divorcing it from using tipc_printf ... Browse Code »

To pave the way for a pending cleanup of tipc_printf, and
removal of struct print_buf entirely, we make that task simpler
by converting link_print to issue its messages with standard
printk infrastructure. [Original idea separated from a larger
patch from Erik Hugne ]

Cc: Erik Hugne
Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-07-14 07:27:57 +0800
568fc588f tipc: remove TIPC packet debugging functions and macros ... Browse Code »

The link queue traces and packet level debug functions served
a purpose during early development, but are now redundant
since there are other, more capable tools available for
debugging at the packet level.

The TIPC_DEBUG Kconfig option is removed since it does not
provide any extra debugging features anymore.

This gets rid of a lot of tipc_printf usages, which will
make the pending cleanup work of that function easier.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-07-14 07:25:16 +0800
2cf8aa19f tipc: use standard printk shortcut macros (pr_err etc.) ... Browse Code »

All messages should go directly to the kernel log. The TIPC
specific error, warning, info and debug trace macro's are
removed and all references replaced with pr_err, pr_warn,
pr_info and pr_debug.

Commonly used sub-strings are explicitly declared as a const
char to reduce .text size.

Note that this means the debug messages (changed to pr_debug),
are now enabled through dynamic debugging, instead of a TIPC
specific Kconfig option (TIPC_DEBUG). The latter will be
phased out completely

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
[PG: use pr_fmt as suggested by Joe Perches ]
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-07-14 07:24:44 +0800

12 Jul, 2012

2 commits

f705ab956 tipc: limit error messages relating to memory leak to one line ... Browse Code »

With the default name table size of 1024, it is possible that
the sanity check in tipc_nametbl_stop could spam out 1024
essentially identical error messages if memory was corrupted
or similar. Limit it to issuing no more than a single message.

The actual chain number (i.e. 0 --> 1023) wouldn't provide any
useful insight if/when such an instance happened, so don't
bother printing out that value.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-07-12 22:13:12 +0800
7e9cab58e tipc: factor stats struct out of the larger link struct ... Browse Code »

This is done to improve readability, and so that we can give
the struct a name that will allow us to declare a local
pointer to it in code, instead of having to always redirect
through the link struct to get to it.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-07-12 21:53:57 +0800

11 Jul, 2012

1 commit

ae86b9e38 net: Fix non-kernel-doc comments with kernel-doc start marker ... Browse Code »

Signed-off-by: Ben Hutchings
Signed-off-by: David S. Miller

Ben Hutchings
2012-07-11 14:13:45 +0800