Doug / smarc-fsl-linux-kernel | Embedian Git Server

07 May, 2013

2 commits

6bf15191f tipc: potential divide by zero in tipc_link_recv_fragment() ... Browse Code »

The worry here is that fragm_sz could be zero since it comes from
skb->data.

Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller

Dan Carpenter
2013-05-07 04:16:52 +0800
cb4b102f0 tipc: add a bounds check in link_recv_changeover_msg() ... Browse Code »

The bearer_id here comes from skb->data and it can be a number from 0 to
7. The problem is that the ->links[] array has only 2 elements so I
have added a range check.

Signed-off-by: Dan Carpenter
Signed-off-by: David S. Miller

Dan Carpenter
2013-05-07 04:16:52 +0800

04 May, 2013

3 commits

488fc9af8 tipc: pskb_copy() buffers when sending on more than one bearer ... Browse Code »

When sending packets, TIPC bearers use skb_clone() before writing their
hardware header. This will however NOT copy the data buffer.
So when the same packet is sent over multiple bearers (to reach multiple
nodes), the same socket buffer data will be treated by multiple
tipc_media drivers which will write their own hardware header through
dev_hard_header().
Most of the time this is not a problem, because by the time the
packet is processed by the second media, it has already been sent over
the first one. However, when the first transmission is delayed (e.g.
because of insufficient bandwidth or through a shaper), the next bearer
will overwrite the hardware header, resulting in the packet being sent:
a) with the wrong source address, when bearers of the same type,
e.g. ethernet, are involved
b) with a completely corrupt header, or even dropped, when bearers of
different types are involved.

So when the same socket buffer is to be sent multiple times, send a
pskb_copy() instead (from the second instance on), and release it
afterwards (the bearer will skb_clone() it anyway).

Signed-off-by: Gerlando Falauto
Signed-off-by: David S. Miller

Gerlando Falauto
2013-05-04 04:08:58 +0800
77861d9c0 tipc: tipc_bcbearer_send(): simplify bearer selection ... Browse Code »

Signed-off-by: Gerlando Falauto
Signed-off-by: David S. Miller

Gerlando Falauto
2013-05-04 04:08:58 +0800
e61607109 tipc: cosmetic: clean up comments and break a long line ... Browse Code »

Signed-off-by: Gerlando Falauto
Signed-off-by: David S. Miller

Gerlando Falauto
2013-05-04 04:08:58 +0800

18 Apr, 2013

4 commits

a29a194a1 tipc: add InfiniBand media type ... Browse Code »

Add InfiniBand media type based on the ethernet media type.

The only real difference is that in case of InfiniBand, we need the entire
20 bytes of space reserved for media addresses, so the TIPC media type ID is
not explicitly stored in the packet payload.

Sample output of tipc-config:

# tipc-config -v -addr -netid -nt=all -p -m -b -n -ls

node address:
current network id: 4711
Type Lower Upper Port Identity Publication Scope
0 167776257 167776257 1855512578 cluster
167776260 167776260 1216454658 zone
1 1 1 1216479236 node
Ports:
1216479235: bound to {1,1}
1216454657: bound to {0,167776260}
Media:
eth
ib
Bearers:
ib:ib0
Nodes known:
: up
Link
Window:20 packets
RX packets:0 fragments:0/0 bundles:0/0
TX packets:0 fragments:0/0 bundles:0/0
RX naks:0 defs:0 dups:0
TX naks:0 acks:0 dups:0
Congestion bearer:0 link:0 Send queue max:0 avg:0

Link
ACTIVE MTU:2044 Priority:10 Tolerance:1500 ms Window:50 packets
RX packets:80 fragments:0/0 bundles:0/0
TX packets:40 fragments:0/0 bundles:0/0
TX profile sample:22 packets average:54 octets
0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
RX states:410 probes:213 naks:0 defs:0 dups:0
TX states:410 probes:197 naks:0 acks:0 dups:0
Congestion bearer:0 link:0 Send queue max:1 avg:0

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2013-04-18 02:18:33 +0800
76f5c6f35 tipc: set skb->protocol in eth_media packet transmission ... Browse Code »

The skb->protocol field is used by packet classifiers and for AF_PACKET
cooked format, TIPC needs to set it properly.

Fixes packet classification and ethertype of 0x0000 in cooked captures:

Out 20:c9:d0:43:12:d9 ethertype Unknown (0x0000), length 56:
0x0000: 5b50 0028 0000 30d4 0100 1000 0100 1001 [P.(..0.........
0x0010: 0000 03e8 0000 0001 20c9 d043 12d9 0000 ...........C....
0x0020: 0000 0000 0000 0000 ........

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2013-04-18 02:18:33 +0800
8aeb89f21 tipc: move bcast_addr from struct tipc_media to struct tipc_bearer ... Browse Code »

Some network protocols, like InfiniBand, don't have a fixed broadcast
address but one that depends on the configuration. Move the bcast_addr
to struct tipc_bearer and initialize it with the broadcast address of
the network device when the bearer is enabled.

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2013-04-18 02:18:33 +0800
ccc4ba2ea tipc: remove unused str2addr media callback ... Browse Code »

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2013-04-18 02:18:33 +0800

08 Apr, 2013

2 commits

d978a6361 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/nfc/microread/mei.c
net/netfilter/nfnetlink_queue_core.c

Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.

Signed-off-by: David S. Miller

David S. Miller
2013-04-08 06:37:01 +0800
60085c3d0 tipc: fix info leaks via msg_name in recv_msg/recv_stream ... Browse Code »

The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.

Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.

Cc: Jon Maloy
Cc: Allan Stephens
Signed-off-by: Mathias Krause
Signed-off-by: David S. Miller

Mathias Krause
2013-04-08 04:28:02 +0800

29 Mar, 2013

1 commit

573ce260b net-next: replace obsolete NLMSG_* with type safe nlmsg_* ... Browse Code »

Signed-off-by: Hong Zhiguo
Signed-off-by: David S. Miller

Hong zhi guo
2013-03-29 02:25:25 +0800

28 Feb, 2013

1 commit

b67bfe0d4 hlist: drop the node parameter from iterators ... Browse Code »

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sasha Levin
2013-02-28 11:10:24 +0800

22 Feb, 2013

1 commit

06991c28f Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1

There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:

- add a new function called devm_ioremap_resource() to properly be
able to check return values.

- remove CONFIG_EXPERIMENTAL

Other than those patches, there's not much here, some minor fixes and
updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...

Linus Torvalds
2013-02-22 04:05:51 +0800

19 Feb, 2013

1 commit

6338a53a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net into net ... Browse Code »

Pull in 'net' to take in the bug fixes that didn't make it into
3.8-final.

Also, deal with the semantic conflict of the change made to
net/ipv6/xfrm6_policy.c A missing rt6->n neighbour release
was added to 'net', but in 'net-next' we no longer cache the
neighbour entries in the ipv6 routes so that change is not
appropriate there.

Signed-off-by: David S. Miller

David S. Miller
2013-02-19 12:34:21 +0800

16 Feb, 2013

4 commits

97f8b87e9 tipc: remove redundant checking for the number of iovecs in a send request ... Browse Code »

As the number of iovecs in a send request is already limited within
UIO_MAXIOV(i.e. 1024) in __sys_sendmsg(), it's unnecessary to check it
again in TIPC stack.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2013-02-16 06:03:32 +0800
aba79f332 tipc: byte-based overload control on socket receive queue ... Browse Code »

Change overload control to be purely byte-based, using
sk->sk_rmem_alloc as byte counter, and compare it to a calculated
upper limit for the socket receive queue.

For all connection messages, irrespective of message importance,
the overload limit is set to a constant value (i.e, 67MB). This
limit should normally never be reached because of the lower
limit used by the flow control algorithm, and is there only
as a last resort in case a faulty peer doesn't respect the send
window limit.

For datagram messages, message importance is taken into account
when calculating the overload limit. The calculation is based
on sk->sk_rcvbuf, and is hence configurable via the socket option
SO_RCVBUF.

Cc: Neil Horman
Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2013-02-16 06:01:58 +0800
57467e562 tipc: eliminate duplicated discard_rx_queue routine ... Browse Code »

The tipc function discard_rx_queue() is just a duplicated
implementation of __skb_queue_purge(). Remove the former
and directly invoke __skb_queue_purge().

In doing so, the underscores convey to the code reader, more
information about the current locking state that is assumed.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2013-02-16 05:10:13 +0800
c5c73dca5 tipc: fix missing spinlock init in broadcast code ... Browse Code »

After commit 3c294cb3 "tipc: remove the bearer congestion mechanism",
we try to grab the broadcast bearer lock when sending multicast
messages over the broadcast link. This will cause an oops because
the lock is never initialized. This is an old bug, but the lock
was never actually used before commit 3c294cb3, so that why it was
not visible until now. The oops will look something like:

BUG: spinlock bad magic on CPU#2, daemon/147
lock: bcast_bearer+0x48/0xffffffffffffd19a [tipc],
.magic: 00000000, .owner: /-1, .owner_cpu: 0
Pid: 147, comm: daemon Not tainted 3.8.0-rc3+ #206
Call Trace:
spin_dump+0x8a/0x8f
spin_bug+0x21/0x26
do_raw_spin_lock+0x114/0x150
_raw_spin_lock_bh+0x19/0x20
tipc_bearer_blocked+0x1f/0x40 [tipc]
tipc_link_send_buf+0x82/0x280 [tipc]
? __alloc_skb+0x9f/0x2b0
tipc_bclink_send_msg+0x77/0xa0 [tipc]
tipc_multicast+0x11b/0x1b0 [tipc]
send_msg+0x225/0x530 [tipc]
sock_sendmsg+0xca/0xe0

The above can be triggered by running the multicast demo program.

Signed-off-by: Erik Hugne
Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Erik Hugne
2013-02-16 04:40:56 +0800

12 Jan, 2013

1 commit

f887cc48c net/tipc: remove depends on CONFIG_EXPERIMENTAL ... Browse Code »

The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Jon Maloy
CC: Allan Stephens
CC: "David S. Miller"
Signed-off-by: Kees Cook
Acked-by: David S. Miller

Kees Cook
2013-01-12 03:40:02 +0800

08 Dec, 2012

8 commits

0fef8f205 tipc: refactor accept() code for improved readability ... Browse Code »

In TIPC's accept() routine, there is a large block of code relating
to initialization of a new socket, all within an if condition checking
if the allocation succeeded.

Here, we simply flip the check of the if, so that the main execution
path stays at the same indentation level, which improves readability.
If the allocation fails, we jump to an already existing exit label.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-12-08 06:23:24 +0800
258f8667a tipc: add lock nesting notation to quiet lockdep warning ... Browse Code »

TIPC accept() call grabs the socket lock on a newly allocated
socket while holding the socket lock on an old socket. But lockdep
worries that this might be a recursive lock attempt:

[ INFO: possible recursive locking detected ]
---------------------------------------------
kworker/u:0/6 is trying to acquire lock:
(sk_lock-AF_TIPC){+.+.+.}, at: [] accept+0x15c/0x310 [tipc]

but task is already holding lock:
(sk_lock-AF_TIPC){+.+.+.}, at: [] accept+0x28/0x310 [tipc]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(sk_lock-AF_TIPC);
lock(sk_lock-AF_TIPC);

*** DEADLOCK ***

May be due to missing lock nesting notation
[...]

Tell lockdep that this locking is safe by using lock_sock_nested().
This is similar to what was done in commit 5131a184a3458d9 for
SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").

Also note that this is isn't something that is seen normally,
as it was uncovered with some experimental work-in-progress
code not yet ready for mainline. So no need for stable
backports or similar of this commit.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:23 +0800
cbab36879 tipc: eliminate connection setup for implied connect in recv_msg() ... Browse Code »

As connection setup is now completed asynchronously in BH context,
in the function filter_connect(), the corresponding code in recv_msg()
becomes redundant.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:22 +0800
584d24b39 tipc: introduce non-blocking socket connect ... Browse Code »

TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.

With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.

The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.

It is also now possible to call select() or poll() to wait for the
completion of a connection.

An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:21 +0800
7e6c131e1 tipc: consolidate connection-oriented message reception in one function ... Browse Code »

Handling of connection-related message reception is currently scattered
around at different places in the code. This makes it harder to verify
that things are handled correctly in all possible scenarios.
So we consolidate the existing processing of connection-oriented
message reception in a single routine. In the process, we convert the
chain of if/else into a switch/case for improved readability.

A cast on the socket_state in the switch is needed to avoid compile
warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
‘4294967295’ not in enumerated type". This happens because existing
tipc code pseudo extends the default linux socket state values with:

#define SS_LISTENING -1 /* socket is listening */
#define SS_READY -2 /* socket is connectionless */

It may make sense to add these as _positive_ values to the existing
socket state enum list someday, vs. these already existing defines.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
[PG: add cast to fix warning; remove returns from middle of switch]
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 06:23:20 +0800
bc879117d tipc: standardize across connect/disconnect function naming ... Browse Code »

Currently we have tipc_disconnect and tipc_disconnect_port. It is
not clear from the names alone, what they do or how they differ.
It turns out that tipc_disconnect just deals with the port locking
and then calls tipc_disconnect_port which does all the work.

If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
then we will be following typical linux convention, where:

__tipc_disconnect: "raw" function that does all the work.

tipc_disconnect: wrapper that deals with locking and then calls
the real core __tipc_disconnect function

With this, the difference is immediately evident, and locking
violations are more apt to be spotted by chance while working on,
or even just while reading the code.

On the connect side of things, we currently only have the single
"tipc_connect2port" function. It does both the locking at enter/exit,
and the core of the work. Pending changes will make it desireable to
have the connect be a two part locking wrapper + worker function,
just like the disconnect is already.

Here, we make the connect look just like the updated disconnect case,
for the above reason, and for consistency. In the process, we also
get rid of the "2port" suffix that was on the original name, since
it adds no descriptive value.

On close examination, one might notice that the above connect
changes implicitly move the call to tipc_link_get_max_pkt() to be
within the scope of tipc_port_lock() protected region; when it was
not previously. We don't see any issues with this, and it is in
keeping with __tipc_connect doing the work and tipc_connect just
handling the locking.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-12-08 06:23:19 +0800
e643df156 tipc: change sk_receive_queue upper limit ... Browse Code »

The sk_recv_queue upper limit for connectionless sockets has empirically
turned out to be too low. When we double the current limit we get much
fewer rejected messages and no noticable negative side-effects.

Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Jon Maloy
2012-12-08 06:23:18 +0800
9da3d4758 tipc: eliminate aggregate sk_receive_queue limit ... Browse Code »

As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.

This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.

Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.

As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted. There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Cc: Neil Horman
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker

Ying Xue
2012-12-08 03:19:52 +0800

07 Dec, 2012

1 commit

c00841385 tipc: remove obsolete flush of stale reassembly buffer ... Browse Code »

Each link instance has a periodic job checking if there is a stale
ongoing message reassembly associated to the link. If no new
fragment has been received during the last 4*[link_tolerance] period,
it is assumed the missing fragment will never arrive. As a consequence,
the reassembly buffer is discarded, and a gap in the message sequence
occurs.

This assumption is wrong. After we abandoned our ambition to develop
packet routing for multi-cluster networks, only single-hop packet
transfer remains as an option. For those, all packets are guaranteed
to be delivered in sequence to the defragmentation layer. Any failure
to achieve sequenced delivery will eventually lead to link reset, and
the reassembly buffer will be flushed anyway.

So we just remove this periodic check, which is now obsolete.

Signed-off-by: Erik Hugne
Acked-by: Ying Xue
Signed-off-by: Jon Maloy
[PG: also delete get/inc_timer count, since they are now unused]
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-12-07 06:20:19 +0800

23 Nov, 2012

3 commits

94fc9c471 tipc: delete TIPC_ADVANCED Kconfig variable ... Browse Code »

There used to be a time when TIPC had lots of Kconfig knobs the
end user could alter, but they have all been made automatic or
obsolete, with the exception of CONFIG_TIPC_PORTS. This
previously existing set of options was all hidden under the
TIPC_ADVANCED setting, which does not exist in any code, but
only in Kconfig scope.

Having this now, just to hide the one remaining "advanced"
option no longer makes sense. Remove it. Also get rid of the
ifdeffery in the TIPC code that allowed for TIPC_PORTS to be
possibly undefined.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2012-11-23 03:33:29 +0800
4cb7d55ab tipc: eliminate an unnecessary cast of node variable ... Browse Code »

As the variable:node is currently defined to u32 type, it is
unnecessary to cast its type to u32 again when using it.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-23 03:33:28 +0800
c64f7a6a1 tipc: introduce message to synchronize broadcast link ... Browse Code »

Upon establishing a first link between two nodes, there is
currently a risk that the two endpoints will disagree on exactly
which sequence number reception and acknowleding of broadcast
packets should start.

The following scenarios may happen:

1: Node A sends an ACTIVATE message to B, telling it to start acking
packets from sequence number N.
2: Node A sends out broadcast N, but does not expect an acknowledge
from B, since B is not yet in its broadcast receiver's list.
3: Node A receives ACK for N from all nodes except B, and releases
packet N.
4: Node B receives the ACTIVATE, activates its link endpoint, and
stores the value N as sequence number of first expected packet.
5: Node B sends a NAME_DISTR message to A.
6: Node A receives the NAME_DISTR message, and activates its endpoint.
At this moment B is added to A's broadcast receiver's set.
Node A also sets sequence number 0 as the first broadcast packet
to be received from B.
7: Node A sends broadcast N+1.
8: B receives N+1, determines there is a gap in the sequence, since
it is expecting N, and sends a NACK for N back to A.
9: Node A has already released N, so no retransmission is possible.
The broadcast link in direction A->B is stale.

In addition to, or instead of, 7-9 above, the following may happen:

10: Node B sends broadcast M > 0 to A.
11: Node A receives M, falsely decides there must be a gap, since
it is expecting packet 0, and asks for retransmission of packets
[0,M-1].
12: Node B has already released these packets, so the broadcast
link is stale in direction B->A.

We solve this problem by introducing a new unicast message type,
BCAST_PROTOCOL/STATE, to convey the sequence number of the next
sent broadcast packet to the other endpoint, at exactly the moment
that endpoint is added to the own node's broadcast receivers list,
and before any other unicast messages are permitted to be sent.

Furthermore, we don't allow any node to start receiving and
processing broadcast packets until this new synchronization
message has been received.

To maintain backwards compatibility, we still open up for
broadcast reception if we receive a NAME_DISTR message without
any preceding broadcast sync message. In this case, we must
assume that the other end has an older code version, and will
never send out the new synchronization message. Hence, for mixed
old and new nodes, the issue arising in 7-12 of the above may
happen with the same probability as before.

Signed-off-by: Jon Maloy
Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Jon Maloy
2012-11-23 03:33:21 +0800

22 Nov, 2012

6 commits

389dd9bcf tipc: rename supported flag to recv_permitted ... Browse Code »

Rename the "supported" flag in bclink structure to "recv_permitted"
to better reflect what it is used for. When this flag is set for a
given node, we are permitted to receive and acknowledge broadcast
messages from that node. Convert it to a bool at the same time,
since it is not used to store any numerical values.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 20:50:51 +0800
818f4da52 tipc: remove supportable flag from bclink structure ... Browse Code »

The "supportable" flag in bclink structure is a compatibility flag
indicating whether a peer node is capable of receiving TIPC broadcast
messages. However, all TIPC versions since tipc-1.5, and after the
inclusion in the upstream Linux kernel in 2006, support this capability.
It is highly unlikely that anybody is still using such an old
version of TIPC, let alone that they want to mix it with TIPC-2.0
nodes. Therefore, we now remove the "supportable" flag.

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 20:50:50 +0800
3c294cb37 tipc: remove the bearer congestion mechanism ... Browse Code »

Currently at the TIPC bearer layer there is the following congestion
mechanism:

Once sending packets has failed via that bearer, the bearer will be
flagged as being in congested state at once. During bearer congestion,
all packets arriving at link will be queued on the link's outgoing
buffer. When we detect that the state of bearer congestion has
relaxed (e.g. some packets are received from the bearer) we will try
our best to push all packets in the link's outgoing buffer until the
buffer is empty, or until the bearer is congested again.

However, in fact the TIPC bearer never receives any feedback from the
device layer whether a send was successful or not, so it must always
assume it was successful. Therefore, the bearer congestion mechanism
as it exists currently is of no value.

But the bearer blocking state is still useful for us. For example,
when the physical media goes down/up, we need to change the state of
the links bound to the bearer. So the code maintaing the state
information is not removed.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 09:07:25 +0800
750311510 tipc: wake up all waiting threads at socket shutdown ... Browse Code »

When a socket is shut down, we should wake up all thread sleeping on
it, instead of just one of them. Otherwise, when several threads are
polling the same socket, and one of them does shutdown(), the
remaining threads may end up sleeping forever.

Also, to align socket usage with common practice in other stacks, we
use one of the common socket callback handlers, sk_state_change(),
to wake up pending users. This is similar to the usage in e.g.
inet_shutdown(). [net/ipv4/af_inet.c].

Signed-off-by: Ying Xue
Signed-off-by: Jon Maloy
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 09:06:29 +0800
c4fc298ab tipc: return POLLOUT for sockets in an unconnected state ... Browse Code »

If an implied connect is attempted on a nonblocking STREAM/SEQPACKET
socket during link congestion, the connect message will be discarded
and sendmsg will return EAGAIN. This is normal behavior, and the
application is expected to poll the socket until POLLOUT is set,
after which the connection attempt can be retried.
However, the POLLOUT flag is never set for unconnected sockets and
poll() always returns a zero mask. The application is then left without
a trigger for when it can make another attempt at sending the message.

The solution is to check if we're polling on an unconnected socket
and set the POLLOUT flag if the TIPC port owned by this socket
is not congested. The TIPC ports waiting on a specific link will be
marked as 'not congested' when the link congestion have abated.

Signed-off-by: Erik Hugne
Signed-off-by: Paul Gortmaker

Erik Hugne
2012-11-22 03:54:32 +0800
f288bef46 tipc: fix race/inefficiencies in poll/wait behaviour ... Browse Code »

When an application blocks at poll/select on a TIPC socket
while requesting a specific event mask, both the filter_rcv() and
wakeupdispatch() case will wake it up unconditionally whenever
the state changes (i.e an incoming message arrives, or congestion
has subsided). No mask is used.

To avoid this, we populate sk->sk_data_ready and sk->sk_write_space
with tipc_data_ready and tipc_write_space respectively, which makes
tipc more in alignment with the rest of the networking code. These
pass the exact set of possible events to the waker in fs/select.c
hence avoiding waking up blocked processes unnecessarily.

In doing so, we uncover another issue -- that there needs to be a
memory barrier in these poll/receive callbacks, otherwise we are
subject to the the same race as documented above wq_has_sleeper()
[in commit a57de0b4 "net: adding memory barrier to the poll and
receive callbacks"]. So we need to replace poll_wait() with
sock_poll_wait() and use rcu protection for the sk->sk_wq pointer
in these two new functions.

Signed-off-by: Ying Xue
Signed-off-by: Paul Gortmaker

Ying Xue
2012-11-22 03:54:31 +0800

04 Nov, 2012

1 commit

d145f7ec2 tipc: do not use tasklet_disable before tasklet_kill ... Browse Code »

If tasklet_disable() is called before related tasklet handled,
tasklet_kill will never be finished. tasklet_kill is enough.

Signed-off-by: Xiaotian Feng
Cc: Jon Maloy
Cc: Allan Stephens
Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net
Signed-off-by: David S. Miller

Xiaotian Feng
2012-11-04 03:10:14 +0800

05 Oct, 2012

1 commit

e57edf6b6 tipc: prevent dropped connections due to rcvbuf overflow ... Browse Code »

When large buffers are sent over connected TIPC sockets, it
is likely that the sk_backlog will be filled up on the
receiver side, but the TIPC flow control mechanism is happily
unaware of this since that is based on message count.

The sender will receive a TIPC_ERR_OVERLOAD message when this occurs
and drop it's side of the connection, leaving it stale on
the receiver end.

By increasing the sk_rcvbuf to a 'worst case' value, we avoid the
overload caused by a full backlog queue and the flow control
will work properly.

This worst case value is the max TIPC message size times
the flow control window, multiplied by two because a sender
will transmit up to double the window size before a port is marked
congested.
We multiply this by 2 to account for the sk_buff and other overheads.

Signed-off-by: Erik Hugne
Signed-off-by: David S. Miller

Erik Hugne
2012-10-05 03:53:48 +0800