Doug / smarc-fsl-linux-kernel | Embedian Git Server

30 Aug, 2013

2 commits

f55d112e5 net: packet: use reciprocal_divide in fanout_demux_hash ... Browse Code »

Instead of hard-coding reciprocal_divide function, use the inline
function from reciprocal_div.h.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-08-30 04:43:29 +0800
5df0ddfbc net: packet: add randomized fanout scheduler ... Browse Code »

We currently allow for different fanout scheduling policies in pf_packet
such as scheduling by skb's rxhash, round-robin, by cpu, and rollover.
Also allow for a random, equidistributed selection of the socket from the
fanout process group.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-08-30 04:43:29 +0800

27 Aug, 2013

1 commit

b05930f5d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/wireless/iwlwifi/pcie/trans.c
include/linux/inetdevice.h

The inetdevice.h conflict involves moving the IPV4_DEVCONF values
into a UAPI header, overlapping additions of some new entries.

The iwlwifi conflict is a context overlap.

Signed-off-by: David S. Miller

David S. Miller
2013-08-27 04:37:08 +0800

21 Aug, 2013

1 commit

8bcdeaff5 packet: restore packet statistics tp_packets to include drops ... Browse Code »

getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit
ee80fbf301 ("packet: account statistics only in tpacket_stats_u")
cleaned up the getsockopt PACKET_STATISTICS code.
This also changed semantics. Historically, tp_packets included
tp_drops on return. The commit removed the line that adds tp_drops
into tp_packets.

This patch reinstates the old semantics.

Signed-off-by: Willem de Bruijn
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Willem de Bruijn
2013-08-21 08:23:58 +0800

10 Aug, 2013

1 commit

28d642710 net: attempt high order allocations in sock_alloc_send_pskb() ... Browse Code »

Adding paged frags skbs to af_unix sockets introduced a performance
regression on large sends because of additional page allocations, even
if each skb could carry at least 100% more payload than before.

We can instruct sock_alloc_send_pskb() to attempt high order
allocations.

Most of the time, it does a single page allocation instead of 8.

I added an additional parameter to sock_alloc_send_pskb() to
let other users to opt-in for this new feature on followup patches.

Tested:

Before patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec

2304 212992 212992 10.00 46861.15

After patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec

2304 212992 212992 10.00 57981.11

Signed-off-by: Eric Dumazet
Cc: David Rientjes
Signed-off-by: David S. Miller

Eric Dumazet
2013-08-10 16:16:44 +0800

08 Aug, 2013

1 commit

09effa67a packet: Revert recent header parsing changes. ... Browse Code »

This reverts commits:

0f75b09c798ed00c30d7d5551b896be883bc2aeb
cbd89acb9eb257ed3b2be867142583fdcf7fdc5b
c483e02614551e44ced3fe6eedda8e36d3277ccc

Amongst other things, it's modifies the SKB header
to pull the ethernet headers off via eth_type_trans()
on the output path which is bogus.

It's causing serious regressions for people.

Signed-off-by: David S. Miller

David S. Miller
2013-08-08 08:11:00 +0800

03 Aug, 2013

3 commits

c483e0261 af_packet: simplify VLAN frame check in packet_snd ... Browse Code »

For ethernet frames, eth_type_trans() already parses the header, so one
can skip this when checking the frame size.

Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller

Phil Sutter
2013-08-03 05:58:32 +0800
cbd89acb9 af_packet: fix for sending VLAN frames via packet_mmap ... Browse Code »

Since tpacket_fill_skb() parses the protocol field in ethernet frames'
headers, it's easy to see if any passed frame is a VLAN one and account
for the extended size.

But as the real protocol does not turn up before tpacket_fill_skb()
runs which in turn also checks the frame length, move the max frame
length calculation into the function.

Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller

Phil Sutter
2013-08-03 05:58:32 +0800
0f75b09c7 af_packet: when sending ethernet frames, parse header for skb->protocol ... Browse Code »

This may be necessary when the SKB is passed to other layers on the go,
which check the protocol field on their own. An example is a VLAN packet
sent out using AF_PACKET on a bridge interface. The bridging code checks
the SKB size, accounting for any VLAN header only if the protocol field
is set accordingly.

Note that eth_type_trans() sets skb->dev to the passed argument, so this
can be skipped in packet_snd() for ethernet frames, as well.

Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller

Phil Sutter
2013-08-03 05:58:32 +0800

23 Jul, 2013

1 commit

cb820f8e4 net: Provide a generic socket error queue delivery method for Tx time stamps. ... Browse Code »

This patch moves the private error queue delivery function from the
af_packet code to the core socket method. In this way, network layers
only needing the error queue for transmit time stamping can share common
code.

Signed-off-by: Richard Cochran
Signed-off-by: David S. Miller

Richard Cochran
2013-07-23 05:58:19 +0800

20 Jun, 2013

1 commit

d98cae64e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/wireless/ath/ath9k/Kconfig
drivers/net/xen-netback/netback.c
net/batman-adv/bat_iv_ogm.c
net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved. In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported. Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.

However, the dump handlers to not use this logic. Instead they have
to explicitly do the locking. There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so. So I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.

Signed-off-by: David S. Miller

David S. Miller
2013-06-20 07:49:39 +0800

13 Jun, 2013

1 commit

2dc85bf32 packet: packet_getname_spkt: make sure string is always 0-terminated ... Browse Code »

uaddr->sa_data is exactly of size 14, which is hard-coded here and
passed as a size argument to strncpy(). A device name can be of size
IFNAMSIZ (== 16), meaning we might leave the destination string
unterminated. Thus, use strlcpy() and also sizeof() while we're
at it. We need to memset the data area beforehand, since strlcpy
does not padd the remaining buffer with zeroes for user space, so
that we do not possibly leak anything.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-06-13 16:38:36 +0800

29 May, 2013

1 commit

351638e7d net: pass info struct via netdevice notifier ... Browse Code »

So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.

Signed-off-by: Jiri Pirko

v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller

Jiri Pirko
2013-05-29 04:11:01 +0800

04 May, 2013

1 commit

8da3056c0 packet: tpacket_v3: do not trigger bug() on wrong header status ... Browse Code »

Jakub reported that it is fairly easy to trigger the BUG() macro
from user space with TPACKET_V3's RX_RING by just giving a wrong
header status flag. We already had a similar situation in commit
7f5c3e3a80e6654 (``af_packet: remove BUG statement in
tpacket_destruct_skb'') where this was the case in the TX_RING
side that could be triggered from user space. So really, don't use
BUG() or BUG_ON() unless there's really no way out, and i.e.
don't use it for consistency checking when there's user space
involved, no excuses, especially not if you're slapping the user
with WARN + dump_stack + BUG all at once. The two functions are
of concern:

prb_retire_current_block() [when block status != TP_STATUS_KERNEL]
prb_open_block() [when block_status != TP_STATUS_KERNEL]

Calls to prb_open_block() are guarded by ealier checks if block_status
is really TP_STATUS_KERNEL (racy!), but the first one BUG() is easily
triggable from user space. System behaves still stable after they are
removed. Also remove that yoda condition entirely, since it's already
guarded.

Reported-by: Jakub Zawadzki
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-05-04 04:10:33 +0800

30 Apr, 2013

3 commits

e8d9612c1 sock_diag: allow to dump bpf filters ... Browse Code »

This patch allows to dump BPF filters attached to a socket with
SO_ATTACH_FILTER.
Note that we check CAP_SYS_ADMIN before allowing to dump this info.

For now, only AF_PACKET sockets use this feature.

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2013-04-30 01:21:30 +0800
76d0eeb1a packet_diag: disclose meminfo values ... Browse Code »

sk_rmem_alloc is disclosed via /proc/net/packet but not via netlink messages.
The goal is to have the same level of information.

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2013-04-30 01:21:30 +0800
626419038 packet_diag: disclose uid value ... Browse Code »

This value is disclosed via /proc/net/packet but not via netlink messages.
The goal is to have the same level of information.

Signed-off-by: Nicolas Dichtel
Signed-off-by: David S. Miller

Nicolas Dichtel
2013-04-30 01:21:30 +0800

25 Apr, 2013

5 commits

ee80fbf30 packet: account statistics only in tpacket_stats_u ... Browse Code »

Currently, packet_sock has a struct tpacket_stats stats member for
TPACKET_V1 and TPACKET_V2 statistic accounting, and with TPACKET_V3
``union tpacket_stats_u stats_u'' was introduced, where however only
statistics for TPACKET_V3 are held, and when copied to user space,
TPACKET_V3 does some hackery and access also tpacket_stats' stats,
although everything could have been done within the union itself.

Unify accounting within the tpacket_stats_u union so that we can
remove 8 bytes from packet_sock that are there unnecessary. Note that
even if we switch to TPACKET_V3 and would use non mmap(2)ed option,
this still works due to the union with same types + offsets, that are
exposed to the user space.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-25 13:29:43 +0800
0578edc56 packet: reorder a member in packet_ring_buffer ... Browse Code »

There's a 4 byte hole in packet_ring_buffer structure before
prb_bdqc, that can be filled with 'pending' member, thus we can
reduce the overall structure size from 224 bytes to 216 bytes.
This also has the side-effect, that in struct packet_sock 2*4 byte
holes after the embedded packet_ring_buffer members are removed,
and overall, packet_sock can be reduced by 1 cacheline:

Before: size: 1344, cachelines: 21, members: 24
After: size: 1280, cachelines: 20, members: 24

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-25 13:29:43 +0800
b9c32fb27 packet: if hw/sw ts enabled in rx/tx ring, report which ts we got ... Browse Code »

Currently, there is no way to find out which timestamp is reported in
tpacket{,2,3}_hdr's tp_sec, tp_{n,u}sec members. It can be one of
SOF_TIMESTAMPING_SYS_HARDWARE, SOF_TIMESTAMPING_RAW_HARDWARE,
SOF_TIMESTAMPING_SOFTWARE, or a fallback variant late call from the
PF_PACKET code in software.

Therefore, report in the tp_status member of the ring buffer which
timestamp has been reported for RX and TX path. This should not break
anything for the following reasons: i) in RX ring path, the user needs
to test for tp_status & TP_STATUS_USER, and later for other flags as
well such as TP_STATUS_VLAN_VALID et al, so adding other flags will
do no harm; ii) in TX ring path, time stamps with PACKET_TIMESTAMP
socketoption are not available resp. had no effect except that the
application setting this is buggy. Next to TP_STATUS_AVAILABLE, the
user also should check for other flags such as TP_STATUS_WRONG_FORMAT
to reclaim frames to the application. Thus, in case TX ts are turned
off (default case), nothing happens to the application logic, and in
case we want to use this new feature, we now can also check which of
the ts source is reported in the status field as provided in the docs.

Reported-by: Richard Cochran
Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-25 13:22:22 +0800
7a51384cc packet: enable hardware tx timestamping on tpacket ring ... Browse Code »

Currently, we only have software timestamping for the TX ring buffer
path, but this limitation stems rather from the implementation. By
just reusing tpacket_get_timestamp(), we can also allow hardware
timestamping just as in the RX path.

Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-25 13:22:22 +0800
2e31396fa packet: tx timestamping on tpacket ring ... Browse Code »

When transmit timestamping is enabled at the socket level, record a
timestamp on packets written to a PACKET_TX_RING. Tx timestamps are
always looped to the application over the socket error queue. Software
timestamps are also written back into the packet frame header in the
packet ring.

Reported-by: Paul Chavent
Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2013-04-25 13:22:22 +0800

20 Apr, 2013

1 commit

4b457bdf1 packet: move hw/sw timestamp extraction into a small helper ... Browse Code »

This patch introduces a small, internal helper function, that is used by
PF_PACKET. Based on the flags that are passed, it extracts the packet
timestamp in the receive path. This is merely a refactoring to remove
some duplicate code in tpacket_rcv(), to make it more readable, and to
enable others to use this function in PF_PACKET as well, e.g. for TX.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-20 04:39:13 +0800

17 Apr, 2013

1 commit

184f489e9 packet: minor: add generic tpacket_uhdr to access packet headers ... Browse Code »

There is no need to add a dozen unions each time at the start
of the function. So, do this once and use it instead. Thus, we
can remove some duplicate code and make it more readable.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-17 04:43:34 +0800

15 Apr, 2013

1 commit

bf84a0106 net: sock: make sock_tx_timestamp void ... Browse Code »

Currently, sock_tx_timestamp() always returns 0. The comment that
describes the sock_tx_timestamp() function wrongly says that it
returns an error when an invalid argument is passed (from commit
20d4947353be, ``net: socket infrastructure for SO_TIMESTAMPING'').
Make the function void, so that we can also remove all the unneeded
if conditions that check for such a _non-existant_ error case in the
output path.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-15 03:41:49 +0800

28 Mar, 2013

1 commit

40893fd0f net: switch to use skb_probe_transport_header() ... Browse Code »

Switch to use the new help skb_probe_transport_header() to do the l4 header
probing for untrusted sources. For packets with partial csum, the header should
already been set by skb_partial_csum_set().

Cc: Eric Dumazet
Signed-off-by: Jason Wang
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Jason Wang
2013-03-28 00:48:31 +0800

27 Mar, 2013

1 commit

c1aad275b packet: set transport header before doing xmit ... Browse Code »

Set the transport header for 1) some drivers (e.g ixgbe needs l4 header to do
atr) 2) precise packet length estimation (introduced in 1def9238) needs l4
header to compute header length.

So this patch first tries to get l4 header for packet socket through
skb_flow_dissect(), and pretend no l4 header if skb_flow_dissect() fails.

Cc: Eric Dumazet
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller

Jason Wang
2013-03-27 00:44:43 +0800

20 Mar, 2013

1 commit

77f65ebdc packet: packet fanout rollover during socket overload ... Browse Code »

Changes:
v3->v2: rebase (no other changes)
passes selftest
v2->v1: read f->num_members only once
fix bug: test rollover mode + flag

Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.

The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.

Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.

To avoid contention, scales counters with number of sockets and
accesses them lockfree. Values are bounds checked to ensure
correctness.

Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limits each
thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
packets, a FANOUT_CPU setup processes 32 Kpps in total without this
patch, 270 Kpps with the patch. Tested with read() and with a packet
ring (V1).

Also, passes psock_fanout.c unit test added to selftests.

Signed-off-by: Willem de Bruijn
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller

Willem de Bruijn
2013-03-20 05:15:04 +0800

28 Feb, 2013

1 commit

b67bfe0d4 hlist: drop the node parameter from iterators ... Browse Code »

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sasha Levin
2013-02-28 11:10:24 +0800

19 Feb, 2013

2 commits

ece31ffd5 net: proc: change proc_net_remove to remove_proc_entry ... Browse Code »

proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.

this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2013-02-19 03:53:08 +0800
d4beaa66a net: proc: change proc_net_fops_create to proc_create ... Browse Code »

Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2013-02-19 03:53:08 +0800

04 Feb, 2013

1 commit

9665d5d62 packet: fix leakage of tx_ring memory ... Browse Code »

When releasing a packet socket, the routine packet_set_ring() is reused
to free rings instead of allocating them. But when calling it for the
first time, it fills req->tp_block_nr with the value of rb->pg_vec_len
which in the second invocation makes it bail out since req->tp_block_nr
is greater zero but req->tp_block_size is zero.

This patch solves the problem by passing a zeroed auto-variable to
packet_set_ring() upon each invocation from packet_release().

As far as I can tell, this issue exists even since 69e3c75 (net: TX_RING
and packet mmap), i.e. the original inclusion of TX ring support into
af_packet, but applies only to sockets with both RX and TX ring
allocated, which is probably why this was unnoticed all the time.

Signed-off-by: Phil Sutter
Cc: Johann Baudy
Cc: Daniel Borkmann
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Phil Sutter
2013-02-04 05:15:23 +0800

19 Nov, 2012

1 commit

df008c91f net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm ... Browse Code »

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Allow creation of af_key sockets.
Allow creation of llc sockets.
Allow creation of af_packet sockets.

Allow sending xfrm netlink control messages.

Allow binding to netlink multicast groups.
Allow sending to netlink multicast groups.
Allow adding and dropping netlink multicast groups.
Allow sending to all netlink multicast groups and port ids.

Allow reading the netfilter SO_IP_SET socket option.
Allow sending netfilter netlink messages.
Allow setting and getting ip_vs netfilter socket options.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2012-11-19 09:32:45 +0800

08 Nov, 2012

1 commit

5920cd3a4 packet: tx_ring: allow the user to choose tx data offset ... Browse Code »

The tx data offset of packet mmap tx ring used to be :
(TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))

The problem is that, with SOCK_RAW socket, the payload (14 bytes after
the beginning of the user data) is misaligned.

This patch allows to let the user gives an offset for it's tx data if
he desires.

Set sock option PACKET_TX_HAS_OFF to 1, then specify in each frame of
your tx ring tp_net for SOCK_DGRAM, or tp_mac for SOCK_RAW.

Signed-off-by: Paul Chavent
Signed-off-by: David S. Miller

Paul Chavent
2012-11-08 07:54:30 +0800

26 Oct, 2012

1 commit

342567ccf packet: minor: remove unused err assignment ... Browse Code »

This tiny patch removes two unused err assignments. In those two cases the
err variable is either overwritten with another value at a later point in
time without having read the previous assigment, or it is assigned and the
function returns without using/reading err after the assignment.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2012-10-26 14:17:20 +0800

11 Sep, 2012

1 commit

15e473046 netlink: Rename pid to portid to avoid confusion ... Browse Code »

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman"
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller

Eric W. Biederman
2012-09-11 03:30:41 +0800

01 Sep, 2012

1 commit

c32f38619 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.

Signed-off-by: David S. Miller

David S. Miller
2012-09-01 03:14:18 +0800

25 Aug, 2012

1 commit

e6acb3848 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller

David S. Miller
2012-08-25 06:54:37 +0800

24 Aug, 2012

1 commit

a0dfb2634 af_packet: match_fanout_group() can be static ... Browse Code »

cc: Eric Leblond
Signed-off-by: Fengguang Wu
Signed-off-by: David S. Miller

Fengguang Wu
2012-08-24 00:27:12 +0800

23 Aug, 2012

1 commit

0fa7fa98d packet: Protect packet sk list with mutex (v2) ... Browse Code »

Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [] sock_diag_rcv+0x1f/0x3e
[152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [] netlink_dump+0x23/0x1ab
[152363.820693] #3: (rcu_read_lock){.+.+..}, at: [] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Pavel Emelyanov
2012-08-23 13:58:27 +0800