Eric Lee / smarc-fsl-linux-kernel

31 Dec, 2011

1 commit

7f8e3234c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

David S. Miller
2011-12-31 02:04:14 +0800

28 Dec, 2011

1 commit

aef950b4b packet: fix possible dev refcnt leak when bind fail

If bind fails when it is called after setting the PACKET_FANOUT
sock option, the dev refcnt will leak.

Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller

Wei Yongjun
2011-12-28 11:32:41 +0800

24 Dec, 2011

1 commit

abb434cb0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller

David S. Miller
2011-12-24 06:13:56 +0800

23 Dec, 2011

1 commit

0fd7bac6b net: relax rcvbuf limits

skb->truesize might be big even for a small packet.

It's even bigger after commit 87fb4b7b533 (net: more accurate skb
truesize) and big MTU.

We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.

Reported-by: Michal Simek
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-23 15:15:14 +0800
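
The effect described in 0fd7bac6b can be illustrated with a small user-space model. This is only a sketch: `struct rx_model`, `strict_admit()`, `relaxed_admit()` and `charge()` are hypothetical names, and the two checks are an assumed reading of "allow queueing at least one packet per receiver", not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket's receive-side accounting. */
struct rx_model {
    unsigned int rmem_alloc; /* bytes currently charged to the receive queue */
    unsigned int rcvbuf;     /* configured receive buffer limit */
};

/* Strict check: the incoming packet's truesize counts against the limit,
 * so a single large-truesize packet can be refused on an empty queue. */
static bool strict_admit(const struct rx_model *s, unsigned int truesize)
{
    return s->rmem_alloc + truesize < s->rcvbuf;
}

/* Relaxed check: admit whenever the queue is still below the limit,
 * which guarantees that at least one packet can always be queued. */
static bool relaxed_admit(const struct rx_model *s, unsigned int truesize)
{
    (void)truesize; /* the size of the new packet is deliberately ignored */
    return s->rmem_alloc < s->rcvbuf;
}

/* Account for a packet that was admitted. */
static void charge(struct rx_model *s, unsigned int truesize)
{
    assert(truesize > 0);
    s->rmem_alloc += truesize;
}
```

With a 2048-byte limit and a 4096-byte truesize, the strict check refuses the first packet while the relaxed check accepts it and refuses only the second.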

19 Nov, 2011

2 commits

4ce409125 packet: Add needed_tailroom to packet_sendmsg_spkt

While auditing LL_ALLOCATED_SPACE I noticed that packet_sendmsg_spkt
did not include needed_tailroom when allocating an skb. This isn't
a fatal error as we should always tolerate inadequate tail room but
it isn't optimal.

This patch fixes that.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2011-11-19 03:37:10 +0800

ae641949d net: Remove all uses of LL_ALLOCATED_SPACE

The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.

This patch replaces all uses of LL_ALLOCATED_SPACE with the macro
LL_RESERVED_SPACE and direct reference to needed_tailroom.

This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.

Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Herbert Xu
2011-11-19 03:37:09 +0800
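
The tail-room shortfall described in ae641949d can be worked through numerically. The sketch below assumes an alignment unit of 16 and an "align then add one unit" rounding, which is how I recall the macros of that era; `struct dev_model`, `ll_align()` and the two `*_tailroom_left()` helpers are hypothetical names used only for illustration.

```c
#include <assert.h>

#define HH_DATA_MOD 16u /* assumed alignment unit */

/* Hypothetical stand-in for the three net_device fields involved. */
struct dev_model {
    unsigned int hard_header_len;
    unsigned int needed_headroom;
    unsigned int needed_tailroom;
};

/* Round down to the alignment unit, then add one full unit. */
static unsigned int ll_align(unsigned int len)
{
    return (len & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD;
}

/* Old scheme: align the sum of head and tail room, then reserve the
 * aligned head room. Whatever is left over is the usable tail room. */
static unsigned int old_tailroom_left(const struct dev_model *d)
{
    unsigned int head = d->hard_header_len + d->needed_headroom;
    unsigned int allocated = ll_align(head + d->needed_tailroom);
    unsigned int reserved = ll_align(head);

    assert(allocated >= reserved);
    return allocated - reserved;
}

/* New scheme: align only the head room and add the tail room directly. */
static unsigned int new_tailroom_left(const struct dev_model *d)
{
    unsigned int head = d->hard_header_len + d->needed_headroom;
    unsigned int allocated = ll_align(head) + d->needed_tailroom;

    return allocated - ll_align(head);
}
```

For a 14-byte header, no extra head room and 1 byte of tail room, the old scheme leaves 0 bytes of tail room while the new one leaves the requested 1 byte.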

04 Nov, 2011

1 commit

eea49cc90 af_packet: de-inline some helper functions

This popped some compiler errors due to mismatched prototypes. Just
remove most manual inlines, the compiler should be able to figure out
what makes sense to inline and not.

net/packet/af_packet.c:252: warning: 'prb_curr_blk_in_use' declared inline after being called
net/packet/af_packet.c:252: warning: previous declaration of 'prb_curr_blk_in_use' was here
net/packet/af_packet.c:258: warning: 'prb_queue_frozen' declared inline after being called
net/packet/af_packet.c:258: warning: previous declaration of 'prb_queue_frozen' was here
net/packet/af_packet.c:248: warning: 'packet_previous_frame' declared inline after being called
net/packet/af_packet.c:248: warning: previous declaration of 'packet_previous_frame' was here
net/packet/af_packet.c:251: warning: 'packet_increment_head' declared inline after being called
net/packet/af_packet.c:251: warning: previous declaration of 'packet_increment_head' was here

Signed-off-by: Olof Johansson
Cc: Chetan Loke
Signed-off-by: David S. Miller

Olof Johansson
2011-11-04 06:11:51 +0800

19 Oct, 2011

1 commit

bc416d976 macvlan: handle fragmented multicast frames

Fragmented multicast frames are delivered to a single macvlan port,
because ip defrag logic considers that other samples are redundant.

Implement a defrag step before trying to send the multicast frame.

Reported-by: Ben Greear
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-10-19 11:22:07 +0800

11 Oct, 2011

1 commit

95f5f803b af_packet: remove unnecessary BUG_ON() in tpacket_destruct_skb

If skb is NULL, then a stack trace is thrown anyway on dereference.
Therefore, the stack trace triggered by BUG_ON is a duplicate.

Signed-off-by: Daniel Borkmann
Cc: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

danborkmann@iogearbox.net
2011-10-11 02:09:08 +0800

08 Oct, 2011

1 commit

88c5100c2 Merge branch 'master' of github.com:davem330/net

Conflicts:
net/batman-adv/soft-interface.c

David S. Miller
2011-10-08 01:38:43 +0800

04 Oct, 2011

1 commit

7091fbd82 make PACKET_STATISTICS getsockopt report consistently between ring and non-ring

This is a minor change.

Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
...) would return total and dropped packets since its last invocation. The
introduction of socket queue overflow reporting [1] changed drop
rate calculation in the normal packet socket path, but not when using a
packet ring. As a result, the getsockopt now returns different statistics
depending on the reception method used. With a ring, it still returns the
count since the last call, as counts are incremented in tpacket_rcv and
reset in getsockopt. Without a ring, it returns 0 if no drops occurred
since the last getsockopt and the total drops over the lifespan of
the socket otherwise. The culprit is this line in packet_rcv, executed
on a drop:

drop_n_acct:
po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);

As it shows, the new drop number is taken from the socket drop counter,
which is not reset at getsockopt. I put together a small example
that demonstrates the issue [2]. It runs for 10 seconds and overflows
the queue/ring on every odd second. The reported drop rates are:
ring: 16, 0, 16, 0, 16, ...
non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0, 74.

Note how the even non-ring counts monotonically increase. Because the
getsockopt adds tp_drops to tp_packets, total counts are similarly
reported cumulatively. Long story short, reinstating the original code, as
the below patch does, fixes the issue at the cost of additional per-packet
cycles. Another solution that does not introduce per-packet overhead
is to keep the current data path, record the value of sk_drops at
getsockopt() at call N in a new field in struct packetsock and subtract
that when reporting at call N+1. I'll be happy to code that instead;
it's just more messy.

[1] http://patchwork.ozlabs.org/patch/35665/
[2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2011-10-04 02:18:26 +0800
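
The alternative mentioned at the end of 7091fbd82 (snapshot the cumulative drop counter at each read and report the difference) can be sketched as follows. This is a user-space model, not the patch that was merged; `struct stats_model`, its `drops_reported` field, `record_drop()` and `read_drops()` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical model of the two counters involved. */
struct stats_model {
    unsigned int sk_drops;       /* cumulative drop counter, never reset */
    unsigned int drops_reported; /* assumed new field: value seen at the last read */
};

/* Data path: only the cumulative counter is touched per packet. */
static void record_drop(struct stats_model *s)
{
    assert(s);
    s->sk_drops++;
}

/* Read path: report drops since the previous read, then move the snapshot.
 * Unsigned subtraction keeps the delta correct across counter wrap-around. */
static unsigned int read_drops(struct stats_model *s)
{
    unsigned int delta;

    assert(s);
    delta = s->sk_drops - s->drops_reported;
    s->drops_reported = s->sk_drops;
    return delta;
}
```

In this model a reader sees per-interval counts on every call, matching the ring behaviour, without adding work to the per-packet path.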

16 Sep, 2011

1 commit

4bc71cb98 net: consolidate and fix ethtool_ops->get_settings calling

This patch does several things:
- introduces __ethtool_get_settings(), which is called from ethtool code
  and from drivers as well. Put ASSERT_RTNL there.
- dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
- changes calling in drivers so rtnl locking is respected. In
  iboe_get_rate(), ->get_settings() was previously called unlocked. This
  fixes it. prb_calc_retire_blk_tmo() in af_packet.c had the same
  problem, which is also fixed by calling __dev_get_by_index() instead of
  dev_get_by_index() and holding rtnl_lock for both calls.
- introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
  so bnx2fc_if_create() and fcoe_if_create() are called locked as they
  are from other places.
- use __ethtool_get_settings() in bonding code
Signed-off-by: Jiri Pirko

v2->v3:
-removed dev_ethtool_get_settings()
-added ASSERT_RTNL into __ethtool_get_settings()
-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
around it and __ethtool_get_settings() call
v1->v2:
add missing export_symbol
Reviewed-by: Ben Hutchings [except FCoE bits]
Acked-by: Ralf Baechle
Signed-off-by: David S. Miller

Jiri Pirko
2011-09-16 05:32:26 +0800

27 Aug, 2011

1 commit

bc59ba399 af_packet: Prefixed tpacket_v3 structs to avoid name space collision

structs introduced in tpacket_v3 implementation are prefixed with 'tpacket'
to avoid namespace collision.

Compile tested.

Signed-off-by: Chetan Loke
Signed-off-by: David S. Miller

chetan loke
2011-08-27 00:38:44 +0800

25 Aug, 2011

1 commit

f6fb8f100 af-packet: TPACKET_V3 flexible buffer implementation.

1) Blocks can be configured with non-static frame-size.
2) Read/poll is at a block-level (as opposed to packet-level).
3) Added poll timeout to avoid indefinite user-space wait on idle links.
4) Added user-configurable knobs:
   4.1) block::timeout.
   4.2) tpkt_hdr::sk_rxhash.

Changes:
C1) tpacket_rcv()
    C1.1) packet_current_frame() is replaced by packet_current_rx_frame()
          The bulk of the processing is then moved in the following chain:
          packet_current_rx_frame()
            __packet_lookup_frame_in_block
              fill_curr_block()
              or
              retire_current_block
              dispatch_next_block
              or
              return NULL (queue is plugged/paused)

Signed-off-by: Chetan Loke
Signed-off-by: David S. Miller

chetan loke
2011-08-25 10:40:40 +0800
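
The fill / retire / dispatch chain listed in f6fb8f100 can be modelled in user space. The sketch below is a heavily simplified assumption of that flow, with fixed-size blocks and no timers; `struct ring_model`, `reserve_frame()` and `release_block()` are hypothetical names and do not correspond to kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS    2
#define BLOCK_SIZE 100u

struct block_model {
    unsigned int used;  /* bytes already filled in this block */
    bool owned_by_user; /* retired and not yet released by the reader */
};

struct ring_model {
    struct block_model blk[NBLOCKS];
    unsigned int cur;   /* index of the block currently being filled */
};

/* Returns the offset of the new frame inside block r->cur, or -1 when the
 * queue is plugged because the reader still owns the block we need. */
static int reserve_frame(struct ring_model *r, unsigned int len)
{
    struct block_model *b = &r->blk[r->cur];
    int off;

    assert(len > 0 && len <= BLOCK_SIZE);
    if (b->owned_by_user)
        return -1;                      /* still plugged */
    if (b->used + len <= BLOCK_SIZE) {  /* fill the current block */
        off = (int)b->used;
        b->used += len;
        return off;
    }
    b->owned_by_user = true;            /* retire the current block */
    r->cur = (r->cur + 1) % NBLOCKS;    /* try to dispatch the next one */
    b = &r->blk[r->cur];
    if (b->owned_by_user)
        return -1;                      /* reader has not caught up */
    b->used = len;
    return 0;
}

/* Reader side: hand a consumed block back to the producer. */
static void release_block(struct ring_model *r, unsigned int idx)
{
    assert(idx < NBLOCKS);
    r->blk[idx].used = 0;
    r->blk[idx].owned_by_user = false;
}
```

The point of the model is that the reader is woken per retired block rather than per frame, and that a frame is dropped (here, `-1`) only when every block is still held by the reader.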

14 Jul, 2011

1 commit

cc9f01b24 af-packet: fix - avoid reading stale data

Currently we flush tp_status and then flush the remainder of the header+payload.
tp_status should be flushed in the end to avoid stale data being read by user-space.

Incorrectly re-ordered barriers in v1.

Signed-off-by: Chetan Loke
Signed-off-by: David S. Miller

Chetan Loke
2011-07-14 23:36:33 +0800
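
The ordering rule in cc9f01b24 (make the header and payload visible first, publish the status word last) is the usual publish pattern. The sketch below shows the same idea with C11 atomics in user space; the kernel itself uses cache flushes and memory barriers rather than `<stdatomic.h>`, and `struct slot_model`, `publish()` and `consume()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define SLOT_KERNEL 0u /* slot is owned by the producer */
#define SLOT_USER   1u /* slot is ready for the consumer */

struct slot_model {
    _Atomic unsigned int status;
    unsigned int len;
    unsigned char data[64];
};

/* Producer: write the payload first, flip the status word last. The
 * release store keeps the payload writes from being reordered after it. */
static void publish(struct slot_model *s, const void *buf, unsigned int len)
{
    assert(len <= sizeof s->data);
    memcpy(s->data, buf, len);
    s->len = len;
    atomic_store_explicit(&s->status, SLOT_USER, memory_order_release);
}

/* Consumer: read the status first; only touch the payload once it says
 * the slot is ready. Returns the payload length, or -1 if nothing is ready
 * or the output buffer is too small. */
static int consume(struct slot_model *s, void *out, unsigned int cap)
{
    unsigned int n;

    if (atomic_load_explicit(&s->status, memory_order_acquire) != SLOT_USER)
        return -1;
    n = s->len;
    if (n > cap)
        return -1;
    memcpy(out, s->data, n);
    atomic_store_explicit(&s->status, SLOT_KERNEL, memory_order_release);
    return (int)n;
}
```

If the status store came before the payload writes, a consumer polling the status could copy bytes left over from the previous frame, which is the stale-data problem the commit describes.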

07 Jul, 2011

2 commits

31817df02 packet: Fix build with INET disabled.

af_packet.c:(.text+0x3d130): undefined reference to `ip_defrag'
or
ERROR: "ip_defrag" [net/packet/af_packet.ko] undefined!

Reported-by: Randy Dunlap
Signed-off-by: David S. Miller

David S. Miller
2011-07-07 23:18:04 +0800

afe62c68c af_packet: lock imbalance

fanout_add() might return with fanout_mutex held.

Reduce indentation level while we are at it

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-07-07 21:41:29 +0800

06 Jul, 2011

5 commits

aec27311c packet: Fix leak in pre-defrag support.

When we clone the SKB, we forget about the original
one. Avoid this problem by using skb_share_check().

Reported-by: Penttilä Mika
Signed-off-by: David S. Miller

David S. Miller
2011-07-06 22:30:59 +0800

95ec3eb41 packet: Add 'cpu' fanout policy.

Unfortunately we have to use a real modulus here as
the multiply trick won't work as effectively with cpu
numbers as it does with rxhash values.

Signed-off-by: David S. Miller

David S. Miller
2011-07-06 16:56:38 +0800
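
The remark in 95ec3eb41 about the "multiply trick" can be made concrete. The sketch below contrasts scaling a 32-bit hash into a range by multiply-and-shift with a plain modulus; `pick_by_hash()` and `pick_by_cpu()` are hypothetical helper names, and the exact expression used in the kernel is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-and-shift: maps a value spread over the full 32-bit range
 * onto [0, num) without a division. */
static unsigned int pick_by_hash(uint32_t rxhash, unsigned int num)
{
    assert(num > 0);
    return (unsigned int)(((uint64_t)rxhash * num) >> 32);
}

/* CPU numbers are small integers, so the shift above would discard all
 * of them; a real modulus is needed to spread them across sockets. */
static unsigned int pick_by_cpu(unsigned int cpu, unsigned int num)
{
    assert(num > 0);
    return cpu % num;
}
```

Feeding a small number such as a CPU id into `pick_by_hash()` always yields index 0, which is why the commit falls back to `%` for the cpu policy.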

7736d33f4 packet: Add pre-defragmentation support for ipv4 fanouts.

The skb->rxhash cannot be properly computed if the
packet is a fragment. To alleviate this, allow the
AF_PACKET client to ask for defragmentation to be
done at demux time.

Signed-off-by: David S. Miller

David S. Miller
2011-07-06 13:34:52 +0800

dc99f6006 packet: Add fanout support.

Fanouts allow packet capturing to be demuxed to a set of AF_PACKET
sockets. Two fanout policies are implemented:

1) Hashing based upon skb->rxhash

2) Pure round-robin

An AF_PACKET socket must be fully bound before it tries to add itself
to a fanout. All AF_PACKET sockets trying to join the same fanout
must all have the same bind settings.

Fanouts are identified (within a network namespace) by a 16-bit ID.
The first socket to try to add itself to a fanout with a particular
ID, creates that fanout. When the last socket leaves the fanout
(which happens only when the socket is closed), that fanout is
destroyed.

Signed-off-by: David S. Miller

David S. Miller
2011-07-06 13:34:52 +0800
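
From user space, joining a fanout as described in dc99f6006 comes down to passing one integer to setsockopt(). The sketch below only builds that integer, with the 16-bit ID in the low half and the policy above it; the enum values are assumed to match PACKET_FANOUT_HASH and PACKET_FANOUT_LB in <linux/if_packet.h>, and `fanout_arg()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Policy values, assumed to mirror PACKET_FANOUT_HASH / PACKET_FANOUT_LB. */
enum fanout_mode {
    FANOUT_HASH = 0, /* hash on skb->rxhash */
    FANOUT_LB   = 1  /* pure round-robin */
};

/* Pack the 16-bit fanout ID and the policy into the option value. */
static int fanout_arg(uint16_t group_id, enum fanout_mode mode)
{
    assert(mode == FANOUT_HASH || mode == FANOUT_LB);
    return (int)group_id | ((int)mode << 16);
}
```

On Linux, a socket that has already been bound would then call something like `setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg))` with `arg = fanout_arg(id, mode)`; every socket using the same ID must use the same bind settings.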

ce06b03e6 packet: Add helpers to register/unregister ->prot_hook

Signed-off-by: David S. Miller

David S. Miller
2011-07-06 13:34:52 +0800

21 Jun, 2011

1 commit

9f6ec8d69 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/wireless/iwlwifi/iwl-agn-rxon.c
drivers/net/wireless/rtlwifi/pci.c
net/netfilter/ipvs/ip_vs_core.c

David S. Miller
2011-06-21 13:29:08 +0800

12 Jun, 2011

1 commit

10a8d94a9 virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID

There's no need for the guest to validate the checksum if it has been
validated by host nics. So this patch introduces a new flag -
VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum
examination in the guest. The backend (tap/macvtap) may set this flag when
it meets skbs with CHECKSUM_UNNECESSARY to save cpu utilization.

No feature negotiation is needed as old drivers just ignore this flag.

Iperf shows 12%-30% performance improvement for UDP traffic. For TCP,
when gro is on no difference as it produces skb with partial
checksum. But when gro is disabled, 20% or even higher improvement
could be measured by netperf.

Signed-off-by: Jason Wang
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller

Jason Wang
2011-06-12 06:57:47 +0800

07 Jun, 2011

1 commit

13fcb7bd3 af_packet: prevent information leak

In 2.6.27, commit 393e52e33c6c2 (packet: deliver VLAN TCI to userspace)
added a small information leak.

Add padding field and make sure it's zeroed before copy to user.

Signed-off-by: Eric Dumazet
CC: Patrick McHardy
Signed-off-by: David S. Miller

Eric Dumazet
2011-06-07 13:42:06 +0800

06 Jun, 2011

2 commits

827d97803 af-packet: Use existing netdev reference for bound sockets.

This saves a network device lookup on each packet transmitted,
for sockets that are bound to a network device.

Signed-off-by: Ben Greear
Signed-off-by: David S. Miller

Ben Greear
2011-06-06 05:16:28 +0800

160ff18a0 af-packet: Hold reference to bound network devices.

Old code was probably safe, but with this change we
can actually use the netdev object, not just compare
the pointer values.

Signed-off-by: Ben Greear
Signed-off-by: David S. Miller

Ben Greear
2011-06-06 05:16:28 +0800

02 Jun, 2011

1 commit

a3bcc23e8 af-packet: Add flag to distinguish VID 0 from no-vlan.

Currently, user-space cannot determine if a 0 tp_vlan_tci
means there is no VLAN tag or the VLAN ID was zero.

Add flag to make this explicit. User-space can check for
TP_STATUS_VLAN_VALID || tp_vlan_tci > 0, which will be backwards
compatible. Older code would have just checked for tp_vlan_tci,
so it will work no worse than before.

Signed-off-by: Ben Greear
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Ben Greear
2011-06-02 12:18:03 +0800
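
The backwards-compatible check suggested in a3bcc23e8 might look like this on the user-space side. The flag value below is an assumption (bit 4 of tp_status) and is deliberately given a local name; real code should take TP_STATUS_VLAN_VALID from <linux/if_packet.h>. `frame_has_vlan()` and `vlan_id()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of TP_STATUS_VLAN_VALID; use the kernel header in real code. */
#define TP_STATUS_VLAN_VALID_ASSUMED (1u << 4)

/* True if the frame carried a VLAN tag. The flag covers VID 0 on newer
 * kernels; the tci test keeps the check working on older ones. */
static bool frame_has_vlan(uint32_t tp_status, uint16_t tp_vlan_tci)
{
    return (tp_status & TP_STATUS_VLAN_VALID_ASSUMED) != 0 || tp_vlan_tci > 0;
}

/* Extract the 12-bit VLAN ID from the TCI of a tagged frame. */
static uint16_t vlan_id(uint32_t tp_status, uint16_t tp_vlan_tci)
{
    assert(frame_has_vlan(tp_status, tp_vlan_tci)); /* caller checks first */
    return tp_vlan_tci & 0x0fff;
}
```

Note that the flag has to be tested as a bit in tp_status; the commit message's `TP_STATUS_VLAN_VALID || ...` is shorthand for that test.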

24 May, 2011

1 commit

71338aa7d net: convert %p usage to %pK

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK are currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.

Signed-off-by: Dan Rosenberg
Cc: James Morris
Cc: Eric Dumazet
Cc: Thomas Graf
Cc: Eugene Teo
Cc: Kees Cook
Cc: Ingo Molnar
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Dan Rosenberg
2011-05-24 13:13:12 +0800
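
The three kptr_restrict settings described in 71338aa7d reduce to a small decision table. The sketch below restates that table as a function; it is a model of the commit message only, and `pK_hides_pointer()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a %pK pointer would be printed as zeros for a reader
 * with the given capability, following the rules in the commit message. */
static bool pK_hides_pointer(int kptr_restrict, bool has_cap_syslog)
{
    assert(kptr_restrict >= 0 && kptr_restrict <= 2);
    switch (kptr_restrict) {
    case 0:
        return false;           /* behaves like plain %p */
    case 1:
        return !has_cap_syslog; /* hidden from readers without CAP_SYSLOG */
    default:
        return true;            /* 2: hidden regardless of privileges */
    }
}
```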

28 Apr, 2011

1 commit

0a14842f5 net: filter: Just In Time compiler for x86-64

In order to speed up packet filtering, here is an implementation of a
JIT compiler for x86_64.

It is disabled by default, and must be enabled by the admin.

echo 1 >/proc/sys/net/core/bpf_jit_enable

It uses module_alloc() and module_free() to get memory in the 2GB text
kernel range since we call helper functions from the generated code.

EAX : BPF A accumulator
EBX : BPF X accumulator
RDI : pointer to skb (first argument given to JIT function)
RBP : frame pointer (even if CONFIG_FRAME_POINTER=n)
r9d : skb->len - skb->data_len (headlen)
r8 : skb->data

To get a trace of generated code, use :

echo 2 >/proc/sys/net/core/bpf_jit_enable

Example of generated code :

# tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24

flen=18 proglen=147 pass=3 image=ffffffffa00b5000
JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00
JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be
JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0
JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00
JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24
JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
JIT code: ffffffffa00b5090: c0 c9 c3

BPF program is 144 bytes long, so native program is almost same size ;)

(000) ldh [12]
(001) jeq #0x800 jt 2 jf 8
(002) ld [26]
(003) and #0xffffff00
(004) jeq #0xc0a81400 jt 16 jf 5
(005) ld [30]
(006) and #0xffffff00
(007) jeq #0xc0a81400 jt 16 jf 17
(008) jeq #0x806 jt 10 jf 9
(009) jeq #0x8035 jt 10 jf 17
(010) ld [28]
(011) and #0xffffff00
(012) jeq #0xc0a81400 jt 16 jf 13
(013) ld [38]
(014) and #0xffffff00
(015) jeq #0xc0a81400 jt 16 jf 17
(016) ret #65535
(017) ret #0

Signed-off-by: Eric Dumazet
Cc: Arnaldo Carvalho de Melo
Cc: Ben Hutchings
Cc: Hagen Paul Pfeifer
Signed-off-by: David S. Miller

Eric Dumazet
2011-04-28 14:05:08 +0800
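
To make the 18-instruction BPF listing in 0a14842f5 easier to follow, here is a hand translation of that one program into plain C. It is a reading aid written for this listing, not the interpreter or the JIT output; `ld_half()`, `ld_word()` and `run_filter()` are hypothetical names. As in classic BPF, a load past the end of the packet makes the filter return 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NET  0xc0a81400u /* 192.168.20.0 */
#define MASK 0xffffff00u /* /24 */

/* Big-endian 16-bit load; returns -1 if it would run past the packet. */
static int ld_half(const uint8_t *p, size_t len, size_t off, uint32_t *out)
{
    if (off + 2 > len)
        return -1;
    *out = ((uint32_t)p[off] << 8) | p[off + 1];
    return 0;
}

/* Big-endian 32-bit load; returns -1 if it would run past the packet. */
static int ld_word(const uint8_t *p, size_t len, size_t off, uint32_t *out)
{
    if (off + 4 > len)
        return -1;
    *out = ((uint32_t)p[off] << 24) | ((uint32_t)p[off + 1] << 16) |
           ((uint32_t)p[off + 2] << 8) | p[off + 3];
    return 0;
}

/* Returns 65535 (accept) or 0 (drop), like "ret #65535" / "ret #0". */
static uint32_t run_filter(const uint8_t *pkt, size_t len)
{
    uint32_t a;
    size_t first, second;

    assert(pkt != NULL);
    if (ld_half(pkt, len, 12, &a))           /* (000) ethertype */
        return 0;
    if (a == 0x800) {                        /* (001) IPv4 */
        first = 26;                          /* (002) source address */
        second = 30;                         /* (005) destination address */
    } else if (a == 0x806 || a == 0x8035) {  /* (008)-(009) ARP / RARP */
        first = 28;                          /* (010) sender address */
        second = 38;                         /* (013) target address */
    } else {
        return 0;                            /* (017) */
    }
    if (ld_word(pkt, len, first, &a))
        return 0;
    if ((a & MASK) == NET)                   /* (003)-(004), (011)-(012) */
        return 65535;                        /* (016) */
    if (ld_word(pkt, len, second, &a))
        return 0;
    return (a & MASK) == NET ? 65535 : 0;    /* (006)-(007), (014)-(015) */
}
```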

08 Mar, 2011

1 commit

e143038f4 af_packet: struct socket declared/assigned but unused

Signed-off-by: Hagen Paul Pfeifer
Signed-off-by: David S. Miller

Hagen Paul Pfeifer
2011-03-08 07:51:13 +0800

12 Feb, 2011

1 commit

57f89bfa2 network: Allow af_packet to transmit +4 bytes for VLAN packets.

This allows user-space to send a '1500' MTU VLAN packet on a
1500 MTU ethernet frame. The extra 4 bytes of a VLAN header are
not usually charged against the MTU when other parts of the
network stack are transmitting vlans...

Signed-off-by: Ben Greear
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller

Ben Greear
2011-02-12 13:26:32 +0800
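
The length allowance in 57f89bfa2 can be sketched as a simple bound check. The helper below applies the extra 4 bytes unconditionally, which is an assumption; the commit message does not say whether the allowance is restricted to frames that actually carry a VLAN tag. `tx_len_ok()` is a hypothetical name and `VLAN_HLEN` is defined locally.

```c
#include <assert.h>
#include <stdbool.h>

#define VLAN_HLEN 4u /* size of an 802.1Q tag */

/* True if a frame of 'len' bytes (link header included) may be sent on a
 * device with the given MTU and link-header length. */
static bool tx_len_ok(unsigned int len, unsigned int mtu,
                      unsigned int hard_header_len)
{
    assert(mtu > 0);
    return len <= mtu + hard_header_len + VLAN_HLEN;
}
```

With a 1500-byte MTU and a 14-byte Ethernet header, this accepts frames up to 1518 bytes, so a full 1500-byte payload still fits behind a VLAN tag.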

20 Jan, 2011

1 commit

441c793a5 net: cleanup unused macros in net directory

Clean up some unused macros in net/*.
1. Left over after code changes, e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
2. Never used since being introduced to the kernel,
e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.

Signed-off-by: Shan Wei
Acked-by: Sjur Braendeland
Signed-off-by: David S. Miller

Shan Wei
2011-01-20 15:20:04 +0800

19 Jan, 2011

1 commit

80f8f1027 net: filter: dont block softirqs in sk_run_filter()

Packet filter (BPF) doesn't need to disable softirqs, being fully
re-entrant and lock-less.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-01-19 13:33:05 +0800

17 Dec, 2010

1 commit

55508d601 net: Use skb_checksum_start_offset()

Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).

Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2010-12-17 06:43:14 +0800

11 Dec, 2010

1 commit

c053fd96d af_packet: use swap() instead of the open coded macro XC()

Signed-off-by: Changli Gao
Signed-off-by: David S. Miller

Changli Gao
2010-12-11 08:02:20 +0800

09 Dec, 2010

3 commits

920b8d913 af_packet: fix freeing pg_vec twice on error path

The bug was introduced in:
commit 0e3125c755445664f00ad036e4fc2cd32fd52877
Author: Neil Horman
Date: Tue Nov 16 10:26:47 2010 -0800

packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)

Signed-off-by: Changli Gao
Signed-off-by: David S. Miller

Changli Gao
2010-12-09 02:43:41 +0800

f6dafa95d af_packet: eliminate pgv_to_page on some arches

Some arches don't need flush_dcache_page(), and don't implement it, so
we can eliminate pgv_to_page() calls on those arches.

Signed-off-by: Changli Gao
Signed-off-by: David S. Miller

Changli Gao
2010-12-09 02:43:41 +0800

62ab08121 filter: constify sk_run_filter()

sk_run_filter() doesn't write to the skb, so change its prototype to reflect
this.

Fix two af_packet comments.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-12-09 02:30:34 +0800

07 Dec, 2010

1 commit

c56b4d901 af_packet: remove pgv.flags

As we can check if an address is vmalloc address with is_vmalloc_addr(),
we remove pgv.flags. Then we may get more pg_vecs.

Signed-off-by: Changli Gao
Signed-off-by: David S. Miller

Changli Gao
2010-12-07 04:59:07 +0800