Eric Lee / smarc-fsl-linux-kernel

21 Dec, 2011

1 commit

ef0002b57 macvtap: Fix macvtap_get_queue to use rxhash first

It was reported that the macvtap device selects a
Acked-by: Michael S. Tsirkin

Signed-off-by: David S. Miller

Krishna Kumar
2011-12-21 02:45:55 +0800

24 Nov, 2011

1 commit

2cfa5a047 net: treewide use of RCU_INIT_POINTER

rcu_assign_pointer(ptr, NULL) can be safely replaced by
RCU_INIT_POINTER(ptr, NULL)

(old rcu_assign_pointer() macro was testing the NULL value and could
omit the smp_wmb(), but this had to be removed because of compiler
warnings)
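
As a rough userspace illustration of the distinction drawn above (not the kernel macros themselves), the sketch below uses C11 atomics: a release store stands in for rcu_assign_pointer(), and a relaxed store stands in for RCU_INIT_POINTER(ptr, NULL). The names struct config, shared_cfg, publish_cfg, clear_cfg and read_cfg are hypothetical, and the acquire load is only a conservative substitute for rcu_dereference().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload that readers reach through the shared pointer. */
struct config {
	int value;
};

static _Atomic(struct config *) shared_cfg;

/* Analogue of rcu_assign_pointer(): the release store orders the
 * initialisation of *cfg before the pointer becomes visible. */
static void publish_cfg(struct config *cfg)
{
	atomic_store_explicit(&shared_cfg, cfg, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(ptr, NULL): there is no data behind a
 * NULL pointer, so no ordering (and no barrier) is needed. */
static void clear_cfg(void)
{
	atomic_store_explicit(&shared_cfg, NULL, memory_order_relaxed);
}

/* Reader side; acquire is stronger than what rcu_dereference() needs. */
static struct config *read_cfg(void)
{
	return atomic_load_explicit(&shared_cfg, memory_order_acquire);
}
```

The point of the sketch is only that storing NULL never publishes new data, which is why the cheaper store is safe in that case.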

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-24 07:48:19 +0800

21 Oct, 2011

5 commits

e09eff7fc macvtap: Fix the minor device number allocation

On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices. I have systems running
automated tests that hit this condition in just a few days.

Use a linux idr allocator to track which macvtap minor numbers
are available and to track the association between macvtap
minor numbers and macvtap network devices.

Remove the unnecessary check to see if the network
device we have found is indeed a macvtap device. With macvtap
specific data structures it is impossible to find any other
kind of networking device.

Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers. It doesn't solve the
original problem but there is no penalty for a larger minor
device range.
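
The idea of tracking minors separately from ifindex can be sketched in userspace as a small table that maps each minor to its device and reuses freed slots. This is not the kernel idr API: DEMO_MINORS, struct demo_dev, minor_alloc, minor_free and minor_lookup are hypothetical names, and the table is deliberately tiny (the commit message describes a 20-bit range).

```c
#include <stddef.h>

#define DEMO_MINORS 4	/* hypothetical, tiny on purpose */

/* Hypothetical stand-in for a macvtap network device. */
struct demo_dev {
	const char *name;
};

static struct demo_dev *minor_table[DEMO_MINORS];

/* Bind dev to the lowest free minor; returns the minor or -1 if full. */
static int minor_alloc(struct demo_dev *dev)
{
	for (int i = 0; i < DEMO_MINORS; i++) {
		if (minor_table[i] == NULL) {
			minor_table[i] = dev;
			return i;
		}
	}
	return -1;
}

/* Release a minor so that a later device can reuse it. */
static void minor_free(int minor)
{
	if (minor >= 0 && minor < DEMO_MINORS)
		minor_table[minor] = NULL;
}

/* Map a minor back to its device, or NULL if it is unused. */
static struct demo_dev *minor_lookup(int minor)
{
	if (minor < 0 || minor >= DEMO_MINORS)
		return NULL;
	return minor_table[minor];
}
```

Because freed minors are handed out again, the number in use is bounded by the number of live devices, unlike an ifindex that keeps growing as devices are created and deleted.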

Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller

Eric W. Biederman
2011-10-21 14:53:07 +0800

9bf1907f4 macvtap: Rewrite macvtap_newlink so the error handling works.

Place macvlan_common_newlink at the end of macvtap_newlink because
failing in newlink after registering your network device is not
supported.

Move device_create into a netdevice creation notifier. The network device
notifier is the only hook that is called after the network device has been
registered with the device layer and before register_network_device returns
success.

Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller

Eric W. Biederman
2011-10-21 14:53:07 +0800

2259fef0b macvtap: Don't leak unreceived packets when we delete a macvtap device.

To avoid leaking packets in the receive queue, add a socket destructor
that will run whenever a macvtap socket is destroyed.

Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller

Eric W. Biederman
2011-10-21 14:53:07 +0800

047af9cfe macvtap: Fix macvtap_open races in the zero copy enable code.

To see if it is appropriate to enable the macvtap zero copy feature
don't test the lowerdev network device flags. Instead test the
macvtap network device flags which are a direct copy of the lowerdev
flags. This is important because nothing holds a reference to lowerdev
and on a very bad day lowerdev could be a pointer to stale memory.

Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller

Eric W. Biederman
2011-10-21 14:53:07 +0800

99f34b38c macvtap: Close a race between macvtap_open and macvtap_dellink.

There is a small window in macvtap_open between looking up a
networking device and calling macvtap_set_queue in which
macvtap_del_queues can be called from macvtap_dellink. After
calling macvtap_del_queues it is totally incorrect to
allow macvtap_set_queue to proceed, so prevent success by
reporting that all of the available queues are in use.

Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller

Eric W. Biederman
2011-10-21 14:53:06 +0800

21 Sep, 2011

1 commit

653fc9155 macvtap: fix the uninitialized var using in macvtap_alloc_skb()

Commit d1b08284 uses the new frag API but leaves f to be used
uninitialized; this patch fixes it.

Signed-off-by: Jason Wang
Acked-by: Michael S. Tsirkin
Acked-by: Ian Campbell
Signed-off-by: David S. Miller

Jason Wang
2011-09-21 02:37:22 +0800

16 Sep, 2011

1 commit

d1b08284a macvtap: convert to SKB paged frag API.

Signed-off-by: Ian Campbell
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller

Ian Campbell
2011-09-16 03:34:59 +0800

07 Jul, 2011

1 commit

97bc3633b macvtap: macvtap TX zero-copy support

Only 128 bytes are copied, the rest of the data is DMA mapped directly from
userspace.

Signed-off-by: Shirley Ma
Signed-off-by: David S. Miller

Shirley Ma
2011-07-07 19:41:24 +0800

12 Jun, 2011

1 commit

10a8d94a9 virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID

There's no need for the guest to validate the checksum if it has been
validated by host NICs. So this patch introduces a new flag,
VIRTIO_NET_HDR_F_DATA_VALID, which is used to bypass the checksum
examination in the guest. The backend (tap/macvtap) may set this flag when
it meets skbs with CHECKSUM_UNNECESSARY to save CPU utilization.

No feature negotiation is needed as old drivers just ignore this flag.
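
A minimal sketch of the backend decision described above, assuming a simplified checksum state in place of the real skb field. The two VIRTIO_NET_HDR_F_* values are the ones defined for the virtio-net header; enum demo_csum_state and vnet_hdr_flags are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Flag bits of the virtio-net header "flags" field. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM	1
#define VIRTIO_NET_HDR_F_DATA_VALID	2

/* Hypothetical stand-in for the checksum state carried by an skb. */
enum demo_csum_state {
	DEMO_CSUM_NONE,		/* nothing known about the checksum */
	DEMO_CSUM_UNNECESSARY,	/* already validated, e.g. by the host NIC */
	DEMO_CSUM_PARTIAL	/* checksum still has to be completed */
};

/* Choose the header flags a backend would hand to the guest. */
static uint8_t vnet_hdr_flags(enum demo_csum_state state)
{
	switch (state) {
	case DEMO_CSUM_PARTIAL:
		return VIRTIO_NET_HDR_F_NEEDS_CSUM;
	case DEMO_CSUM_UNNECESSARY:
		return VIRTIO_NET_HDR_F_DATA_VALID;
	default:
		return 0;
	}
}
```

A guest that does not know the new bit simply never tests it, which is consistent with the remark that no feature negotiation is required.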

Iperf shows 12%-30% performance improvement for UDP traffic. For TCP,
there is no difference when GRO is on, as it produces skbs with partial
checksums. But when GRO is disabled, an improvement of 20% or even higher
could be measured by netperf.

Signed-off-by: Jason Wang
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller

Jason Wang
2011-06-12 06:57:47 +0800

11 Mar, 2011

1 commit

33175d84e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/bnx2x/bnx2x_cmn.c

David S. Miller
2011-03-11 06:26:00 +0800

08 Mar, 2011

1 commit

ce3c86928 drivers/net/macvtap: fix error check

'len' is unsigned of type size_t and can't be negative.
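
The class of bug named here can be sketched as follows, assuming a length that may also carry a negative error code. This is not the driver code: demo_is_error and demo_as_size are hypothetical helpers, and ssize_t comes from POSIX <sys/types.h>.

```c
#include <stddef.h>
#include <stdint.h>	/* SIZE_MAX, handy when inspecting the wrapped value */
#include <sys/types.h>	/* ssize_t (POSIX) */

/* With a signed type, a negative return value is a detectable error. */
static int demo_is_error(ssize_t len)
{
	return len < 0;
}

/* Once stored in a size_t, the same value wraps to a huge positive
 * number, so a "less than zero" test on it can never succeed. */
static size_t demo_as_size(ssize_t len)
{
	return (size_t)len;
}
```

For example, -14 converted to size_t becomes SIZE_MAX - 13, which is why an error check on an unsigned length needs either a signed variable or a different test.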

Signed-off-by: Nicolas Kaiser
Acked-by: Arnd Bergmann
Signed-off-by: David S. Miller

Nicolas Kaiser
2011-03-08 07:57:58 +0800

28 Jan, 2011

1 commit

13707f9e5 drivers/net: remove some rcu sparse warnings

Add missing __rcu annotations and helpers.
minor : Fix some rcu_dereference() calls in macvtap

Signed-off-by: Eric Dumazet
Acked-by: Arnd Bergmann
Acked-by: Michael Chan
Signed-off-by: David S. Miller

Eric Dumazet
2011-01-28 07:02:57 +0800

14 Jan, 2011

1 commit

1ac9ad139 net: remove dev_txq_stats_fold()

After recent changes, (percpu stats on vlan/tunnels...), we no longer need
per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.

Only remaining users are ixgbe, sch_teql, gianfar & macvlan :

1) ixgbe can be converted to use existing tx_ring counters.

2) macvlan incremented txq->tx_dropped, it can use the
dev->stats.tx_dropped counter.

3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
Now we have ndo_get_stats64(), use it, even for "unsigned long"
fields (No need to bring back a struct net_device_stats)

4) gianfar adds a stats structure per tx queue to hold
tx_bytes/tx_packets

This removes a lockdep warning (and possible lockup) in rndis gadget,
calling dev_get_stats() from hard IRQ context.

Ref: http://www.spinics.net/lists/netdev/msg149202.html

Reported-by: Neil Jones
Signed-off-by: Eric Dumazet
CC: Jarek Poplawski
CC: Alexander Duyck
CC: Jeff Kirsher
CC: Sandeep Gopalpet
CC: Michal Nazarewicz
Signed-off-by: David S. Miller

Eric Dumazet
2011-01-14 13:44:34 +0800

17 Dec, 2010

1 commit

55508d601 net: Use skb_checksum_start_offset()

Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).
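
The arithmetic behind the replacement can be sketched with a cut-down buffer descriptor. struct demo_skb, demo_headroom and demo_checksum_start_offset are hypothetical userspace stand-ins for struct sk_buff, skb_headroom() and skb_checksum_start_offset(); only the three fields needed for the calculation are modelled.

```c
/* Hypothetical, heavily reduced model of an skb. */
struct demo_skb {
	unsigned char *head;		/* start of the allocated buffer */
	unsigned char *data;		/* start of the packet data */
	unsigned int csum_start;	/* checksum start, counted from head */
};

/* Headroom is the gap between the buffer start and the packet data. */
static unsigned int demo_headroom(const struct demo_skb *skb)
{
	return (unsigned int)(skb->data - skb->head);
}

/* Checksum start expressed relative to the packet data instead of head. */
static int demo_checksum_start_offset(const struct demo_skb *skb)
{
	return (int)(skb->csum_start - demo_headroom(skb));
}
```

With 64 bytes of headroom and csum_start set to 98, the checksum starts 34 bytes into the packet data.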

Signed-off-by: Michał Mirosław
Signed-off-by: David S. Miller

Michał Mirosław
2010-12-17 06:43:14 +0800

17 Aug, 2010

1 commit

1565c7c1c macvtap: Implement multiqueue for macvtap driver

Implement multiqueue facility for macvtap driver. The idea is that
a macvtap device can be opened multiple times and the fd's can be
used to register eg, as backend for vhost.

Signed-off-by: Krishna Kumar
Signed-off-by: David S. Miller

Krishna Kumar
2010-08-17 12:06:25 +0800

28 Jul, 2010

1 commit

bb7e95c8f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/bnx2x_main.c

Merge bnx2x bug fixes in by hand... :-/

Signed-off-by: David S. Miller

David S. Miller
2010-07-28 12:01:35 +0800

23 Jul, 2010

1 commit

8a35747a5 macvtap: Limit packet queue length

Mark Wagner reported OOM symptoms when sending UDP traffic over
a macvtap link to a kvm receiver.

This appears to be caused by the fact that macvtap packet queues
are unlimited in length. This means that if the receiver can't
keep up with the rate of flow, then we will hit OOM. Of course
it gets worse if the OOM killer then decides to kill the receiver.

This patch imposes a cap on the packet queue length, in the same
way as the tuntap driver, using the device TX queue length.

Please note that macvtap currently has no way of giving congestion
notification, that means the software device TX queue cannot be
used and packets will always be dropped once the macvtap driver
queue fills up.
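
The capping behaviour described above amounts to a bounded queue that drops on overflow. The sketch below models only the counters, under the assumption that the limit plays the role of the device TX queue length; struct demo_queue, demo_enqueue and demo_dequeue are hypothetical names, not driver functions.

```c
/* Hypothetical model of a receive queue with a length cap. */
struct demo_queue {
	unsigned int len;	/* packets currently queued */
	unsigned int limit;	/* cap, standing in for the TX queue length */
	unsigned long dropped;	/* packets discarded because the queue was full */
};

/* Queue one packet; returns 0 on success or -1 if it had to be dropped. */
static int demo_enqueue(struct demo_queue *q)
{
	if (q->len >= q->limit) {
		q->dropped++;	/* no congestion notification, just drop */
		return -1;
	}
	q->len++;
	return 0;
}

/* The receiver consumed one packet, freeing a slot. */
static void demo_dequeue(struct demo_queue *q)
{
	if (q->len > 0)
		q->len--;
}
```

A limit of zero would make every enqueue fail in this model, so a cap of this kind only works with a non-zero queue length.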

This shouldn't be a great problem for the scenario where macvtap
is used to feed a kvm receiver, as the traffic is most likely
external in origin so congestion notification can't be applied
anyway.

Of course, if anybody decides to complain about guest-to-guest
UDP packet loss down the track, then we may have to revisit this.

Incidentally, this patch also fixes a real memory leak when
macvtap_get_queue fails.

Chris Wright noticed that for this patch to work, we need a
non-zero TX queue length. This patch includes his work to change
the default macvtap TX queue length to 500.

Reported-by: Mark Wagner
Signed-off-by: Herbert Xu
Acked-by: Chris Wright
Acked-by: Arnd Bergmann
Signed-off-by: David S. Miller

Herbert Xu
2010-07-23 04:08:56 +0800

11 Jul, 2010

1 commit

1ebed71ae macvtap: Use dev_t for macvtap_major.

Reported-by: "Robert P. J. Day"
Signed-off-by: David S. Miller

David S. Miller
2010-07-11 10:25:50 +0800

04 May, 2010

1 commit

55afbd081 macvtap: add ioctl to modify vnet header size

This adds TUNSETVNETHDRSZ/TUNGETVNETHDRSZ support
to macvtap.

Signed-off-by: Michael S. Tsirkin
Acked-by: Arnd Bergmann
Acked-by: David S. Miller

Michael S. Tsirkin
2010-05-04 06:35:47 +0800

02 May, 2010

1 commit

438154823 net: sock_def_readable() and friends RCU conversion

sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer to
"struct socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
- Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
- Use wq_has_sleeper() to eventually wakeup tasks.
- Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They don't need RCU freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-05-02 06:00:15 +0800

27 Apr, 2010

1 commit

4a4771a58 net: use sk_sleep()

Commit aa395145 (net: sk_sleep() helper) missed three files in the
conversion.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-27 02:18:45 +0800

21 Apr, 2010

1 commit

aa3951451 net: sk_sleep() helper

Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep won't be a field directly
available.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-04-21 07:37:13 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

19 Feb, 2010

3 commits

b9fb9ee07 macvtap: add GSO/csum offload support

Added flags field to macvtap_queue to enable/disable processing of
virtio_net_hdr via IFF_VNET_HDR. This flag is checked to prepend virtio_net_hdr
in the receive path and process/skip virtio_net_hdr in the send path.

Original patch by Sridhar, further changes by Arnd.

Signed-off-by: Sridhar Samudrala
Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2010-02-19 06:08:38 +0800

501c774cb net/macvtap: add vhost support

This adds support for passing a macvtap file descriptor into
vhost-net, much like we already do for tun/tap.

Most of the new code is taken from the respective patch
in the tun driver and may get consolidated in the future.

Signed-off-by: Arnd Bergmann
Acked-by: Sridhar Samudrala
Signed-off-by: David S. Miller

Arnd Bergmann
2010-02-19 06:08:38 +0800

02df55d28 macvtap: rework object lifetime rules

This reworks the change done by the previous patch
in a more complete way.

The original macvtap code has a number of problems
resulting from the use of RCU for protecting the
access to struct macvtap_queue from open files.

This includes
- need for GFP_ATOMIC allocations for skbs
- potential deadlocks when copy_*_user sleeps
- inability to work with vhost-net

Changing the lifetime of macvtap_queue to always
depend on the open file solves all these. The
RCU reference simply moves one step down to
the reference on the macvlan_dev, which we
only need for nonblocking operations.

Signed-off-by: Arnd Bergmann
Acked-by: Sridhar Samudrala
Signed-off-by: David S. Miller

Arnd Bergmann
2010-02-19 06:08:37 +0800

16 Feb, 2010

1 commit

564517e80 net/macvtap: fix reference counting

The RCU usage in the original code was broken because
there are cases where we possibly sleep with rcu_read_lock
held. As a fix, change the macvtap_file_get_queue to
get a reference on the socket and the netdev instead of
taking the full rcu_read_lock.

Also, change macvtap_file_get_queue failure case to
not require a subsequent macvtap_file_put_queue, as
pointed out by Ed Swierk.

Signed-off-by: Arnd Bergmann
Cc: Ed Swierk
Cc: Sridhar Samudrala
Acked-by: Sridhar Samudrala
Acked-by: Ed Swierk
Signed-off-by: David S. Miller

Arnd Bergmann
2010-02-16 13:49:49 +0800

04 Feb, 2010

1 commit

20d29d7a9 net: macvtap driver

In order to use macvlan with qemu and other tools that require
a tap file descriptor, the macvtap driver adds a small backend
with a character device with the same interface as the tun
driver, with a minimum set of features.

Macvtap interfaces are created in the same way as macvlan
interfaces using ip link, but the netif is just used as a
handle for configuration and accounting, while the data
goes through the chardev. Each macvtap interface has its
own character device, simplifying permission management
significantly over the generic tun/tap driver.

Cc: Patrick McHardy
Cc: Stephen Hemminger
Cc: "David S. Miller"
Cc: "Michael S. Tsirkin"
Cc: Herbert Xu
Cc: Or Gerlitz
Cc: netdev@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2010-02-04 12:20:33 +0800