Eric Lee / smarc-fsl-linux-kernel

26 Jun, 2015

1 commit

e2f15f9a7 netconsole: implement extended console support ... Browse Code »

printk logbuf keeps various metadata and optional key=value dictionary for
structured messages, both of which are stripped when messages are handed
to regular console drivers.

It can be useful to have this metadata and dictionary available to
netconsole consumers. This obviously makes logging via netconsole more
complete and the sequence number in particular is useful in environments
where messages may be lost or reordered in transit - e.g. when netconsole
is used to collect messages in a large cluster where packets may have to
travel congested hops to reach the aggregator. The lost and reordered
messages can easily be identified and handled accordingly using the
sequence numbers.

printk recently added extended console support which can be selected by
setting CON_EXTENDED flag. From console driver side, not much changes.
The only difference is that the text passed to the write callback is
formatted the same way as /dev/kmsg.

This patch implements extended console support for netconsole which can be
enabled by either prepending "+" to a netconsole boot param entry or
echoing 1 to "extended" file in configfs. When enabled, netconsole
transmits extended log messages with headers identical to /dev/kmsg
output.

There's one complication due to message fragments. netconsole limits the
maximum message size to 1k and messages longer than that are split into
multiple fragments. As all extended console messages should carry
matching headers and be uniquely identifiable, each extended message
fragment carries full copy of the metadata and an extra header field to
identify the specific fragment. The optional header is of the form
"ncfrag=OFF/LEN" where OFF is the byte offset into the message body and
LEN is the total length.

To avoid unnecessarily making printk format extended messages, Extended
netconsole is registered with printk when the first extended netconsole is
configured.

Signed-off-by: Tejun Heo
Cc: David Miller
Cc: Kay Sievers
Cc: Petr Mladek
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2015-06-26 08:00:39 +0800

25 Jun, 2015

1 commit

1e467e68e Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6 ... Browse Code »

Pull documentation updates from Jonathan Corbet:
"The main thing here is Ingo's big subdirectory documenting feature
support for each architecture. Beyond that, it's the usual pile of
fixes, tweaks, and small additions"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
doc:md: fix typo in md.txt.
Documentation/mic/mpssd: don't build x86 userspace when cross compiling
Documentation/prctl: don't build tsc tests when cross compiling
Documentation/vDSO: don't build tests when cross compiling
Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
Doc: Change wikipedia's URL from http to https
Documentation/kernel-parameters: add missing pciserial to the earlyprintk
Doc:pps: Fix typo in pps.txt
kbuild : Fix documentation of INSTALL_HDR_PATH
Documentation: filesystems: updated struct file_operations documentation in vfs.txt
kbuild: edit explanation of clean-files variable
Doc: ja_JP: Fix typo in HOWTO
Move freefall program from Documentation/ to tools/
Documentation: ARM: EXYNOS: Describe boot loaders interface
Doc:nfc: Fix typo in nfc-hci.txt
vfs: Minor documentation fix
Doc: networking: txtimestamp: fix printf format warning
Documentation, intel_pstate: Improve legacy mode internal governors description
Documentation: extend use case for EXPORT_SYMBOL_GPL()
...

Linus Torvalds
2015-06-25 11:01:36 +0800

23 Jun, 2015

1 commit

ae13c65bc Doc: Change wikipedia's URL from http to https ... Browse Code »

Recently wikipedia announced to secure access to the servers.
Now all http access re-route to https.

Signed-off-by: Masanari Iida
Signed-off-by: Jonathan Corbet

Masanari Iida
2015-06-23 00:14:05 +0800

16 Jun, 2015

1 commit

b4ad7baa0 bridge: del external_learned fdbs from device on flush or ageout ... Browse Code »

We need to delete from offload the device externally learnded fdbs when any
one of these events happen:

1) Bridge ages out fdb. (When bridge is doing ageing vs. device doing
ageing. If device is doing ageing, it would send SWITCHDEV_FDB_DEL
directly).

2) STP state change flushes fdbs on port.

3) User uses sysfs interface to flush fdbs from bridge or bridge port:

echo 1 >/sys/class/net/BR_DEV/bridge/flush
echo 1 >/sys/class/net/BR_PORT/brport/flush

4) Offload driver send event SWITCHDEV_FDB_DEL to delete fdb entry.

For rocker, we can now get called to delete fdb entry in wait and nowait
contexts, so set NOWAIT flag when deleting fdb entry.

Signed-off-by: Scott Feldman
Signed-off-by: David S. Miller

Scott Feldman
2015-06-16 08:08:49 +0800

14 Jun, 2015

1 commit

25c43bf13 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
2015-06-14 14:56:52 +0800

13 Jun, 2015

1 commit

b07d49617 Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt ... Browse Code »

This patch fix URL (http to https) for wiki.wireshark.org.

Signed-off-by: Masanari Iida
Signed-off-by: David S. Miller

Masanari Iida
2015-06-13 05:21:29 +0800

05 Jun, 2015

1 commit

03e8f01a6 Doc: networking: txtimestamp: fix printf format warning ... Browse Code »

Documentation/networking/timestamping/txtimestamp.c: In function ‘__print_timestamp’:
Documentation/networking/timestamping/txtimestamp.c:99:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64_t’ [-Wformat=]
fprintf(stderr, " (%+ld us)", cur_ms - prev_ms);

int64_t differs per platform, so a type specifier that differs along
with it is required.

Signed-off-by: Frans Klaver
Signed-off-by: Jonathan Corbet

Frans Klaver
2015-06-05 06:59:10 +0800

04 Jun, 2015

4 commits

7616dcbb2 switchdev: documentation: use switchdev_port_obj_xxx for IPv4 FIB add/modify/delete ops ... Browse Code »

Clarify in documentation and code that IPV4 FIB add operation is used for
both adding a new FIB entry to the device and for modifying an existing FIB
entry on the device.

Also, remove left-over references to ipv4_fib ops and replace with details
on SWITCHDEV_PORT_IPV4_FIB object.

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-06-04 14:47:23 +0800
4b5364fbd switchdev: documentation: for static FDB ops, use switchdev_port_fdb_xxx ops ... Browse Code »

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-06-04 14:47:23 +0800
d290f1fc7 switchdev: documentation: fix grammer error ... Browse Code »

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-06-04 14:47:23 +0800
f5ed2febd switchdev: documentation: fix longer-than-80-char lines ... Browse Code »

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-06-04 14:47:23 +0800

31 May, 2015

1 commit

9d52bf0a2 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/blu… ... Browse Code »

…etooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-05-28

Here's a set of patches intended for 4.2. The majority of the changes
are on the 802.15.4 side of things rather than Bluetooth related:

- All sorts of cleanups & fixes to ieee802154 and related drivers
- Rework of tx power support in ieee802154 and its drivers
- Support for setting ieee802154 tx power through nl802154
- New IDs for the btusb driver
- Various cleanups & smaller fixes to btusb
- New btrtl driver for Realtec devices
- Fix suspend/resume for Realtek devices

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller
2015-05-31 14:26:45 +0800

28 May, 2015

1 commit

07f4c9006 tcp/dccp: try to not exhaust ip_local_port_range in connect() ... Browse Code »

A long standing problem on busy servers is the tiny available TCP port
range (/proc/sys/net/ipv4/ip_local_port_range) and the default
sequential allocation of source ports in connect() system call.

If a host is having a lot of active TCP sessions, chances are
very high that all ports are in use by at least one flow,
and subsequent bind(0) attempts fail, or have to scan a big portion of
space to find a slot.

In this patch, I changed the starting point in __inet_hash_connect()
so that we try to favor even [1] ports, leaving odd ports for bind()
users.

We still perform a sequential search, so there is no guarantee, but
if connect() targets are very different, end result is we leave
more ports available to bind(), and we spread them all over the range,
lowering time for both connect() and bind() to find a slot.

This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range
is even, ie if start/end values have different parity.

Therefore, default /proc/sys/net/ipv4/ip_local_port_range was changed to
32768 - 60999 (instead of 32768 - 61000)

There is no change on security aspects here, only some poor hashing
schemes could be eventually impacted by this change.

[1] : The odd/even property depends on ip_local_port_range values parity

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2015-05-28 01:30:44 +0800

27 May, 2015

1 commit

b251d4de6 Documentation/networking/ieee802154.txt: fix various inaccuracies. ... Browse Code »

* Update the linux-zigbee git:// repository URL.

* Remove the MLME section as the current kernel does not provide a
full 802.15.4 MLME implementation.

* The hardmac example driver 'fakehard' was removed some time ago.

* The IEEE 802.15.4 device drivers live in drivers/net/ieee802154/,
not in drivers/ieee802154/.

* The IEEE 802.15.4 MTU is 127 bytes, not 128 bytes.

* Some of the 6LoWPAN code lives in net/6lowpan/.

Signed-off-by: Lennert Buytenhek
Reviewed-by: Stefan Schmidt
Acked-by: Alexander Aring
Signed-off-by: Marcel Holtmann

Lennert Buytenhek
2015-05-27 02:25:58 +0800

23 May, 2015

5 commits

282fb5894 pktgen: add sample script pktgen_sample02_multiqueue.sh ... Browse Code »

Add the pktgen samples script pktgen_sample02_multiqueue.sh that
demonstrates generating packets on multiqueue NICs.

Specifically notice the options "-t" that specifies how many
kernel threads to activate. Also notice the flag QUEUE_MAP_CPU,
which cause the SKB TX queue to be mapped to the CPU running the
kernel thread. For best scalability people are also encourage to
map NIC IRQ /proc/irq/*/smp_affinity to CPU number.

Usage example with "-t" 4 threads and help:
./pktgen_sample02_multiqueue.sh -i eth4 -m 00:1B:21:3C:9D:F8 -t 4

Usage: ./pktgen_sample02_multiqueue.sh [-vx] -i ethX
-i : ($DEV) output interface/device (required)
-s : ($PKT_SIZE) packet size
-d : ($DEST_IP) destination IP
-m : ($DST_MAC) destination MAC-addr
-t : ($THREADS) threads to start
-c : ($SKB_CLONE) SKB clones send before alloc new SKB
-b : ($BURST) HW level bursting of SKBs
-v : ($VERBOSE) verbose
-x : ($DEBUG) debug

Removing pktgen.conf-2-1 and pktgen.conf-2-2 as these examples
should be covered now.

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2015-05-23 11:59:17 +0800
6f0947975 pktgen: add sample script pktgen_sample01_simple.sh ... Browse Code »

Add the first basic pktgen samples script pktgen_sample01_simple.sh,
which demonstrates the a simple use of the helper functions.
Removing pktgen.conf-1-1 as that example should be covered now.

The naming scheme pktgen_sampleNN, where NN is a number, should encourage
reading the samples in a specific order.

Script cause pktgen sending with a single thread and single interface,
and introduce flow variation via random UDP source port.

Usage example and help:
./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2

Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
-i : ($DEV) output interface/device (required)
-s : ($PKT_SIZE) packet size
-d : ($DEST_IP) destination IP
-m : ($DST_MAC) destination MAC-addr
-c : ($SKB_CLONE) SKB clones send before alloc new SKB
-v : ($VERBOSE) verbose
-x : ($DEBUG) debug

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2015-05-23 11:59:17 +0800
2a1ddf27e pktgen: document ability to add same device to several threads ... Browse Code »

The pktgen.txt documentation still claimed that adding same device to
multiple threads were not supported, but it have been since 2008 via
commit e6fce5b916cd7 ("pktgen: multiqueue etc.").

Document this and describe the naming scheme dev@X, as the procfile name
still need to be unique.

Fixes: e6fce5b916cd7 ("pktgen: multiqueue etc.")
Signed-off-by: Jesper Dangaard Brouer
Acked-by: Alexei Starovoitov
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2015-05-23 11:59:16 +0800
91db4b3c8 pktgen: doc were missing several config options ... Browse Code »

The pktgen.txt documentation over available config options were not complete.
Making the list complete by adding the following.

Pgcontrol commands:
reset

Device commands:
burst
queue_map_min
queue_map_max
skb_priority
tos
traffic_class
node
spi
dst6_max
dst6_min
vlan_cfi
vlan_id
vlan_p
svlan_cfi
svlan_id
svlan_p

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2015-05-23 11:59:16 +0800
d012827e8 pktgen: remove obsolete "max_before_softirq" from pktgen doc ... Browse Code »

And cleanup some whitespaces in pktgen.txt.

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2015-05-23 11:59:16 +0800

20 May, 2015

1 commit

492135557 tcp: add rfc3168, section 6.1.1.1. fallback ... Browse Code »

This work as a follow-up of commit f7b3bec6f516 ("net: allow setting ecn
via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
ECN connections. In other words, this work adds a retry with a non-ECN
setup SYN packet, as suggested from the RFC on the first timeout:

[...] A host that receives no reply to an ECN-setup SYN within the
normal SYN retransmission timeout interval MAY resend the SYN and
any subsequent SYN retransmissions with CWR and ECE cleared. [...]

Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
that is, Linux default since 2009 via commit 255cac91c3c9 ("tcp: extend
ECN sysctl to allow server-side only ECN"):

1) Normal ECN-capable path:

SYN ECE CWR ----->

2) Path with broken middlebox, when client has fallback:

SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ----->

In case we would not have the fallback implemented, the middlebox drop
point would basically end up as:

SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)

In any case, it's rather a smaller percentage of sites where there would
occur such additional setup latency: it was found in end of 2014 that ~56%
of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate
ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect
when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the
fallback would mitigate with a slight latency trade-off. Recent related
paper on this topic:

Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
Gorry Fairhurst, and Richard Scheffenegger:
"Enabling Internet-Wide Deployment of Explicit Congestion Notification."
Proc. PAM 2015, New York.
http://ecn.ethz.ch/ecn-pam15.pdf

Thus, when net.ipv4.tcp_ecn=1 is being set, the patch will perform RFC3168,
section 6.1.1.1. fallback on timeout. For users explicitly not wanting this
which can be in DC use case, we add a net.ipv4.tcp_ecn_fallback knob that
allows for disabling the fallback.

tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
rather we let tcp_ecn_rcv_synack() take that over on input path in case a
SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
ECN being negotiated eventually in that case.

Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann
Signed-off-by: Florian Westphal
Signed-off-by: Mirja Kühlewind
Signed-off-by: Brian Trammell
Cc: Eric Dumazet
Cc: Dave That
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Daniel Borkmann
2015-05-20 04:53:37 +0800

14 May, 2015

2 commits

e578d9c02 net: sched: use counter to break reclassify loops ... Browse Code »

Seems all we want here is to avoid endless 'goto reclassify' loop.
tc_classify_compat even resets this counter when something other
than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't
break hypothetical loops induced by something other than perpetual
TC_ACT_RECLASSIFY return values.

skb_act_clone is now identical to skb_clone, so just use that.

Tested with following (bogus) filter:
tc filter add dev eth0 parent ffff: \
protocol ip u32 match u32 0 0 police rate 10Kbit burst \
64000 mtu 1500 action reclassify

Acked-by: Daniel Borkmann
Signed-off-by: Florian Westphal
Acked-by: Alexei Starovoitov
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Florian Westphal
2015-05-14 03:08:14 +0800
1f5dc44c8 switchdev: apply review comments on documentation ... Browse Code »

There were a few review comments on the switchdev.txt documentation that
didn't get included with the Spring Cleanup series, so include them now.

Signed-off-by: Scott Feldman
Signed-off-by: David S. Miller

Scott Feldman
2015-05-14 00:26:28 +0800

13 May, 2015

1 commit

4ceec22d6 switchdev: bring documentation up-to-date ... Browse Code »

Much need updated of switchdev documentation to cover what's been
implmented to-date. There are some XXX comments in the text for
unimplemented or broken items. I'd like to keep these in there (poor-man's
TODO list) and update the document once each issue is resolved.

Signed-off-by: Scott Feldman
Acked-by: Jiri Pirko
Signed-off-by: David S. Miller

Scott Feldman
2015-05-13 06:43:56 +0800

11 May, 2015

3 commits

d22a5fc0c bonding: Implement user key part of port_key in an AD system. ... Browse Code »

The port key has three components - user-key, speed-part, and duplex-part.
The LSBit is for the duplex-part, next 5 bits are for the speed while the
remaining 10 bits are the user defined key bits. Get these 10 bits
from the user-space (through the SysFs interface) and use it to form the
admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If
it is not provided then use zero for the user-key-bits (default).

It can set using following example code -

# modprobe bonding mode=4
# usr_port_key=$(( RANDOM & 0x3FF ))
# echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key
# echo +eth1 > /sys/class/net/bond0/bonding/slaves
...
# ip link set bond0 up

Signed-off-by: Mahesh Bandewar
Reviewed-by: Nikolay Aleksandrov
[jt: * fixed up style issues reported by checkpatch
* fixed up context from change in ad_actor_sys_prio patch]
Signed-off-by: Jonathan Toppins
Signed-off-by: David S. Miller

Mahesh Bandewar
2015-05-11 22:59:32 +0800
745149575 bonding: Allow userspace to set actors' macaddr in an AD-system. ... Browse Code »

In an AD system, the communication between actor and partner is the
business between these two entities. In the current setup anyone on the
same L2 can "guess" the LACPDU contents and then possibly send the
spoofed LACPDUs and trick the partner causing connectivity issues for
the AD system. This patch allows to use a random mac-address obscuring
it's identity making it harder for someone in the L2 is do the same thing.

This patch allows user-space to choose the mac-address for the AD-system.
This mac-address can not be NULL or a Multicast. If the mac-address is set
from user-space; kernel will honor it and will not overwrite it. In the
absence (value from user space); the logic will default to using the
masters' mac as the mac-address for the AD-system.

It can be set using example code below -

# modprobe bonding mode=4
# sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
$(( (RANDOM & 0xFE) | 0x02 )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )))
# echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system
# echo +eth1 > /sys/class/net/bond0/bonding/slaves
...
# ip link set bond0 up

Signed-off-by: Mahesh Bandewar
Reviewed-by: Nikolay Aleksandrov
[jt: fixed up style issues reported by checkpatch]
Signed-off-by: Jonathan Toppins
Signed-off-by: David S. Miller

Mahesh Bandewar
2015-05-11 22:59:32 +0800
6791e4661 bonding: Allow userspace to set actors' system_priority in AD system ... Browse Code »

This patch allows user to randomize the system-priority in an ad-system.
The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user
does not specify this value, the system defaults to 0xFFFF, which is
what it was before this patch.

Following example code could set the value -
# modprobe bonding mode=4
# sys_prio=$(( 1 + RANDOM + RANDOM ))
# echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio
# echo +eth1 > /sys/class/net/bond0/bonding/slaves
...
# ip link set bond0 up

Signed-off-by: Mahesh Bandewar
Reviewed-by: Nikolay Aleksandrov
[jt: * fixed up style issues reported by checkpatch
* changed how the default value is set in bond_check_params(), this
makes the default consistent between what gets set for a new bond
and what the default is claimed to be in the bonding options.]
Signed-off-by: Jonathan Toppins
Signed-off-by: David S. Miller

Mahesh Bandewar
2015-05-11 22:59:31 +0800

10 May, 2015

2 commits

62f64aed6 pktgen: introduce xmit_mode '<start_xmit|netif_receive>' ... Browse Code »

Introduce xmit_mode 'netif_receive' for pktgen which generates the
packets using familiar pktgen commands, but feeds them into
netif_receive_skb() instead of ndo_start_xmit().

Default mode is called 'start_xmit'.

It is designed to test netif_receive_skb and ingress qdisc
performace only. Make sure to understand how it works before
using it for other rx benchmarking.

Sample script 'pktgen.sh':
\#!/bin/bash
function pgset() {
local result

echo $1 > $PGDEV

result=`cat $PGDEV | fgrep "Result: OK:"`
if [ "$result" = "" ]; then
cat $PGDEV | fgrep Result:
fi
}

[ -z "$1" ] && echo "Usage: $0 DEV" && exit 1
ETH=$1

PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device $ETH"

PGDEV=/proc/net/pktgen/$ETH
pgset "xmit_mode netif_receive"
pgset "pkt_size 60"
pgset "dst 198.18.0.1"
pgset "dst_mac 90:e2:ba:ff:ff:ff"
pgset "count 10000000"
pgset "burst 32"

PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
cat /proc/net/pktgen/$ETH

Usage:
$ sudo ./pktgen.sh eth2
...
Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
43033682pps 20656Mb/sec (20656167360bps) errors: 10000000

Raw netif_receive_skb speed should be ~43 million packet
per second on 3.7Ghz x86 and 'perf report' should look like:
37.69% kpktgend_0 [kernel.vmlinux] [k] __netif_receive_skb_core
25.81% kpktgend_0 [kernel.vmlinux] [k] kfree_skb
7.22% kpktgend_0 [kernel.vmlinux] [k] ip_rcv
5.68% kpktgend_0 [pktgen] [k] pktgen_thread_worker

If fib_table_lookup is seen on top, it means skb was processed
by the stack. To benchmark netif_receive_skb only make sure
that 'dst_mac' of your pktgen script is different from
receiving device mac and it will be dropped by ip_rcv

Signed-off-by: Alexei Starovoitov
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Alexei Starovoitov
2015-05-10 10:26:06 +0800
f1f00d8ff pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant ... Browse Code »

Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
with a negation of the flag like !NO_TIMESTAMP.

Also document the option flag NO_TIMESTAMP.

Fixes: afb84b626184 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller

Jesper Dangaard Brouer
2015-05-10 10:26:06 +0800

06 May, 2015

1 commit

a2f118359 can.h: make padding given by gcc explicit ... Browse Code »

The current definition of struct can_frame has a 16-byte size, with 8-byte
alignment, but the 3 bytes of padding are not explicit like the similar 2 bytes
of padding of struct canfd_frame. Make it explicit so it is easier to read.

Signed-off-by: Shawn Landden
Acked-by: Oliver Hartkopp
Signed-off-by: Marc Kleine-Budde

Shawn Landden
2015-05-06 14:03:19 +0800

04 May, 2015

1 commit

82a584b7c ipv6: Flow label state ranges ... Browse Code »

This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller

Tom Herbert
2015-05-04 09:58:01 +0800

03 May, 2015

1 commit

4749c3ef8 net: sched: remove TC_MUNGED bits ... Browse Code »

Not used.

pedit sets TC_MUNGED when packet content was altered, but all the core
does is unset MUNGED again and then set OK2MUNGE.

And the latter isn't tested anywhere. So lets remove both
TC_MUNGED and TC_OK2MUNGE.

Signed-off-by: Florian Westphal
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
Acked-by: Jamal Hadi Salim
Signed-off-by: David S. Miller

Florian Westphal
2015-05-03 10:25:17 +0800

27 Apr, 2015

1 commit

a31196b07 net: rfs: fix crash in get_rps_cpus() ... Browse Code »

Commit 567e4b79731c ("net: rfs: add hash collision detection") had one
mistake :

RPS_NO_CPU is no longer the marker for invalid cpu in set_rps_cpu()
and get_rps_cpu(), as @next_cpu was the result of an AND with
rps_cpu_mask

This bug showed up on a host with 72 cpus :
next_cpu was 0x7f, and the code was trying to access percpu data of an
non existent cpu.

In a follow up patch, we might get rid of compares against nr_cpu_ids,
if we init the tables with 0. This is silly to test for a very unlikely
condition that exists only shortly after table initialization, as
we got rid of rps_reset_sock_flow() and similar functions that were
writing this RPS_NO_CPU magic value at flow dismantle : When table is
old enough, it never contains this value anymore.

Fixes: 567e4b79731c ("net: rfs: add hash collision detection")
Signed-off-by: Eric Dumazet
Cc: Tom Herbert
Cc: Ben Hutchings
Signed-off-by: David S. Miller

Eric Dumazet
2015-04-27 04:07:57 +0800

23 Apr, 2015

1 commit

37bde7997 mpls: Per-device enabling of packet input ... Browse Code »

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not allow forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces unless explicitly enabled and no global
option is provided to change the default. Whilst this differs from
other protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and with the number of links
to the MPLS core typically fairly low this doesn't present too much of
a burden on operators.

Cc: "Eric W. Biederman"
Signed-off-by: Robert Shearman
Reviewed-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Robert Shearman
2015-04-23 02:24:54 +0800

15 Apr, 2015

1 commit

87ffabb1f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

The dwmac-socfpga.c conflict was a case of a bug fix overlapping
changes in net-next to handle an error pointer differently.

Signed-off-by: David S. Miller

David S. Miller
2015-04-15 03:44:14 +0800

10 Apr, 2015

3 commits

322d3ed16 ixgb: remove references to ifconfig ... Browse Code »

Move documentation into this century, even if this device hasn't
been available for some time.

Signed-off-by: Stephen Hemminger
Signed-off-by: Jeff Kirsher

Stephen Hemminger
2015-04-10 13:04:04 +0800
d7018be0a ixgbe: fix documentation ... Browse Code »

The MTU values in the documentation do not match the source.
The source has frame limit of IXGBE_MAX_JUMBO_FRAME_SIZE (9728)
which is MTU of 9710 because of the accounting for Ethernet header
and CRC.

Also, don't refer to the obsolete ifconfig command.

Signed-off-by: Stephen Hemminger
Signed-off-by: Jeff Kirsher

Stephen Hemminger
2015-04-10 13:03:54 +0800
c0a34ebd4 igb: doc don't refer to ifconfig ... Browse Code »

ifconfig command is obsolete, best to remove all references so that
new users learn ip.

Signed-off-by: Stephen Hemminger
Signed-off-by: Jeff Kirsher

Stephen Hemminger
2015-04-10 13:03:20 +0800

09 Apr, 2015

1 commit

ebe96e641 RDS: Documentation: Document AF_RDS, PF_RDS and SOL_RDS correctly. ... Browse Code »

AF_RDS, PF_RDS and SOL_RDS are available in header files,
and there is no need to get their values from /proc. Document
this correctly.

Fixes: 0c5f9b8830aa ("RDS: Documentation")

Signed-off-by: Sowmini Varadhan
Signed-off-by: David S. Miller

Sowmini Varadhan
2015-04-09 03:17:04 +0800

01 Apr, 2015

1 commit

a5581ef4c can: introduce new raw socket option to join the given CAN filters ... Browse Code »

The CAN_RAW socket can set multiple CAN identifier specific filters that lead
to multiple filters in the af_can.c filter processing. These filters are
indenpendent from each other which leads to logical OR'ed filters when applied.

This socket option joines the given CAN filters in the way that only CAN frames
are passed to user space that matched *all* given CAN filters. The semantic for
the applied filters is therefore changed to a logical AND.

This is useful especially when the filterset is a combination of filters where
the CAN_INV_FILTER flag is set in order to notch single CAN IDs or CAN ID
ranges from the incoming traffic.

As the raw_rcv() function is executed from NET_RX softirq the introduced
variables are implemented as per-CPU variables to avoid extensive locking at
CAN frame reception time.

Signed-off-by: Oliver Hartkopp
Signed-off-by: Marc Kleine-Budde

Oliver Hartkopp
2015-04-01 17:28:22 +0800

25 Mar, 2015

1 commit

27cd54524 filter: introduce SKF_AD_VLAN_TPID BPF extension ... Browse Code »

If vlan offloading takes place then vlan header is removed from frame
and its contents, both vlan_tci and vlan_proto, is available to user
space via TPACKET interface. However, only vlan_tci can be used in BPF
filters.

This commit introduces a new BPF extension. It makes possible to load
the value of vlan_proto (vlan TPID) to register A. Support for classic
BPF and eBPF is being added, analogous to skb->protocol.

Cc: Daniel Borkmann
Cc: Alexei Starovoitov
Cc: Jiri Pirko

Signed-off-by: Michal Sekletar
Acked-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller

Michal Sekletar
2015-03-25 03:25:15 +0800