Eric Lee / smarc-fsl-linux-kernel

21 Aug, 2013

1 commit

89d5e2321 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Conflicts:
net/netfilter/nf_conntrack_proto_tcp.c

The conflict had to do with overlapping changes dealing with
fixing the use of an "s32" to hold the value returned by
NAT_OFFSET().

Pablo Neira Ayuso says:

====================
The following batch contains Netfilter/IPVS updates for your net-next tree.
More specifically, they are:

* Trivial typo fix in xt_addrtype, from Phil Oester.

* Remove net_ratelimit in the conntrack logging for consistency with other
logging subsystem, from Patrick McHardy.

* Remove unneeded includes from the recently added xt_connlabel support, from
Florian Westphal.

* Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for
this, from Florian Westphal.

* Remove tproxy core, now that we have socket early demux, from Florian
Westphal.

* A couple of patches to refactor conntrack event reporting to save a good
bunch of lines, from Florian Westphal.

* Fix missing locking in NAT sequence adjustment, it did not manifested in
any known bug so far, from Patrick McHardy.

* Change sequence number adjustment variable to 32 bits, to delay the
possible early overflow in long standing connections, also from Patrick.

* Comestic cleanups for IPVS, from Dragos Foianu.

* Fix possible null dereference in IPVS in the SH scheduler, from Daniel
Borkmann.

* Allow to attach conntrack expectations via nfqueue. Before this patch, you
had to use ctnetlink instead, thus, we save the conntrack lookup.

* Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel.
====================

Signed-off-by: David S. Miller

David S. Miller
2013-08-21 04:30:54 +0800

14 Aug, 2013

1 commit

fc4eba58b ipv6: make unsolicited report intervals configurable for mld ... Browse Code »

Commit cab70040dfd95ee32144f02fade64f0cb94f31a0 ("net: igmp:
Reduce Unsolicited report interval to 1s when using IGMPv3") and
2690048c01f32bf45d1c1e1ab3079bc10ad2aea7 ("net: igmp: Allow user-space
configuration of igmp unsolicited report interval") by William Manley made
igmp unsolicited report intervals configurable per interface and corrected
the interval of unsolicited igmpv3 report messages resendings to 1s.

Same needs to be done for IPv6:

MLDv1 (RFC2710 7.10.): 10 seconds
MLDv2 (RFC3810 9.11.): 1 second

Both intervals are configurable via new procfs knobs
mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.

(also added .force_mld_version to ipv6_devconf_dflt to bring structs in
line without semantic changes)

v2:
a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
unsolicited_report_interval procfs knobs.
b) incorporate stylistic feedback from William Manley

v3:
a) add new DEVCONF_* values to the end of the enum (thanks to David
Miller)

Cc: Cong Wang
Cc: William Manley
Cc: Benjamin LaHaise
Cc: YOSHIFUJI Hideaki
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-08-14 08:05:04 +0800

10 Aug, 2013

2 commits

71acc0ddd Revert "net: sctp: convert sctp_checksum_disable module param into sctp sysctl" ... Browse Code »

This reverts commit cda5f98e36576596b9230483ec52bff3cc97eb21.

As per Vlad's request.

Signed-off-by: David S. Miller

David S. Miller
2013-08-10 04:09:41 +0800
cda5f98e3 net: sctp: convert sctp_checksum_disable module param into sctp sysctl ... Browse Code »

Get rid of the last module parameter for SCTP and make this
configurable via sysctl for SCTP like all the rest of SCTP's
configuration knobs.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-08-10 02:33:02 +0800

01 Aug, 2013

1 commit

49dfe7626 Documentation: add networking/netdev-FAQ.txt ... Browse Code »

A collection of expectations and operational details about how
networking development takes place in the context of the netdev
mailing list.

The content is meant to capture specific items that are unique
to netdev workflow, and not re-document generic linux expectations
that are already captured elsewhere.

This was originally proposed[1] as a regular posting mailing list
FAQ, but it probably is more universally accessible here in tree.

[1] https://lwn.net/Articles/559211/

Signed-off-by: Paul Gortmaker
Signed-off-by: David S. Miller

Paul Gortmaker
2013-08-01 08:36:29 +0800

31 Jul, 2013

2 commits

fd158d79d netfilter: tproxy: remove nf_tproxy_core, keep tw sk assigned to skb ... Browse Code »

The module was "permanent", due to the special tproxy skb->destructor.
Nowadays we have tcp early demux and its sock_edemux destructor in
networking core which can be used instead.

Thanks to early demux changes the input path now also handles
"skb->sk is tw socket" correctly, so this no longer needs the special
handling introduced with commit d503b30bd648b3cb4e5f50b65d27e389960cc6d9
(netfilter: tproxy: do not assign timewait sockets to skb->sk).

Thus:
- move assign_sock function to where its needed
- don't prevent timewait sockets from being assigned to the skb
- remove nf_tproxy_core.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso

Florian Westphal
2013-07-31 22:39:40 +0800
5ad37d5de tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies ... Browse Code »

| If you want to test which effects syncookies have to your
| network connections you can set this knob to 2 to enable
| unconditionally generation of syncookies.

Original idea and first implementation by Eric Dumazet.

Cc: Florian Westphal
Cc: David Miller
Signed-off-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-07-31 07:15:18 +0800

25 Jul, 2013

2 commits

c9bee3b7f tcp: TCP_NOTSENT_LOWAT socket option ... Browse Code »

Idea of this patch is to add optional limitation of number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.

TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :

Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)

For most workloads, using a value of 128 KB or less is OK to give
applications enough time to react to POLLOUT events in time
(or being awaken in a blocking sendmsg())

This patch adds two ways to set the limit :

1) Per socket option TCP_NOTSENT_LOWAT

2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.

This changes poll()/select()/epoll() to report POLLOUT
only if number of unsent bytes is below tp->nosent_lowat

Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.

Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
Specify the minimum number of bytes in the buffer until
the socket layer will pass the data to the protocol)

Tested:

netperf sessions, and watching /proc/net/protocols "memory" column for TCP

With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y

Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.

A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB

Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

412,514 context-switches

200.034645535 seconds time elapsed

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB

Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

2,675,818 context-switches

200.029651391 seconds time elapsed

Signed-off-by: Eric Dumazet
Cc: Neal Cardwell
Cc: Yuchung Cheng
Acked-By: Yuchung Cheng
Signed-off-by: David S. Miller

Eric Dumazet
2013-07-25 08:54:48 +0800
91705c61b net: sctp: trivial: update mailing list address ... Browse Code »

The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurences.

Signed-off-by: Daniel Borkmann
Acked-by: Neil Horman
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller

Daniel Borkmann
2013-07-25 08:53:38 +0800

10 Jul, 2013

2 commits

496322bc9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"This is a re-do of the net-next pull request for the current merge
window. The only difference from the one I made the other day is that
this has Eliezer's interface renames and the timeout handling changes
made based upon your feedback, as well as a few bug fixes that have
trickeled in.

Highlights:

1) Low latency device polling, eliminating the cost of interrupt
handling and context switches. Allows direct polling of a network
device from socket operations, such as recvmsg() and poll().

Currently ixgbe, mlx4, and bnx2x support this feature.

Full high level description, performance numbers, and design in
commit 0a4db187a999 ("Merge branch 'll_poll'")

From Eliezer Tamir.

2) With the routing cache removed, ip_check_mc_rcu() gets exercised
more than ever before in the case where we have lots of multicast
addresses. Use a hash table instead of a simple linked list, from
Eric Dumazet.

3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

4) Support reporting the TUN device persist flag to userspace, from
Pavel Emelyanov.

5) Allow controlling network device VF link state using netlink, from
Rony Efraim.

6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
Daniel Borkmann and Eric Dumazet.

8) Allow controlling of TCP quickack behavior on a per-route basis,
from Cong Wang.

9) Several bug fixes and improvements to vxlan from Stephen
Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
support receiving on multiple UDP ports.

10) Major cleanups, particular in the area of debugging and cookie
lifetime handline, to the SCTP protocol code. From Daniel
Borkmann.

11) Allow packets to cross network namespaces when traversing tunnel
devices. From Nicolas Dichtel.

12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
manner akin to how we monitor real network traffic via ptype_all.
From Daniel Borkmann.

13) Several bug fixes and improvements for the new alx device driver,
from Johannes Berg.

14) Fix scalability issues in the netem packet scheduler's time queue,
by using an rbtree. From Eric Dumazet.

15) Several bug fixes in TCP loss recovery handling, from Yuchung
Cheng.

16) Add support for GSO segmentation of MPLS packets, from Simon
Horman.

17) Make network notifiers have a real data type for the opaque
pointer that's passed into them. Use this to properly handle
network device flag changes in arp_netdev_event(). From Jiri
Pirko and Timo Teräs.

18) Convert several drivers over to module_pci_driver(), from Peter
Huewe.

19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
O(1) calculation instead. From Eric Dumazet.

20) Support setting of explicit tunnel peer addresses in ipv6, just
like ipv4. From Nicolas Dichtel.

21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

22) Prevent a single high rate flow from overruning an individual cpu
during RX packet processing via selective flow shedding. From
Willem de Bruijn.

23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
Dumazet.

24) Don't just drop GSO packets which are above the TBF scheduler's
burst limit, chop them up so they are in-bounds instead. Also
from Eric Dumazet.

25) VLAN offloads are missed when configured on top of a bridge, fix
from Vlad Yasevich.

26) Support IPV6 in ping sockets. From Lorenzo Colitti.

27) Receive flow steering targets should be updated at poll() time
too, from David Majnemer.

28) Fix several corner case regressions in PMTU/redirect handling due
to the routing cache removal, from Timo Teräs.

29) We have to be mindful of ipv4 mapped ipv6 sockets in
upd_v6_push_pending_frames(). From Hannes Frederic Sowa.

30) Fix L2TP sequence number handling bugs, from James Chapman."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
drivers/net: caif: fix wrong rtnl_is_locked() usage
drivers/net: enic: release rtnl_lock on error-path
vhost-net: fix use-after-free in vhost_net_flush
net: mv643xx_eth: do not use port number as platform device id
net: sctp: confirm route during forward progress
virtio_net: fix race in RX VQ processing
virtio: support unlocked queue poll
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Documentation: Fix references to defunct linux-net@vger.kernel.org
net/fs: change busy poll time accounting
net: rename low latency sockets functions to busy poll
bridge: fix some kernel warning in multicast timer
sfc: Fix memory leak when discarding scattered packets
sit: fix tunnel update via netlink
dt:net:stmmac: Add dt specific phy reset callback support.
dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
dt:net:stmmac: Allocate platform data only if its NULL.
net:stmmac: fix memleak in the open method
ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
net: ipv6: fix wrong ping_v6_sendmsg return value
...

Linus Torvalds
2013-07-10 09:24:39 +0800
b78ba72cd Documentation: Fix references to defunct linux-net@vger.kernel.org ... Browse Code »

linux-net@vger.kernel.org was replaced by netdev@oss.sgi.com was replaced
by netdev@vger.kernel.org.

Signed-off-by: Geert Uytterhoeven
Signed-off-by: David S. Miller

Geert Uytterhoeven
2013-07-10 03:42:19 +0800

05 Jul, 2013

1 commit

80cc38b16 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull trivial tree updates from Jiri Kosina:
"The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
treewide: relase -> release
Documentation/cgroups/memory.txt: fix stat file documentation
sysctl/net.txt: delete reference to obsolete 2.4.x kernel
spinlock_api_smp.h: fix preprocessor comments
treewide: Fix typo in printk
doc: device tree: clarify stuff in usage-model.txt.
open firmware: "/aliasas" -> "/aliases"
md: bcache: Fixed a typo with the word 'arithmetic'
irq/generic-chip: fix a few kernel-doc entries
frv: Convert use of typedef ctl_table to struct ctl_table
sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
doc: clk: Fix incorrect wording
Documentation/arm/IXP4xx fix a typo
Documentation/networking/ieee802154 fix a typo
Documentation/DocBook/media/v4l fix a typo
Documentation/video4linux/si476x.txt fix a typo
Documentation/virtual/kvm/api.txt fix a typo
Documentation/early-userspace/README fix a typo
Documentation/video4linux/soc-camera.txt fix a typo
lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
...

Linus Torvalds
2013-07-05 02:40:58 +0800

04 Jul, 2013

1 commit

0c1072ae0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Conflicts:
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/renesas/sh_eth.c
net/ipv4/gre.c

The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.

The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.

Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.

Signed-off-by: David S. Miller

David S. Miller
2013-07-04 05:55:13 +0800

01 Jul, 2013

1 commit

4e144d3a8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next ... Browse Code »

Pablo Neira Ayuso says:

====================
The following batch contains Netfilter/IPVS updates for net-next,
they are:

* Enforce policy to several nfnetlink subsystem, from Daniel
Borkmann.

* Use xt_socket to match the third packet (to perform simplistic
socket-based stateful filtering), from Eric Dumazet.

* Avoid large timeout for picked up from the middle TCP flows,
from Florian Westphal.

* Exclude IPVS from struct net if IPVS is disabled and removal
of unnecessary included header file, from JunweiZhang.

* Release SCTP connection immediately under load, to mimic current
TCP behaviour, from Julian Anastasov.

* Replace and enhance SCTP state machine, from Julian Anastasov.

* Add tweak to reduce sync traffic in the presence of persistence,
also from Julian Anastasov.

* Add tweak for the IPVS SH scheduler not to reject connections
directed to a server, choose a new one instead, from Alexander
Frolkin.

* Add support for sloppy TCP and SCTP modes, that creates state
information on any packet, not only initial handshake packets,
from Alexander Frolkin.
====================

Signed-off-by: David S. Miller

David S. Miller
2013-07-01 08:35:13 +0800

26 Jun, 2013

4 commits

4d0c875dc ipvs: add sync_persist_mode flag ... Browse Code »

Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.

Signed-off-by: Julian Anastasov
Tested-by: Aleksey Chudov
Signed-off-by: Simon Horman

Julian Anastasov
2013-06-26 17:01:46 +0800
8599b52e1 bonding: add an option to fail when any of arp_ip_target is inaccessible ... Browse Code »

Currently, we fail only when all of the ips in arp_ip_target are gone.
However, in some situations we might need to fail if even one host from
arp_ip_target becomes unavailable.

All situations, obviously, rely on the idea that we need *completely*
functional network, with all interfaces/addresses working correctly.

One real world example might be:
vlans on top on bond (hybrid port). If bond and vlans have ips assigned
and we have their peers monitored via arp_ip_target - in case of switch
misconfiguration (trunk/access port), slave driver malfunction or
tagged/untagged traffic dropped on the way - we will be able to switch
to another slave.

Though any other configuration needs that if we need to have access to all
arp_ip_targets.

This patch adds this possibility by adding a new parameter -
arp_all_targets (both as a module parameter and as a sysfs knob). It can be
set to:

0 or any (the default) - which works exactly as it's working now -
the slave is up if any of the arp_ip_targets are up.

1 or all - the slave is up if all of the arp_ip_targets are up.

This parameter can be changed on the fly (via sysfs), and requires the mode
to be active-backup and arp_validate to be enabled (it obeys the
arp_validate config on which slaves to validate).

Internally it's done through:

1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's
an array of jiffies, meaning that slave->target_last_arp_rx[i] is the
last time we've received arp from bond->params.arp_targets[i] on this
slave.

2) If we successfully validate an arp from bond->params.arp_targets[i] in
bond_validate_arp() - update the slave->target_last_arp_rx[i] with the
current jiffies value.

3) When getting slave's last_rx via slave_last_rx(), we return the oldest
time when we've received an arp from any address in
bond->params.arp_targets[].

If the value of arp_all_targets == 0 - we still work the same way as
before.

Also, update the documentation to reflect the new parameter.

v3->v4:
Kill the forgotten rtnl_unlock(), rephrase the documentation part to be
more clear, don't fail setting arp_all_targets if arp_validate is not set -
it has no effect anyway but can be easier to set up. Also, print a warning
if the last arp_ip_target is removed while the arp_interval is on, but not
the arp_validate.

v2->v3:
Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new
arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(),
use the same initialization value for target_last_arp_rx[] as is used
for the default last_arp_rx, to avoid useless interface flaps.

Also, instead of failing to remove the last arp_ip_target just print a
warning - otherwise it might break existing scripts.

v1->v2:
Correctly handle adding/removing hosts in arp_ip_target - we need to
shift/initialize all slave's target_last_arp_rx. Also, don't fail module
loading on arp_all_targets misconfiguration, just disable it, and some
minor style fixes.

Signed-off-by: Veaceslav Falico
Signed-off-by: David S. Miller

Veaceslav Falico
2013-06-26 07:58:38 +0800
d7d35c681 bonding: doc: some details on backup slave arp validation ... Browse Code »

Add some details to bonding documentation on how backup slave arp
validation works.

Signed-off-by: Veaceslav Falico
Signed-off-by: David S. Miller

Veaceslav Falico
2013-06-26 07:58:38 +0800
762375766 doc: fix some syntax errors in netlink mmap sample code ... Browse Code »

Cc: Patrick McHardy
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

Cong Wang
2013-06-26 07:47:02 +0800

24 Jun, 2013

1 commit

a3c910d2e tcp: doc : fix the syncookies default value ... Browse Code »

syncookies is on for default since in commit e994b7c901
(tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL).

And fix a typo of CONFIG_SYN_COOKIES.

Signed-off-by: Shan Wei
Signed-off-by: David S. Miller

Shan Wei
2013-06-24 15:26:26 +0800

13 Jun, 2013

1 commit

e3d73bced net: add doc for ip_early_demux sysctl ... Browse Code »

commit 6648bd7e0e62c0c8c03b (ipv4: Add sysctl knob to control
early socket demux) introduced such sysctl, but forgot to add
doc into Documentation/networking/ip-sysctl.txt. This patch adds it.

Basically I grab the doc from the description of commit 41063e9dd11956f2d285
(ipv4: Early TCP socket demux.) and the above commit.

Cc: Eric Dumazet
Cc: Alexander Duyck
Cc: David S. Miller
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

Cong Wang
2013-06-13 06:10:13 +0800

08 Jun, 2013

2 commits

e8b265e8b doc:networking: Fix default value (icmp_ignore_bogus_error_responses). ... Browse Code »

This patch fixes icmp_ignore_bogus_error_responses default value to be 1 instead
of FALSE. It is initialized to 1 in icmp_sk_init().

Signed-off-by: Rami Rosen
Signed-off-by: David S. Miller

Rami Rosen
2013-06-08 14:31:54 +0800
d70a3f887 doc: packet: simplify tpacket example code ... Browse Code »

This patch simplifies the tpacket_v3 example code a bit by getting rid
of unecessary macro wrappers, removing some debugging code so that it is
more to the point, and also adds a header comment. Now this example code
is the very minimum one needs to start from when dealing with tpacket_v3
and ~100 lines smaller than before.

Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-06-08 05:39:05 +0800

05 Jun, 2013

1 commit

3d9738aa4 Documentation/networking/ieee802154 fix a typo ... Browse Code »

Corrected the words Accsess to Access.

Signed-off-by: Stefan Huber
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Stefan Huber
2013-06-05 22:24:59 +0800

28 May, 2013

3 commits

f884ab15a doc: fix misspellings with 'codespell' tool ... Browse Code »

Signed-off-by: Anatol Pomozov
Signed-off-by: Jiri Kosina

Anatol Pomozov
2013-05-28 18:02:12 +0800
b1098bbe1 bonding: remove ifenslave.c from kernel source ... Browse Code »

As Stephen proposed:
Since bonding supports configuration via iproute (netlink) and sysfs, I think
it is time to purge the old ifenslave code out of Documentation/networking
and update the documentation.

Suggested-by: Stephen Hemminger
Cc: Stephen Hemminger
Cc: Jay Vosburgh
Cc: "David S. Miller"
Signed-off-by: Cong Wang
Signed-off-by: David S. Miller

Cong Wang
2013-05-28 14:34:46 +0800
3dd17edea doc:networking: Fix typo in documentation/networking ... Browse Code »

Correct spelling typo

Signed-off-by: Masanari Iida
Signed-off-by: David S. Miller

Masanari Iida
2013-05-28 14:29:18 +0800

24 May, 2013

1 commit

191cb1f21 rps: document flow limit in scaling.txt ... Browse Code »

Explain the mechanism and API of the recently merged
rps flow limit patch.

Signed-off-by: Willem de Bruijn
Signed-off-by: David S. Miller

Willem de Bruijn
2013-05-24 09:54:44 +0800

25 Apr, 2013

1 commit

2940b26be packet: doc: update timestamping part ... Browse Code »

Bring the timestamping section in sync with the implementation.

Signed-off-by: Daniel Borkmann
Acked-by: Willem de Bruijn
Signed-off-by: David S. Miller

Daniel Borkmann
2013-04-25 13:22:22 +0800

20 Apr, 2013

1 commit

5683264c3 netlink: add documentation for memory mapped I/O ... Browse Code »

Signed-off-by: Patrick McHardy
Signed-off-by: David S. Miller

Patrick McHardy
2013-04-20 02:58:36 +0800

09 Apr, 2013

2 commits

49cfbf675 stmmac: review driver documentation ... Browse Code »

This patch reviews the driver documentation file;
for example, there were some new fields (in the driver
module parameter section) and the ptp files were
not documented.

Signed-off-by: Giuseppe Cavallaro
Signed-off-by: David S. Miller

Giuseppe CAVALLARO
2013-04-09 04:55:27 +0800
56aa091d6 ieee802154/nl-mac.c: make some MLME operations optional ... Browse Code »

Check for NULL before calling the following operations from "struct
ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req,
and scan_req.

This fixes a current oops where those functions are called but not
implemented. It also updates the documentation to clarify that they
are now optional by design. If a call to an unimplemented function
is attempted, the kernel returns EOPNOTSUPP via netlink.

The following operations are still required: get_phy, get_pan_id,
get_short_addr, and get_dsn.

Note that the places where this patch changes the initialization
of "ret" should not affect the rest of the code since "ret" was
always set (again) before returning its value.

Signed-off-by: Werner Almesberger
Signed-off-by: David S. Miller

Werner Almesberger
2013-04-09 00:00:16 +0800

31 Mar, 2013

1 commit

4eb061482 doc: packet: add minimal TPACKET_V3 example code ... Browse Code »

Lost in space for a long time, but it finally came back to us from
some ancient code tombs. This patch adds a minimal runnable example
of Linux' packet mmap(2) from Chetan Loke's TPACKET_V3. Special
thanks to David S. Miller, and also Eric Leblond and Victor Julien!

Cc: Eric Leblond
Cc: Victor Julien
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller

Daniel Borkmann
2013-03-31 05:40:27 +0800

27 Mar, 2013

1 commit

94fbbbf89 stmmac: update the Doc and Version (PTP+SGMII) ... Browse Code »

This patch updates the stmmac.txt file adding information related to the PTP
and SGMII/RGMII supports.

Also the patch updates the driver version to: March_2013.

Signed-off-by: Giuseppe Cavallaro
Signed-off-by: David S. Miller

Giuseppe CAVALLARO
2013-03-27 00:53:37 +0800

21 Mar, 2013

3 commits

e33099f96 tcp: implement RFC5682 F-RTO ... Browse Code »

This patch implements F-RTO (foward RTO recovery):

When the first retransmission after timeout is acknowledged, F-RTO
sends new data instead of old data. If the next ACK acknowledges
some never-retransmitted data, then the timeout was spurious and the
congestion state is reverted. Otherwise if the next ACK selectively
acknowledges the new data, then the timeout was genuine and the
loss recovery continues. This idea applies to recurring timeouts
as well. While F-RTO sends different data during timeout recovery,
it does not (and should not) change the congestion control.

The implementaion follows the three steps of SACK enhanced algorithm
(section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and
3 are in tcp_process_loss(). The basic version is not supported
because SACK enhanced version also works for non-SACK connections.

The new implementation is functionally in parity with the old F-RTO
implementation except the one case where it increases undo events:
In addition to the RFC algorithm, a spurious timeout may be detected
without sending data in step 2, as long as the SACK confirms not
all the original data are dropped. When this happens, the sender
will undo the cwnd and perhaps enter fast recovery instead. This
additional check increases the F-RTO undo events by 5x compared
to the prior implementation on Google Web servers, since the sender
often does not have new data to send for HTTP.

Note F-RTO may detect spurious timeout before Eifel with timestamps
does so.

Signed-off-by: Yuchung Cheng
Acked-by: Eric Dumazet
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller

Yuchung Cheng
2013-03-21 23:47:51 +0800
9b44190dc tcp: refactor F-RTO ... Browse Code »

The patch series refactor the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features. It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).

The new code implements newer F-RTO RFC5682 using CA_Loss processing
path. F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determines how many to send
independently. F-RTO only chooses what to send in order to detect
spurious retranmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.

The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation. Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.

Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Yuchung Cheng
2013-03-21 23:47:50 +0800
61816596d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull in the 'net' tree to get Daniel Borkmann's flow dissector
infrastructure change.

Signed-off-by: David S. Miller

David S. Miller
2013-03-21 00:46:26 +0800

19 Mar, 2013

1 commit

0c12582fb ipvs: add backup_only flag to avoid loops ... Browse Code »

Dmitry Akindinov is reporting for a problem where SYNs are looping
between the master and backup server when the backup server is used as
real server in DR mode and has IPVS rules to function as director.

Even when the backup function is enabled we continue to forward
traffic and schedule new connections when the current master is using
the backup server as real server. While this is not a problem for NAT,
for DR and TUN method the backup server can not determine if a request
comes from client or from director.

To avoid such loops add new sysctl flag backup_only. It can be needed
for DR/TUN setups that do not need backup and director function at the
same time. When the backup function is enabled we stop any forwarding
and pass the traffic to the local stack (real server mode). The flag
disables the director function when the backup function is enabled.

For setups that enable backup function for some virtual services and
director function for other virtual services there should be another
more complex solution to support DR/TUN mode, may be to assign
per-virtual service syncid value, so that we can differentiate the
requests.

Reported-by: Dmitry Akindinov
Tested-by: German Myzovsky
Signed-off-by: Julian Anastasov
Signed-off-by: Simon Horman

Julian Anastasov
2013-03-19 20:21:51 +0800

18 Mar, 2013

1 commit

1a2c6181c tcp: Remove TCPCT ... Browse Code »

TCPCT uses option-number 253, reserved for experimental use and should
not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.

As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:

Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
for files of 1KB size.

before this patch:
average (among 7 runs) of 20845.5 Requests/Second
after:
average (among 7 runs) of 21403.6 Requests/Second

Signed-off-by: Christoph Paasch
Signed-off-by: David S. Miller

Christoph Paasch
2013-03-18 02:35:13 +0800

15 Mar, 2013

1 commit

b66c66dc5 Documentation: fix neigh/default/gc_thresh1 default value. ... Browse Code »

The default value is 128, not 256
#grep gc_thresh1 net/ -rI
net/decnet/dn_neigh.c: .gc_thresh1 = 128,
net/ipv6/ndisc.c: .gc_thresh1 = 128,
net/ipv4/arp.c: .gc_thresh1 = 128,

Signed-off-by: Li RongQing
Signed-off-by: David S. Miller

Li RongQing
2013-03-15 21:12:23 +0800

12 Mar, 2013

1 commit

6ba8a3b19 tcp: Tail loss probe (TLP) ... Browse Code »

This patch series implement the Tail loss probe (TLP) algorithm described
in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
first patch implements the basic algorithm.

TLP's goal is to reduce tail latency of short transactions. It achieves
this by converting retransmission timeouts (RTOs) occuring due
to tail losses (losses at end of transactions) into fast recovery.
TLP transmits one packet in two round-trips when a connection is in
Open state and isn't receiving any ACKs. The transmitted packet, aka
loss probe, can be either new or a retransmission. When there is tail
loss, the ACK from a loss probe triggers FACK/early-retransmit based
fast recovery, thus avoiding a costly RTO. In the absence of loss,
there is no change in the connection state.

PTO stands for probe timeout. It is a timer event indicating
that an ACK is overdue and triggers a loss probe packet. The PTO value
is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
ACK timer when there is only one oustanding packet.

TLP Algorithm

On transmission of new data in Open state:
-> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
-> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
-> PTO = min(PTO, RTO)

Conditions for scheduling PTO:
-> Connection is in Open state.
-> Connection is either cwnd limited or no new data to send.
-> Number of probes per tail loss episode is limited to one.
-> Connection is SACK enabled.

When PTO fires:
new_segment_exists:
-> transmit new segment.
-> packets_out++. cwnd remains same.

no_new_packet:
-> retransmit the last segment.
Its ACK triggers FACK or early retransmit based recovery.

ACK path:
-> rearm RTO at start of ACK processing.
-> reschedule PTO if need be.

In addition, the patch includes a small variation to the Early Retransmit
(ER) algorithm, such that ER and TLP together can in principle recover any
N-degree of tail loss through fast recovery. TLP is controlled by the same
sysctl as ER, tcp_early_retrans sysctl.
tcp_early_retrans==0; disables TLP and ER.
==1; enables RFC5827 ER.
==2; delayed ER.
==3; TLP and delayed ER. [DEFAULT]
==4; TLP only.

The TLP patch series have been extensively tested on Google Web servers.
It is most effective for short Web trasactions, where it reduced RTOs by 15%
and improved HTTP response time (average by 6%, 99th percentile by 10%).
The transmitted probes account for
Acked-by: Neal Cardwell
Acked-by: Yuchung Cheng
Signed-off-by: David S. Miller

Nandita Dukkipati
2013-03-12 20:30:34 +0800