13 Aug, 2013
1 commit
-
…wireless-next into for-davem
Conflicts:
drivers/net/ethernet/broadcom/Kconfig
12 Aug, 2013
1 commit
-
commit e370a723632 ("af_unix: improve STREAM behavior with fragmented
memory") added a bug on large send() because the
skb_copy_datagram_from_iovec() call always start from the beginning
of iovec.We must instead use the @sent variable to properly skip the
already processed part.Reported-by: Hannes Frederic Sowa
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Aug, 2013
11 commits
-
Adding paged frags skbs to af_unix sockets introduced a performance
regression on large sends because of additional page allocations, even
if each skb could carry at least 100% more payload than before.We can instruct sock_alloc_send_pskb() to attempt high order
allocations.Most of the time, it does a single page allocation instead of 8.
I added an additional parameter to sock_alloc_send_pskb() to
let other users to opt-in for this new feature on followup patches.Tested:
Before patch :
$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec2304 212992 212992 10.00 46861.15
After patch :
$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec2304 212992 212992 10.00 57981.11
Signed-off-by: Eric Dumazet
Cc: David Rientjes
Signed-off-by: David S. Miller -
unix_stream_sendmsg() currently uses order-2 allocations,
and we had numerous reports this can fail.The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
not helping.This patch extends the work done in commit eb6a24816b247c
("af_unix: reduce high order page allocations) for
datagram sockets.This opens the possibility of zero copy IO (splice() and
friends)The trick is to not use skb_pull() anymore in recvmsg() path,
and instead add a @consumed field in UNIXCB() to track amount
of already read payload in the skb.There is a performance regression for large sends
because of extra page allocations that will be addressed
in a follow-up patch, allowing sock_alloc_send_pskb()
to attempt high order page allocations.Signed-off-by: Eric Dumazet
Cc: David Rientjes
Signed-off-by: David S. Miller -
Encrypt the cookie with both server and client IPv4 addresses,
such that multi-homed server will grant different cookies
based on both the source and destination IPs. No client change
is needed since cookie is opaque to the client.Signed-off-by: Yuchung Cheng
Reviewed-by: Eric Dumazet
Acked-by: Neal Cardwell
Signed-off-by: David S. Miller -
This reverts commit cda5f98e36576596b9230483ec52bff3cc97eb21.
As per Vlad's request.
Signed-off-by: David S. Miller
-
With the restructuring of the lksctp.org site, we only allow bug
reports through the SCTP mailing list linux-sctp@vger.kernel.org,
not via SF, as SF is only used for web hosting and nothing more.
While at it, also remove the obvious statement that bugs will be
fixed and incooperated into the kernel.Signed-off-by: Daniel Borkmann
Acked-by: Vlad Yasevich
Signed-off-by: David S. Miller -
Get rid of the last module parameter for SCTP and make this
configurable via sysctl for SCTP like all the rest of SCTP's
configuration knobs.Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller -
Adds the new procfs knobs:
/proc/sys/net/ipv4/conf/*/igmpv2_unsolicited_report_interval
/proc/sys/net/ipv4/conf/*/igmpv3_unsolicited_report_intervalWhich will allow userspace configuration of the IGMP unsolicited report
interval (see below) in milliseconds. The defaults are 10000ms for IGMPv2
and 1000ms for IGMPv3 in accordance with RFC2236 and RFC3376.Background:
If an IGMP join packet is lost you will not receive data sent to the
multicast group so if no data arrives from that multicast group in a
period of time after the IGMP join a second IGMP join will be sent. The
delay between joins is the "IGMP Unsolicited Report Interval".Prior to this patch this value was hard coded in the kernel to 10s for
IGMPv2 and 1s for IGMPv3. 10s is unsuitable for some use-cases, such as
IPTV as it can cause channel change to be slow in the presence of packet
loss.This patch allows the value to be overridden from userspace for both
IGMPv2 and IGMPv3 such that it can be tuned accoding to the network.Tested with Wireshark and a simple program to join a (non-existent)
multicast group. The distribution of timings for the second join differ
based upon setting the procfs knobs.igmpvX_unsolicited_report_interval is intended to follow the pattern
established by force_igmp_version, and while a procfs entry has been added
a corresponding sysctl knob has not as it is my understanding that sysctl
is deprecated[1].[1]: http://lwn.net/Articles/247243/
Signed-off-by: William Manley
Acked-by: Hannes Frederic Sowa
Acked-by: Benjamin LaHaise
Signed-off-by: David S. Miller -
The procfs knob /proc/sys/net/ipv4/conf/*/force_igmp_version allows the
IGMP protocol version to use to be explicitly set. As a side effect this
caused the routing cache to be flushed as it was declared as a
DEVINET_SYSCTL_FLUSHING_ENTRY. Flushing is unnecessary and this patch
makes it so flushing does not occur.Requested by Hannes Frederic Sowa as he was reviewing other patches
adding procfs entries.Suggested-by: Hannes Frederic Sowa
Signed-off-by: William Manley
Acked-by: Hannes Frederic Sowa
Acked-by: Benjamin LaHaise
Signed-off-by: David S. Miller -
If an IGMP join packet is lost you will not receive data sent to the
multicast group so if no data arrives from that multicast group in a
period of time after the IGMP join a second IGMP join will be sent. The
delay between joins is the "IGMP Unsolicited Report Interval".Previously this value was hard coded to be chosen randomly between 0-10s.
This can be too long for some use-cases, such as IPTV as it can cause
channel change to be slow in the presence of packet loss.The value 10s has come from IGMPv2 RFC2236, which was reduced to 1s in
IGMPv3 RFC3376. This patch makes the kernel use the 1s value from the
later RFC if we are operating in IGMPv3 mode. IGMPv2 behaviour is
unaffected.Tested with Wireshark and a simple program to join a (non-existent)
multicast group. The distribution of timings for the second join differ
based upon setting /proc/sys/net/ipv4/conf/eth0/force_igmp_version.Signed-off-by: William Manley
Acked-by: Hannes Frederic Sowa
Acked-by: Benjamin LaHaise
Signed-off-by: David S. Miller
09 Aug, 2013
1 commit
-
With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
counters do not count the number of frames, but number of aggregated
segments.Its probably too late to change this now.
This patch adds four new counters, tracking number of frames, regardless
of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.Ip[6]NoECTPkts : Number of packets received with NOECT
Ip[6]ECT1Pkts : Number of packets received with ECT(1)
Ip[6]ECT0Pkts : Number of packets received with ECT(0)
Ip[6]CEPkts : Number of packets received with Congestion Experiencedlph37:~# nstat | egrep "Pkts|InReceive"
IpInReceives 1634137 0.0
Ip6InReceives 3714107 0.0
Ip6InNoECTPkts 19205 0.0
Ip6InECT0Pkts 52651828 0.0
IpExtInNoECTPkts 33630 0.0
IpExtInECT0Pkts 15581379 0.0
IpExtInCEPkts 6 0.0Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
08 Aug, 2013
8 commits
-
This reverts commits:
0f75b09c798ed00c30d7d5551b896be883bc2aeb
cbd89acb9eb257ed3b2be867142583fdcf7fdc5b
c483e02614551e44ced3fe6eedda8e36d3277cccAmongst other things, it's modifies the SKB header
to pull the ethernet headers off via eth_type_trans()
on the output path which is bogus.It's causing serious regressions for people.
Signed-off-by: David S. Miller
-
Use skb_copy_datagram_from_iovec() to avoid code duplication and make it easy to
be read. Also we can do the skipping inside the zero-copy loop.Signed-off-by: Jason Wang
Signed-off-by: David S. Miller -
To reduce the duplicated codes.
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller -
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller -
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller -
To let it be reused and reduce code duplication. Also document this function.
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller -
To let it be reused and reduce code duplication.
Signed-off-by: Jason Wang
Signed-off-by: David S. Miller -
The IP tunnel hash heads can be embedded in the per-net structure
since it is a fixed size. Reduce the size so that the total structure
fits in a page size. The original size was overly large, even NETDEV_HASHBITS
is only 8 bits!Also, add some white space for readability.
Signed-off-by: Stephen Hemminger
Acked-by: Pravin B Shelar .
Signed-off-by: David S. Miller
05 Aug, 2013
1 commit
-
Use of RCU here with out marked pointer and function doesn't match prototype
with sparse.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
04 Aug, 2013
5 commits
-
Merge net into net-next to setup some infrastructure Eric
Dumazet needs for usbnet changes.Signed-off-by: David S. Miller
-
Pull networking fixes from David Miller:
1) Don't ignore user initiated wireless regulatory settings on cards
with custom regulatory domains, from Arik Nemtsov.2) Fix length check of bluetooth information responses, from Jaganath
Kanakkassery.3) Fix misuse of PTR_ERR in btusb, from Adam Lee.
4) Handle rfkill properly while iwlwifi devices are offline, from
Emmanuel Grumbach.5) Fix r815x devices DMA'ing to stack buffers, from Hayes Wang.
6) Kernel info leak in ATM packet scheduler, from Dan Carpenter.
7) 8139cp doesn't check for DMA mapping errors, from Neil Horman.
8) Fix bridge multicast code to not snoop when no querier exists,
otherwise mutlicast traffic is lost. From Linus Lüssing.9) Avoid soft lockups in fib6_run_gc(), from Michal Kubecek.
10) Fix races in automatic address asignment on ipv6, which can result
in incorrect lifetime assignments. From Jiri Benc.11) Cure build bustage when CONFIG_NET_LL_RX_POLL is not set and rename
it CONFIG_NET_RX_BUSY_POLL to eliminate the last reference to the
original naming of this feature. From Cong Wang.12) Fix crash in TIPC when server socket creation fails, from Ying Xue.
13) macvlan_changelink() silently succeeds when it shouldn't, from
Michael S Tsirkin.14) HTB packet scheduler can crash due to sign extension, fix from
Stephen Hemminger.15) With the cable unplugged, r8169 prints out a message every 10
seconds, make it netif_dbg() instead of netif_warn(). From Peter
Wu.16) Fix memory leak in rtm_to_ifaddr(), from Daniel Borkmann.
17) sis900 gets spurious TX queue timeouts due to mismanagement of link
carrier state, from Denis Kirjanov.18) Validate somaxconn sysctl to make sure it fits inside of a u16.
From Roman Gushchin.19) Fix MAC address filtering on qlcnic, from Shahed Shaikh.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
qlcnic: Fix for flash update failure on 83xx adapter
qlcnic: Fix link speed and duplex display for 83xx adapter
qlcnic: Fix link speed display for 82xx adapter
qlcnic: Fix external loopback test.
qlcnic: Removed adapter series name from warning messages.
qlcnic: Free up memory in error path.
qlcnic: Fix ingress MAC learning
qlcnic: Fix MAC address filter issue on 82xx adapter
net: ethernet: davinci_emac: drop IRQF_DISABLED
netlabel: use domain based selectors when address based selectors are not available
net: check net.core.somaxconn sysctl values
sis900: Fix the tx queue timeout issue
net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails
r8169: remove "PHY reset until link up" log spam
net: ethernet: cpsw: drop IRQF_DISABLED
htb: fix sign extension bug
macvlan: handle set_promiscuity failures
macvlan: better mode validation
tipc: fix oops when creating server socket fails
net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
... -
Pull nfsd bugfixes from Bruce Fields:
"Most of this is due to a screwup on my part -- some gss-proxy crashes
got fixed before the merge window but somehow never made it out of a
temporary git repo on my laptop...."* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
svcrpc: set cr_gss_mech from gss-proxy as well as legacy upcall
svcrpc: fix kfree oops in gss-proxy code
svcrpc: fix gss-proxy xdr decoding oops
svcrpc: fix gss_rpc_upcall create error
NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure. -
This change brings the suppressor attribute names into line; it also changes
the data types to provide a more consistent interface.While -1 indicates that the suppressor is not enabled, values >= 0 for
suppress_prefixlen or suppress_ifgroup reject routing decisions violating the
constraint.This changes the previously presented behaviour of suppress_prefixlen, where a
prefix length _less_ than the attribute value was rejected. After this change,
a prefix length less than *or* equal to the value is considered a violation of
the rule constraint.It also changes the default values for default and newly added rules (disabling
any suppression for those).Signed-off-by: Stefan Tomanek
Signed-off-by: David S. Miller -
This patch cleanup 2 points for the usage of vlan_dev_priv(dev):
* In vlan_dev.c/vlan_dev_hard_header, we should use the var *vlan directly
after grabing the pointer at the beginning with
*vlan = vlan_dev_priv(dev);
when we need to access the fields of *vlan.
* In vlan.c/register_vlan_device, add the var *vlan pointer
struct vlan_dev_priv *vlan;
to cleanup the code to access the fields of vlan_dev_priv(new_dev).Signed-off-by: Wang Sheng-Hui
Signed-off-by: David S. Miller
03 Aug, 2013
12 commits
-
NetLabel has the ability to selectively assign network security labels
to outbound traffic based on either the LSM's "domain" (different for
each LSM), the network destination, or a combination of both. Depending
on the type of traffic, local or forwarded, and the type of traffic
selector, domain or address based, different hooks are used to label the
traffic; the goal being minimal overhead.Unfortunately, there is a bug such that a system using NetLabel domain
based traffic selectors does not correctly label outbound local traffic
that is not assigned to a socket. The issue is that in these cases
the associated NetLabel hook only looks at the address based selectors
and not the domain based selectors. This patch corrects this by
checking both the domain and address based selectors so that the correct
labeling is applied, regardless of the configuration type.In order to acomplish this fix, this patch also simplifies some of the
NetLabel domainhash structures to use a more common outbound traffic
mapping type: struct netlbl_dommap_def. This simplifies some of the code
in this patch and paves the way for further simplifications in the
future.Signed-off-by: Paul Moore
Signed-off-by: David S. Miller -
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so
populate them before calling it.Signed-off-by: Veaceslav Falico
Signed-off-by: David S. Miller -
Variable ptr is being assigned, but never used, so just remove it.
Signed-off-by: Daniel Borkmann
Signed-off-by: David S. Miller -
This change adds the ability to suppress a routing decision based upon the
interface group the selected interface belongs to. This allows it to
exclude specific devices from a routing decision.Signed-off-by: Stefan Tomanek
Signed-off-by: David S. Miller -
It's possible to assign an invalid value to the net.core.somaxconn
sysctl variable, because there is no checks at all.The sk_max_ack_backlog field of the sock structure is defined as
unsigned short. Therefore, the backlog argument in inet_listen()
shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall
is truncated to the somaxconn value. So, the somaxconn value shouldn't
exceed 65535 (USHRT_MAX).
Also, negative values of somaxconn are meaningless.before:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
net.core.somaxconn = 65536
$ sysctl -w net.core.somaxconn=-100
net.core.somaxconn = -100after:
$ sysctl -w net.core.somaxconn=256
net.core.somaxconn = 256
$ sysctl -w net.core.somaxconn=65536
error: "Invalid argument" setting key "net.core.somaxconn"
$ sysctl -w net.core.somaxconn=-100
error: "Invalid argument" setting key "net.core.somaxconn"Based on a prior patch from Changli Gao.
Signed-off-by: Roman Gushchin
Reported-by: Changli Gao
Suggested-by: Eric Dumazet
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller -
By using sizeof(_hdr), net/ipv6/raw.c:icmpv6_filter implicitly assumes
that any valid ICMPv6 message is at least eight bytes long, i.e., that
the message body is at least four bytes.The DIS message of RPL (RFC 6550 section 6.2, from the 6LoWPAN world),
has a minimum length of only six bytes, and is thus blocked by
icmpv6_filter.RFC 4443 seems to allow even a zero-sized body, making the minimum
allowable message size four bytes.Signed-off-by: Werner Almesberger
Signed-off-by: David S. Miller -
"_hdr" should hold the ICMPv6 header while "hdr" is the pointer to it.
This worked by accident.Signed-off-by: Werner Almesberger
Signed-off-by: David S. Miller -
For ethernet frames, eth_type_trans() already parses the header, so one
can skip this when checking the frame size.Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller -
Since tpacket_fill_skb() parses the protocol field in ethernet frames'
headers, it's easy to see if any passed frame is a VLAN one and account
for the extended size.But as the real protocol does not turn up before tpacket_fill_skb()
runs which in turn also checks the frame length, move the max frame
length calculation into the function.Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller -
This may be necessary when the SKB is passed to other layers on the go,
which check the protocol field on their own. An example is a VLAN packet
sent out using AF_PACKET on a bridge interface. The bridging code checks
the SKB size, accounting for any VLAN header only if the protocol field
is set accordingly.Note that eth_type_trans() sets skb->dev to the passed argument, so this
can be skipped in packet_snd() for ethernet frames, as well.Signed-off-by: Phil Sutter
Signed-off-by: David S. Miller -
Commit 5c766d642 ("ipv4: introduce address lifetime") leaves the ifa
resource that was allocated via inet_alloc_ifa() unfreed when returning
the function with -EINVAL. Thus, free it first via inet_free_ifa().Signed-off-by: Daniel Borkmann
Reviewed-by: Jiri Pirko
Signed-off-by: David S. Miller -
When userspace passes a large priority value
the assignment of the unsigned value hopt->prio
to signed int cl->prio causes cl->prio to become negative and the
comparison is with TC_HTB_NUMPRIO is always false.The result is that HTB crashes by referencing outside
the array when processing packets. With this patch the large value
wraps around like other values outside the normal range.See: https://bugzilla.kernel.org/show_bug.cgi?id=60669
Signed-off-by: Stephen Hemminger
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller