23 Dec, 2011
1 commit
-
skb->truesize might be big even for a small packet.
Its even bigger after commit 87fb4b7b533 (net: more accurate skb
truesize) and big MTU.We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.Reported-by: Michal Simek
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
07 Nov, 2011
1 commit
-
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of
net: inet_timewait_sock doesnt need
...Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
01 Nov, 2011
2 commits
-
Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.Done via script and a little typing.
$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches
Cc: "Kirill A. Shutemov"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The pretty much brings in the kitchen sink along
with it, so it should be avoided wherever reasonably possible in
terms of being included from other commonly used
files, as it results in a measureable increase on compile times.The worst culprit was probably device.h since it is used everywhere.
This file also had an implicit dependency/usage of mutex.h which was
masked by module.h, and is also fixed here at the same time.There are over a dozen other headers that simply declare the
struct instead of pulling in the whole file, so follow their lead
and simply make it a few more.Most of the implicit dependencies on module.h being present by
these headers pulling it in have been now weeded out, so we can
finally make this change with hopefully minimal breakage.Signed-off-by: Paul Gortmaker
18 Aug, 2011
1 commit
-
The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet. This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
26 Jul, 2011
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
fs: Merge split strings
treewide: fix potentially dangerous trailing ';' in #defined values/expressions
uwb: Fix misspelling of neighbourhood in comment
net, netfilter: Remove redundant goto in ebt_ulog_packet
trivial: don't touch files that are removed in the staging tree
lib/vsprintf: replace link to Draft by final RFC number
doc: Kconfig: `to be' -> `be'
doc: Kconfig: Typo: square -> squared
doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
drivers/net: static should be at beginning of declaration
drivers/media: static should be at beginning of declaration
drivers/i2c: static should be at beginning of declaration
XTENSA: static should be at beginning of declaration
SH: static should be at beginning of declaration
MIPS: static should be at beginning of declaration
ARM: static should be at beginning of declaration
rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
Update my e-mail address
PCIe ASPM: forcedly -> forcibly
gma500: push through device driver tree
...Fix up trivial conflicts:
- arch/arm/mach-ep93xx/dma-m2p.c (deleted)
- drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
- drivers/net/r8169.c (just context changes)
09 Jul, 2011
1 commit
-
Since ca5ecddf (rcu: define __rcu address space modifier for sparse)
rcu_dereference_check use rcu_read_lock_held as a part of condition
automatically so callers do not have to do that as well.Signed-off-by: Michal Hocko
Acked-by: Paul E. McKenney
Signed-off-by: Jiri Kosina
07 Jul, 2011
1 commit
-
Signed-off-by: Shirley Ma
Signed-off-by: David S. Miller
06 Jul, 2011
1 commit
28 Jun, 2011
2 commits
-
Fix 'make htmldocs' warnings:
Warning(/include/linux/hrtimer.h:153): No description found for
parameter 'clockid'
Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef
member 'of_match' description in 'device'
Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef
member 'sk_rmem_alloc' description in 'sock'Signed-off-by: Vitaliy Ivanov
Acked-by: Grant Likely
Acked-by: David S. Miller
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina -
Fix 'make htmldocs' warnings:
Warning(/include/linux/hrtimer.h:153): No description found for parameter 'clockid'
Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef member 'of_match' description in 'device'
Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef member 'sk_rmem_alloc' description in 'sock'Signed-off-by: Vitaliy Ivanov
Acked-by: Grant Likely
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
07 Jun, 2011
1 commit
-
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
12 Apr, 2011
1 commit
-
Conflicts:
drivers/net/smsc911x.c
07 Apr, 2011
1 commit
-
commit c6e1a0d12ca7b4f22c58e55a16beacfb7d3d8462 broken the calc
(net: Allow no-cache copy from user on transmit)
of checksum, which may cause some tcp packets be dropped because
incorrect checksum. ssh does not work under today's net-next-2.6
tree.Signed-off-by: Wei Yongjun
Acked-by: Tom Herbert
Signed-off-by: David S. Miller
05 Apr, 2011
1 commit
-
This patch uses __copy_from_user_nocache on transmit to bypass data
cache for a performance improvement. skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature, initial support is in tcp_sendmsg. This functionality is
configurable per device using ethtool.Presumably, this feature would only be useful when the driver does
not touch the data. The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate no checksum
is necessary. For the former case copy-checksum is probably done
anyway, in the latter case the device is likely loopback in which case
the no cache copy is probably not beneficial.This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply. Platform is 16 core AMD x86.No-cache copy disabled:
672703 tps, 97.13% utilization
50/90/99% latency:244.31 484.205 1028.41No-cache copy enabled:
702113 tps, 96.16% utilization,
50/90/99% latency 238.56 467.56 956.955Using 14000 byte request and response sizes demonstrate the
effects more dramatically:No-cache copy disabled:
79571 tps, 34.34 %utlization
50/90/95% latency 1584.46 2319.59 5001.76No-cache copy enabled:
83856 tps, 34.81% utilization
50/90/95% latency 2508.42 2622.62 2735.88Note especially the effect on latency tail (95th percentile).
This seems to provide a nice performance improvement and is
consistent in the tests I ran. Presumably, this would provide
the greatest benfits in the presence of an application workload
stressing the cache and a lot of transmit data happening.Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
23 Feb, 2011
1 commit
-
Add proper RCU annotations/verbs to sk_wq and wq members
Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)
Fix sunrpc sk_sleep() abuse too
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
01 Feb, 2011
1 commit
30 Jan, 2011
1 commit
-
SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1,
which unfortunately means the existing infrastructure for compat networking
ioctls is insufficient. A trivial compact ioctl implementation would conflict
with:SIOCAX25ADDUID
SIOCAIPXPRISLT
SIOCGETSGCNT_IN6
SIOCGETSGCNT
SIOCRSSCAUSE
SIOCX25SSUBSCRIP
SIOCX25SDTEFACILITIESTo make this work I have updated the compat_ioctl decode path to mirror the
the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function
so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl
function into struct proto so I can break out ioctls by which kind of ip socket
I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only
works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal
ipmr_ioctl.This was necessary because unfortunately the struct layout for the SIOCGETSGCNT
has unsigned longs in it so changes between 32bit and 64bit kernels.This change was sufficient to run a 32bit ip multicast routing daemon on a
64bit kernel.Reported-by: Bill Fenner
Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
19 Jan, 2011
1 commit
-
Packet filter (BPF) doesnt need to disable softirqs, being fully
re-entrant and lock-less.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Jan, 2011
1 commit
-
Fix new kernel-doc notation warning in sock.h by annotating skc_dontcopy_*
as private fields.Warning(include/net/sock.h:163): No description found for parameter 'skc_dontcopy_end[0]'
Signed-off-by: Randy Dunlap
Signed-off-by: David S. Miller
18 Dec, 2010
1 commit
-
Conflicts:
drivers/net/bnx2x/bnx2x.h
drivers/net/wireless/iwlwifi/iwl-1000.c
drivers/net/wireless/iwlwifi/iwl-6000.c
drivers/net/wireless/iwlwifi/iwl-core.h
drivers/vhost/vhost.c
17 Dec, 2010
1 commit
-
Special care is taken inside sk_port_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.The patch fixes the following crash:
BUG: unable to handle kernel paging request at fffffffffffffff0
IP: [] udp4_lib_lookup2+0xad/0x370
[] __udp4_lib_lookup+0x282/0x360
[] __udp4_lib_rcv+0x31e/0x700
[] ? ip_local_deliver_finish+0x65/0x190
[] ? ip_local_deliver+0x88/0xa0
[] udp_rcv+0x15/0x20
[] ip_local_deliver_finish+0x65/0x190
[] ip_local_deliver+0x88/0xa0
[] ip_rcv_finish+0x32d/0x6f0
[] ? netif_receive_skb+0x99c/0x11c0
[] ip_rcv+0x2bb/0x350
[] netif_receive_skb+0x99c/0x11c0Signed-off-by: Leonard Crestez
Signed-off-by: Octavian Purdila
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Dec, 2010
1 commit
-
Followup of commit b178bb3dfc30 (net: reorder struct sock fields)
Optimize INET input path a bit further, by :
1) moving sk_refcnt close to sk_lock.
This reduces number of dirtied cache lines by one on 64bit arches (and
64 bytes cache line size).2) moving inet_daddr & inet_rcv_saddr at the beginning of sk
(same cache line than hash / family / bound_dev_if / nulls_node)
This reduces number of accessed cache lines in lookups by one, and dont
increase size of inet and timewait socks.
inet and tw sockets now share same place-holder for these fields.Before patch :
offsetof(struct sock, sk_refcnt) = 0x10
offsetof(struct sock, sk_lock) = 0x40
offsetof(struct sock, sk_receive_queue) = 0x60
offsetof(struct inet_sock, inet_daddr) = 0x270
offsetof(struct inet_sock, inet_rcv_saddr) = 0x274After patch :
offsetof(struct sock, sk_refcnt) = 0x44
offsetof(struct sock, sk_lock) = 0x48
offsetof(struct sock, sk_receive_queue) = 0x68
offsetof(struct inet_sock, inet_daddr) = 0x0
offsetof(struct inet_sock, inet_rcv_saddr) = 0x4compute_score() (udp or tcp) now use a single cache line per ignored
item, instead of two.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Dec, 2010
1 commit
-
Conflicts:
drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
net/llc/af_llc.c
07 Dec, 2010
1 commit
-
Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
sk_clone() in commit 47e958eac280c263397Problem is we can have several clones sharing a common sk_filter, and
these clones might want to sk_filter_attach() their own filters at the
same time, and can overwrite old_filter->rcu, corrupting RCU queues.We can not use filter->rcu without being sure no other thread could do
the same thing.Switch code to a more conventional ref-counting technique : Do the
atomic decrement immediately and queue one rcu call back when last
reference is released.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
03 Dec, 2010
1 commit
-
These macros have been defined for several years since v2.6.12-rc2(tracing by git),
but never be used. So remove them.Signed-off-by: Shan Wei
Signed-off-by: David S. Miller
17 Nov, 2010
2 commits
-
Right now, fields in struct sock are not optimally ordered, because each
path (RX softirq, TX completion, RX user, TX user) has to touch fields
that are contained in many different cache lines.The really critical thing is to shrink number of cache lines that are
used at RX softirq time : CPU handling softirqs for a device can receive
many frames per second for many sockets. If load is too big, we can drop
frames at NIC level. RPS or multiqueue cards can help, but better reduce
latency if possible.This patch starts with UDP protocol, then additional patches will try to
reduce latencies of other ones as well.At RX softirq time, fields of interest for UDP protocol are :
(not counting ones in inet struct for the lookup)Read/Written:
sk_refcnt (atomic increment/decrement)
sk_rmem_alloc & sk_backlog.len (to check if there is room in queues)
sk_receive_queue
sk_backlog (if socket locked by user program)
sk_rxhash
sk_forward_alloc
sk_dropsRead only:
sk_rcvbuf (sk_rcvqueues_full())
sk_filter
sk_wq
sk_policy[0]
sk_flagsAdditional notes :
- sk_backlog has one hole on 64bit arches. We can fill it to save 8
bytes.
- sk_backlog is used only if RX sofirq handler finds the socket while
locked by user.
- sk_rxhash is written only once per flow.
- sk_drops is written only if queues are fullFinal layout :
[1] One section grouping all read/write fields, but placing rxhash and
sk_backlog at the end of this section.[2] One section grouping all read fields in RX handler
(sk_filter, sk_rcv_buf, sk_wq)[3] Section used by other paths
I'll post a patch on its own to put sk_refcnt at the end of struct
sock_common so that it shares same cache line than section [1]New offsets on 64bit arch :
sizeof(struct sock)=0x268
offsetof(struct sock, sk_refcnt) =0x10
offsetof(struct sock, sk_lock) =0x48
offsetof(struct sock, sk_receive_queue)=0x68
offsetof(struct sock, sk_backlog)=0x80
offsetof(struct sock, sk_rmem_alloc)=0x80
offsetof(struct sock, sk_forward_alloc)=0x98
offsetof(struct sock, sk_rxhash)=0x9c
offsetof(struct sock, sk_rcvbuf)=0xa4
offsetof(struct sock, sk_drops) =0xa0
offsetof(struct sock, sk_filter)=0xa8
offsetof(struct sock, sk_wq)=0xb0
offsetof(struct sock, sk_policy)=0xd0
offsetof(struct sock, sk_flags) =0xe0Instead of :
sizeof(struct sock)=0x270
offsetof(struct sock, sk_refcnt) =0x10
offsetof(struct sock, sk_lock) =0x50
offsetof(struct sock, sk_receive_queue)=0xc0
offsetof(struct sock, sk_backlog)=0x70
offsetof(struct sock, sk_rmem_alloc)=0xac
offsetof(struct sock, sk_forward_alloc)=0x10c
offsetof(struct sock, sk_rxhash)=0x128
offsetof(struct sock, sk_rcvbuf)=0x4c
offsetof(struct sock, sk_drops) =0x16c
offsetof(struct sock, sk_filter)=0x198
offsetof(struct sock, sk_wq)=0x88
offsetof(struct sock, sk_policy)=0x98
offsetof(struct sock, sk_flags) =0x130Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller -
UDP sockets refcount is usually 2, unless an incoming frame is going to
be queued in receive or backlog queue.Using atomic_inc_not_zero_hint() permits to reduce latency, because
processor issues less memory transactions.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
11 Nov, 2010
1 commit
-
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]We can switch infrastructure to use long "instead" of "int", now
atomic_long_t primitives are available for free.Signed-off-by: Eric Dumazet
Reported-by: Robin Holt
Reviewed-by: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
26 Oct, 2010
1 commit
-
Add __rcu annotation to :
(struct sock)->sk_filterAnd use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=ySigned-off-by: Eric Dumazet
Signed-off-by: David S. Miller
27 Sep, 2010
1 commit
-
SOCK_MIN_RCVBUF current value is 256 bytes
It doesnt permit to receive the smallest possible frame, considering
socket sk_rmem_alloc/sk_rcvbuf account skb truesizes. On 64bit arches,
sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
headroom, and we go over the limit.With old kernels and 32bit arches, we were under the limit, if netdriver
was doing copybreak.Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Sep, 2010
1 commit
-
Conflicts:
net/mac80211/main.c
09 Sep, 2010
1 commit
-
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
added a secondary hash on UDP, hashed on (local addr, local port).Problem is that following sequence :
fd = socket(...)
connect(fd, &remote, ...)not only selects remote end point (address and port), but also sets
local address, while UDP stack stored in secondary hash table the socket
while its local address was INADDR_ANY (or ipv6 equivalent)Sequence is :
- autobind() : choose a random local port, insert socket in hash tables
[while local address is INADDR_ANY]
- connect() : set remote address and port, change local address to IP
given by a route lookup.When an incoming UDP frame comes, if more than 10 sockets are found in
primary hash table, we switch to secondary table, and fail to find
socket because its local address changed.One solution to this problem is to rehash datagram socket if needed.
We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.This rehashing only takes care of secondary hash table, since primary
hash (based on local port only) is not changed.Reported-by: Krzysztof Piotr Oledzki
Signed-off-by: Eric Dumazet
Tested-by: Krzysztof Piotr Oledzki
Signed-off-by: David S. Miller
19 Aug, 2010
1 commit
-
This patch removes the abstraction introduced by the union skb_shared_tx in
the shared skb data.The access of the different union elements at several places led to some
confusion about accessing the shared tx_flags e.g. in skb_orphan_try().http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp
Signed-off-by: David S. Miller
10 Aug, 2010
1 commit
-
Add missing kernel-doc notation to struct sock:
Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_pid'
Warning(include/net/sock.h:324): No description found for parameter 'sk_peer_cred'
Warning(include/net/sock.h:324): No description found for parameter 'sk_classid'
Warning(include/net/sock.h:324): Excess struct/union/enum/typedef member 'sk_peercred' description in 'sock'Signed-off-by: Randy Dunlap
Signed-off-by: David S. Miller
21 Jul, 2010
1 commit
-
Conflicts:
drivers/vhost/net.c
net/bridge/br_device.cFix merge conflict in drivers/vhost/net.c with guidance from
Stephen Rothwell.Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d
since net-next-2.6 has fixes that make bridge netpoll work properly thus
we don't need it disabled.Signed-off-by: David S. Miller
15 Jul, 2010
1 commit
-
Fix problem in reading the tx_queue recorded in a socket. In
dev_pick_tx, the TX queue is read by doing a check with
sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
The problem is that there is not mutual exclusion across these
calls in the socket so it it is possible that the queue in the
sock can be invalidated after sk_tx_queue_recorded is called so
that sk_tx_queue get returns -1, which sets 65535 in queue_index
and thus dev_pick_tx returns 65536 which is a bogus queue and
can cause crash in dev_queue_xmit.We fix this by only calling sk_tx_queue_get which does the proper
checks. The interface is that sk_tx_queue_get returns the TX queue
if the sock argument is non-NULL and TX queue is recorded, else it
returns -1. sk_tx_queue_recorded is no longer used so it can be
completely removed.Signed-off-by: Tom Herbert
Signed-off-by: David S. Miller
13 Jul, 2010
1 commit
-
a new boolean flag no_autobind is added to structure proto to avoid the autobind
calls when the protocol is TCP. Then sock_rps_record_flow() is called int the
TCP's sendmsg() and sendpage() pathes.Signed-off-by: Changli Gao
----
include/net/inet_common.h | 4 ++++
include/net/sock.h | 1 +
include/net/tcp.h | 8 ++++----
net/ipv4/af_inet.c | 15 +++++++++------
net/ipv4/tcp.c | 11 +++++------
net/ipv4/tcp_ipv4.c | 3 +++
net/ipv6/af_inet6.c | 8 ++++----
net/ipv6/tcp_ipv6.c | 3 +++
8 files changed, 33 insertions(+), 20 deletions(-)
Signed-off-by: David S. Miller
17 Jun, 2010
1 commit
-
Use struct pid and struct cred to store the peer credentials on struct
sock. This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.This removes nasty surprises when using SO_PEERCRED on socket
connetions where the processes on either side are in different pid and
user namespaces.Signed-off-by: Eric W. Biederman
Acked-by: Daniel Lezcano
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller