10 Jan, 2012
1 commit
-
> net/core/sock.c: In function 'sk_update_clone':
> net/core/sock.c:1278:3: error: implicit declaration of function 'sock_update_memcg'

Reported-by: Randy Dunlap
Signed-off-by: David S. Miller
09 Jan, 2012
1 commit
-
so move it there. Fixes build errors when CONFIG_INET is not defined:
In file included from include/linux/tcp.h:211:0,
from include/linux/ipv6.h:221,
from include/net/ipv6.h:16,
from include/linux/sunrpc/clnt.h:26,
from include/linux/nfs_fs.h:50,
from init/do_mounts.c:20:
include/net/sock.h: In function 'sk_update_clone':
include/net/sock.h:1109:3: error: implicit declaration of function 'sock_update_memcg' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell
Signed-off-by: David S. Miller
08 Jan, 2012
1 commit
-
Sockets can also be created through sock_clone. Because it copies
all data in the sock structure, it also copies the memcg-related pointer,
and all should be fine. However, since we now use reference counts in
socket creation, we are left with some sockets that have no reference
counts. It matters when we destroy them, since it leads to a mismatch.

Signed-off-by: Glauber Costa
CC: David S. Miller
CC: Greg Thelen
CC: Hiroyouki Kamezawa
CC: Laurent Chavey
Signed-off-by: David S. Miller
24 Dec, 2011
1 commit
-
Conflicts:
net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller
23 Dec, 2011
1 commit
-
skb->truesize might be big even for a small packet.
It's even bigger after commit 87fb4b7b533 (net: more accurate skb
truesize) and big MTU.

We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.

Reported-by: Michal Simek
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
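The effect of the change above can be illustrated with a small user-space model. This is only a sketch of the idea, not the kernel code: the struct and the rx_* function names are hypothetical, and the two checks are assumptions about what "counting" versus "not counting" the incoming packet's truesize looks like.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few socket fields involved. */
struct rx_sock_model {
    size_t rmem_alloc; /* bytes already charged to the receive queue */
    size_t rcvbuf;     /* receive buffer limit (SO_RCVBUF) */
};

/* Stricter check: the incoming packet's truesize counts against the limit,
 * so a packet larger than rcvbuf is dropped even on an empty queue. */
static bool rx_admit_strict(const struct rx_sock_model *sk, size_t truesize)
{
    return sk->rmem_alloc + truesize < sk->rcvbuf;
}

/* Relaxed check: only memory already queued counts, so an empty queue
 * always accepts at least one packet. */
static bool rx_admit_relaxed(const struct rx_sock_model *sk, size_t truesize)
{
    (void)truesize; /* deliberately ignored */
    return sk->rmem_alloc < sk->rcvbuf;
}

/* Charge an admitted packet to the socket. */
static void rx_charge(struct rx_sock_model *sk, size_t truesize)
{
    sk->rmem_alloc += truesize;
}
```

With rcvbuf set to 2048 and a 4096-byte truesize, the strict check rejects the very first packet while the relaxed check admits it and then rejects the second one.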
17 Dec, 2011
1 commit
-
We can't scan the proto_list to initialize sock cgroups, as it
holds a rwlock, and we also want to keep the code generic enough to
avoid calling the initialization functions of protocols directly.

Convert proto_list_lock into a mutex, so we can sleep and do the
necessary allocations. This lock is seldom taken, so there shouldn't
be any performance penalties associated with that.

Signed-off-by: Glauber Costa
CC: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric Dumazet
CC: Stephen Rothwell
CC: Randy Dunlap
Signed-off-by: David S. Miller
13 Dec, 2011
3 commits
-
This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Eric W. Biederman
Signed-off-by: David S. Miller
-
The goal of this work is to move the memory pressure tcp
controls to a cgroup, instead of just relying on global
conditions.

To avoid excessive overhead in the network fast paths,
the code that accounts allocated memory to a cgroup is
hidden inside a static_branch(). This branch is patched out
until the first non-root cgroup is created. So when nobody
is using cgroups, even if it is mounted, no significant performance
penalty should be seen.

This patch handles the generic part of the code, and has nothing
tcp-specific.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Kirill A. Shutemov
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller
-
This patch replaces all uses of struct sock fields' memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where we don't have cgroups disabled.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller
29 Nov, 2011
1 commit
-
We can test/set multiple bits from sk_flags at once, to slightly shorten
the socket setup/dismantle phase.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
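As a rough illustration of testing or clearing several flag bits in one operation, here is a user-space sketch. The flag names and helpers are hypothetical and do not correspond to the kernel's real sock flags or helpers; only the masking technique is the point.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real socket flags differ. */
enum model_flag {
    MODEL_FLAG_A = 0,
    MODEL_FLAG_B = 1,
    MODEL_FLAG_C = 2,
};

#define MODEL_MASK(bit) (1UL << (bit))

/* One flag at a time: one test per bit. */
static bool any_set_one_by_one(unsigned long flags)
{
    return (flags & MODEL_MASK(MODEL_FLAG_A)) ||
           (flags & MODEL_MASK(MODEL_FLAG_B));
}

/* Several flags at once: build the mask once, test with a single AND. */
static bool any_set_at_once(unsigned long flags)
{
    const unsigned long mask = MODEL_MASK(MODEL_FLAG_A) | MODEL_MASK(MODEL_FLAG_B);

    return (flags & mask) != 0;
}

/* Clearing several flags likewise needs only one AND with the inverted mask. */
static unsigned long clear_at_once(unsigned long flags)
{
    return flags & ~(MODEL_MASK(MODEL_FLAG_A) | MODEL_MASK(MODEL_FLAG_B));
}
```

Both test variants return the same answer; the combined form simply does it with one mask and one comparison.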
23 Nov, 2011
1 commit
-
This patch adds in the infrastructure code to create the network priority
cgroup. The cgroup, in addition to the standard processes file, creates two
control files:

1) prioidx - This is a read-only file that exports the index of this cgroup.
This is a value that is both arbitrary and unique to a cgroup in this subsystem,
and is used to index the per-device priority map.

2) priomap - This is a writeable file. On read it reports a table of 2-tuples
(name, priority), where name is the name of a network interface and priority
indicates the priority assigned to frames egressing on the named interface and
originating from a pid in this cgroup.

This cgroup allows for skb priority to be set prior to a root qdisc getting
selected. This is beneficial for DCB enabled systems, in that it allows for any
application to use DCB configured priorities without application modification.

Signed-off-by: Neil Horman
Signed-off-by: John Fastabend
CC: Robert Love
CC: "David S. Miller"
Signed-off-by: David S. Miller
18 Nov, 2011
1 commit
-
…wireless-next into for-davem
Conflicts:
include/net/bluetooth/bluetooth.h
10 Nov, 2011
1 commit
-
The 802.1X EAPOL handshake that hostapd does requires
knowing whether the frame was ack'ed by the peer.
Currently, we fudge this pretty badly by not even
transmitting the frame as a normal data frame but
injecting it with radiotap and getting the status
out of radiotap monitor as well. This is rather
complex, confuses users (mon.wlan0 presence) and
doesn't work with all hardware.

To get rid of that hack, introduce a real wifi TX
status option for data frame transmissions.

This works similarly to the existing TX timestamping
in that it reflects the SKB back to the socket's
error queue with a SCM_WIFI_STATUS cmsg that has
an int indicating ACK status (0/1).

Since it is possible that at some point we will
want to have TX timestamping and wifi status in a
single errqueue SKB (there's little point in not
doing that), redefine SO_EE_ORIGIN_TIMESTAMPING
to SO_EE_ORIGIN_TXSTATUS which can collect more
than just the timestamp; keep the old constant
as an alias of course. Currently the internal APIs
don't make that possible, but it wouldn't be hard
to split them up in a way that makes it possible.

Thanks to Neil Horman for helping me figure out
the functions that add the control messages.

Signed-off-by: Johannes Berg
Signed-off-by: John W. Linville
09 Nov, 2011
1 commit
-
Make clear that sk_clone() and inet_csk_clone() return a locked socket.
Add _lock() prefix and kerneldoc.
Suggested-by: Linus Torvalds
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
26 Oct, 2011
1 commit
-
Signed-off-by: Thomas Gleixner
Signed-off-by: David S. Miller
14 Oct, 2011
1 commit
-
skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.

Considering that skb_shared_info is larger than sk_buff, it's time to
take it into account for better memory accounting.

This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.

At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.

Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But it's a necessary step for a
more accurate memory accounting.

Signed-off-by: Eric Dumazet
CC: Andi Kleen
CC: Ben Hutchings
Signed-off-by: David S. Miller
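To make the accounting idea concrete, the following sketch models a truesize macro in user space. It assumes the macro adds the cache-line-aligned sizes of both metadata structures to the payload size; the MODEL_* names and the numeric sizes are made up for illustration and are not the kernel's values.

```c
#include <stddef.h>

/* Assumed values for illustration only; the real ones depend on the
 * architecture and kernel configuration. */
#define MODEL_CACHE_BYTES      64u
#define MODEL_SKB_STRUCT_SIZE  232u  /* stand-in for sizeof(struct sk_buff) */
#define MODEL_SHINFO_SIZE      320u  /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to a multiple of the cache line size. */
#define MODEL_DATA_ALIGN(x) \
    (((x) + (MODEL_CACHE_BYTES - 1)) & ~(size_t)(MODEL_CACHE_BYTES - 1))

/* Payload bytes plus both aligned metadata structures. */
#define MODEL_TRUESIZE(x) \
    ((x) + MODEL_DATA_ALIGN(MODEL_SKB_STRUCT_SIZE) + \
     MODEL_DATA_ALIGN(MODEL_SHINFO_SIZE))
```

Under these assumed sizes even a zero-byte payload is charged several hundred bytes, which is why a low per-socket limit can be reached by small packets once the metadata is counted.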
08 Oct, 2011
1 commit
-
There's no point in open-coding sock_valbool_flag().
Signed-off-by: Johannes Berg
Acked-by: Neil Horman
Signed-off-by: David S. Miller
25 Aug, 2011
1 commit
-
Signed-off-by: Ian Campbell
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: "Michał Mirosław"
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller
02 Aug, 2011
1 commit
-
When assigning a NULL value to an RCU protected pointer, no barrier
is needed. rcu_assign_pointer used to handle that, but will soon
change to not handle the special case.

Convert all rcu_assign_pointer of NULL value.

//smpl
@@ expression P; @@

- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)

//
Signed-off-by: Stephen Hemminger
Acked-by: Paul E. McKenney
Signed-off-by: David S. Miller
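Why a NULL store needs no barrier can be sketched with C11 atomics in user space. This is an analogy under stated assumptions, not the kernel implementation: the release store plays the role of rcu_assign_pointer, the relaxed store plays the role of RCU_INIT_POINTER(P, NULL), and every name below is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct item { int value; };

/* Hypothetical shared slot standing in for an RCU-protected pointer. */
static _Atomic(struct item *) shared_slot;

/* Publishing a real object: the release store orders the initialization of
 * *p before the pointer becomes visible to readers. */
static void model_publish(struct item *p, int value)
{
    p->value = value;
    atomic_store_explicit(&shared_slot, p, memory_order_release);
}

/* Clearing the slot: no object contents are being published, so there is
 * nothing to order and a relaxed store is enough. */
static void model_clear(void)
{
    atomic_store_explicit(&shared_slot, (struct item *)NULL,
                          memory_order_relaxed);
}

/* Reader side: pick up the pointer, which may be NULL. */
static struct item *model_read(void)
{
    return atomic_load_explicit(&shared_slot, memory_order_acquire);
}
```

A reader that sees NULL never dereferences anything, so there are no fields whose initialization must be visible first.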
08 Jul, 2011
1 commit
-
…wireless-next-2.6 into for-davem
06 Jul, 2011
1 commit
-
Signed-off-by: Lauro Ramos Venancio
Signed-off-by: Aloisio Almeida Jr
Signed-off-by: John W. Linville
22 Jun, 2011
1 commit
-
This patch adds 2 tracepoints to report the status of a socket receive queue
and related parameters.

One tracepoint is added to sock_queue_rcv_skb. It records rcvbuf size
and its usage. The other tracepoint is added to __sk_mem_schedule and
it records limitations of memory for sockets and current usage.

By using these tracepoints we can learn the detailed reason why the kernel
dropped a packet.

Signed-off-by: Satoru Moriya
Acked-by: Neil Horman
Signed-off-by: David S. Miller
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
14 Jan, 2011
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
hwrng: via_rng - Fix memory scribbling on some CPUs
crypto: padlock - Move padlock.h into include/crypto
hwrng: via_rng - Fix asm constraints
crypto: n2 - use __devexit not __exit in n2_unregister_algs
crypto: mark crypto workqueues CPU_INTENSIVE
crypto: mv_cesa - dont return PTR_ERR() of wrong pointer
crypto: ripemd - Set module author and update email address
crypto: omap-sham - backlog handling fix
crypto: gf128mul - Remove experimental tag
crypto: af_alg - fix af_alg memory_allocated data type
crypto: aesni-intel - Fixed build with binutils 2.16
crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets
net: Add missing lockdep class names for af_alg
include: Install linux/if_alg.h for user-space crypto API
crypto: omap-aes - checkpatch --file warning fixes
crypto: omap-aes - initialize aes module once per request
crypto: omap-aes - unnecessary code removed
crypto: omap-aes - error handling implementation improved
crypto: omap-aes - redundant locking is removed
crypto: omap-aes - DMA initialization fixes for OMAP off mode
...
07 Jan, 2011
1 commit
-
Leonardo Chiquitto found poll() could block forever on TCP sockets when
urgent data was received, if the event flag only contains POLLPRI.

He did a bisection and found commit 4938d7e0233 (poll: avoid extra
wakeups in select/poll) was the source of the problem.

The problem is that TCP sockets use the standard sock_def_readable() function
for their sk_data_ready() handler, and sock_def_readable() doesn't signal
POLLPRI.

Only TCP is affected by the problem. Adding POLLPRI to the list of flags
might trigger unnecessary schedules, but URGENT handling is such a
seldom used feature this seems a good compromise.

Thanks a lot to Leonardo for providing the bisection result and a test
program as well.

Reference : http://www.spinics.net/lists/netdev/msg151793.html
Reported-and-bisected-by: Leonardo Chiquitto
Signed-off-by: Eric Dumazet
Tested-by: Eric Dumazet
Signed-off-by: David S. Miller
18 Dec, 2010
1 commit
-
Conflicts:
drivers/net/bnx2x/bnx2x.h
drivers/net/wireless/iwlwifi/iwl-1000.c
drivers/net/wireless/iwlwifi/iwl-6000.c
drivers/net/wireless/iwlwifi/iwl-core.h
drivers/vhost/vhost.c
17 Dec, 2010
1 commit
-
Special care is taken inside sk_prot_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.

The patch fixes the following crash:
BUG: unable to handle kernel paging request at fffffffffffffff0
IP: [] udp4_lib_lookup2+0xad/0x370
[] __udp4_lib_lookup+0x282/0x360
[] __udp4_lib_rcv+0x31e/0x700
[] ? ip_local_deliver_finish+0x65/0x190
[] ? ip_local_deliver+0x88/0xa0
[] udp_rcv+0x15/0x20
[] ip_local_deliver_finish+0x65/0x190
[] ip_local_deliver+0x88/0xa0
[] ip_rcv_finish+0x32d/0x6f0
[] ? netif_receive_skb+0x99c/0x11c0
[] ip_rcv+0x2bb/0x350
[] netif_receive_skb+0x99c/0x11c0

Signed-off-by: Leonard Crestez
Signed-off-by: Octavian Purdila
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Dec, 2010
1 commit
-
Followup of commit b178bb3dfc30 (net: reorder struct sock fields)
Optimize INET input path a bit further, by :

1) moving sk_refcnt close to sk_lock.
This reduces number of dirtied cache lines by one on 64bit arches (and
64 bytes cache line size).

2) moving inet_daddr & inet_rcv_saddr at the beginning of sk
(same cache line as hash / family / bound_dev_if / nulls_node)
This reduces number of accessed cache lines in lookups by one, and doesn't
increase size of inet and timewait socks.
inet and tw sockets now share same place-holder for these fields.

Before patch :
offsetof(struct sock, sk_refcnt) = 0x10
offsetof(struct sock, sk_lock) = 0x40
offsetof(struct sock, sk_receive_queue) = 0x60
offsetof(struct inet_sock, inet_daddr) = 0x270
offsetof(struct inet_sock, inet_rcv_saddr) = 0x274

After patch :
offsetof(struct sock, sk_refcnt) = 0x44
offsetof(struct sock, sk_lock) = 0x48
offsetof(struct sock, sk_receive_queue) = 0x68
offsetof(struct inet_sock, inet_daddr) = 0x0
offsetof(struct inet_sock, inet_rcv_saddr) = 0x4

compute_score() (udp or tcp) now uses a single cache line per ignored
item, instead of two.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
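The offsetof() figures above can be related to cache lines with a small sketch. The struct below is a hypothetical layout, not the real struct sock, and the 64-byte line size is an assumption taken from the commit text.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_CACHE_LINE 64u  /* assumed cache line size */

/* Hypothetical layout; not the real struct sock. */
struct lookup_model {
    uint32_t daddr;      /* compared during lookup */
    uint32_t rcv_saddr;  /* compared during lookup */
    uint32_t hash;       /* compared during lookup */
    uint16_t family;     /* compared during lookup */
    char padding[128];   /* stand-in for unrelated fields */
    int refcnt;          /* written only once a lookup matches */
};

/* Index of the cache line that holds the byte at the given offset,
 * assuming the structure itself starts on a cache line boundary. */
static size_t cache_line_of(size_t offset)
{
    return offset / MODEL_CACHE_LINE;
}
```

Fields that a lookup compares all land in line 0 of this model, so rejecting a non-matching entry touches a single line, while the reference count sits in a later line that is dirtied only on a match.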
08 Dec, 2010
1 commit
-
Signed-off-by: Miloslav Trmač
Signed-off-by: Herbert Xu
11 Nov, 2010
1 commit
-
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]

We can switch infrastructure to use "long" instead of "int", now that
atomic_long_t primitives are available for free.

Signed-off-by: Eric Dumazet
Reported-by: Robin Holt
Reviewed-by: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
26 Oct, 2010
1 commit
-
Add __rcu annotation to :
(struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
24 Oct, 2010
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
vlan: Calling vlan_hwaccel_do_receive() is always valid.
tproxy: use the interface primary IP address as a default value for --on-ip
tproxy: added IPv6 support to the socket match
cxgb3: function namespace cleanup
tproxy: added IPv6 support to the TPROXY target
tproxy: added IPv6 socket lookup function to nf_tproxy_core
be2net: Changes to use only priority codes allowed by f/w
tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
tproxy: added tproxy sockopt interface in the IPV6 layer
tproxy: added udp6_lib_lookup function
tproxy: added const specifiers to udp lookup functions
tproxy: split off ipv6 defragmentation to a separate module
l2tp: small cleanup
nf_nat: restrict ICMP translation for embedded header
can: mcp251x: fix generation of error frames
can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
can-raw: add msg_flags to distinguish local traffic
9p: client code cleanup
rds: make local functions/variables static
...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
08 Oct, 2010
1 commit
-
> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> include/linux/cgroup.h:542 invoked rcu_dereference_check() without protection!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 0
> 1 lock held by swapper/1:
> #0: (net_mutex){+.+.+.}, at: []
> register_pernet_subsys+0x1f/0x47
>
> stack backtrace:
> Pid: 1, comm: swapper Not tainted 2.6.35.4-28.fc14.x86_64 #1
> Call Trace:
> [] lockdep_rcu_dereference+0xaa/0xb3
> [] sock_update_classid+0x7c/0xa2
> [] sk_alloc+0x6b/0x77
> [] __netlink_create+0x37/0xab
> [] ? rtnetlink_rcv+0x0/0x2d
> [] netlink_kernel_create+0x74/0x19d
> [] ? __mutex_lock_common+0x339/0x35b
> [] rtnetlink_net_init+0x2e/0x48
> [] ops_init+0xe9/0xff
> [] register_pernet_operations+0xab/0x130
> [] register_pernet_subsys+0x2e/0x47
> [] rtnetlink_init+0x53/0x102
> [] netlink_proto_init+0x126/0x143
> [] ? netlink_proto_init+0x0/0x143
> [] do_one_initcall+0x72/0x186
> [] kernel_init+0x23b/0x2c9
> [] kernel_thread_helper+0x4/0x10
> [] ? restore_args+0x0/0x30
> [] ? kernel_init+0x0/0x2c9
> [] ? kernel_thread_helper+0x0/0x10

The sock_update_classid() function calls task_cls_classid(current),
but the calling task cannot go away, so there is no danger of
the associated structures disappearing. Insert an RCU read-side
critical section to suppress the false positive.

Reported-by: Subrata Modak
Signed-off-by: Paul E. McKenney
27 Sep, 2010
1 commit
-
Conflicts:
drivers/net/qlcnic/qlcnic_init.c
net/ipv4/ip_output.c
25 Sep, 2010
1 commit
-
We have for each socket :
One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)

spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A] CPU2 [C]
read_lock(callback_lock)
spin_lock_bh(slock)

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa.
It seems the only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.

Reported-by: Tetsuo Handa
Tested-by: Tetsuo Handa
Signed-off-by: Eric Dumazet
CC: Jarek Poplawski
Tested-by: Eric Dumazet
Signed-off-by: David S. Miller
10 Sep, 2010
1 commit
-
__lock_sock() and __release_sock() release and regrab the lock but
were missing proper annotations. Add them. This removes the following
warning from sparse. (Currently __lock_sock() does not emit any
warning about it but I think it is better to add it there as well.)

net/core/sock.c:1580:17: warning: context imbalance in '__release_sock' - unexpected unlock
Signed-off-by: Namhyung Kim
Signed-off-by: David S. Miller
20 Jul, 2010
1 commit
-
Use modern this_cpu_xxx() api, saving a few bytes on x86
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
13 Jul, 2010
1 commit
-
Avoid two extra instructions in sock_free(), to reload
skb->truesize and skb->sk

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
17 Jun, 2010
2 commits
-
AF_UNIX references this, and can be built as a module,
so...

Signed-off-by: David S. Miller
-
Use struct pid and struct cred to store the peer credentials on struct
sock. This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.

This removes nasty surprises when using SO_PEERCRED on socket
connections where the processes on either side are in different pid and
user namespaces.

Signed-off-by: Eric W. Biederman
Acked-by: Daniel Lezcano
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller