Eric Lee / smarc-fsl-linux-kernel

10 Jan, 2012

1 commit

3969eb385 net: Fix build with INET disabled.

> net/core/sock.c: In function 'sk_update_clone':
> net/core/sock.c:1278:3: error: implicit declaration of function 'sock_update_memcg'

Reported-by: Randy Dunlap
Signed-off-by: David S. Miller

David S. Miller
2012-01-10 05:44:23 +0800

09 Jan, 2012

1 commit

475f1b526 net: sk_update_clone is only used in net/core/sock.c

so move it there. Fixes build errors when CONFIG_INET is not defined:

In file included from include/linux/tcp.h:211:0,
from include/linux/ipv6.h:221,
from include/net/ipv6.h:16,
from include/linux/sunrpc/clnt.h:26,
from include/linux/nfs_fs.h:50,
from init/do_mounts.c:20:
include/net/sock.h: In function 'sk_update_clone':
include/net/sock.h:1109:3: error: implicit declaration of function 'sock_update_memcg' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell
Signed-off-by: David S. Miller

Stephen Rothwell
2012-01-09 15:44:26 +0800

08 Jan, 2012

1 commit

f3f511e1c net: fix sock_clone reference mismatch with tcp memcontrol

Sockets can also be created through sock_clone. Because it copies
all data in the sock structure, it also copies the memcg-related pointer,
and all should be fine. However, since we now use reference counts in
socket creation, we are left with some sockets that have no reference
counts. It matters when we destroy them, since it leads to a mismatch.

Signed-off-by: Glauber Costa
CC: David S. Miller
CC: Greg Thelen
CC: Hiroyouki Kamezawa
CC: Laurent Chavey
Signed-off-by: David S. Miller

Glauber Costa
2012-01-08 02:16:34 +0800

24 Dec, 2011

1 commit

abb434cb0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
net/bluetooth/l2cap_core.c

Just two overlapping changes, one added an initialization of
a local variable, and another change added a new local variable.

Signed-off-by: David S. Miller

David S. Miller
2011-12-24 06:13:56 +0800

23 Dec, 2011

1 commit

0fd7bac6b net: relax rcvbuf limits

skb->truesize might be big even for a small packet.

It's even bigger after commit 87fb4b7b533 (net: more accurate skb
truesize) and with a big MTU.

We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.

Reported-by: Michal Simek
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-12-23 15:15:14 +0800
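
The effect of relaxing the limit can be sketched with a small userspace model. This is not the kernel code: `struct rx_model`, `admit_strict()` and `admit_relaxed()` are hypothetical names, and the strict variant is only an assumption about what a check that also counts the incoming packet might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the receive-side accounting of one socket. */
struct rx_model {
    size_t rmem_alloc; /* bytes already charged to the receive queue */
    size_t rcvbuf;     /* configured receive buffer limit (nonzero) */
};

/* Strict policy: the incoming packet's truesize counts against the limit,
 * so a single large packet can be refused even when the queue is empty. */
static bool admit_strict(const struct rx_model *rx, size_t truesize)
{
    assert(rx != NULL);
    return rx->rmem_alloc + truesize < rx->rcvbuf;
}

/* Relaxed policy: only what is already queued is compared with the limit,
 * so an empty queue always accepts at least one packet. */
static bool admit_relaxed(const struct rx_model *rx, size_t truesize)
{
    assert(rx != NULL);
    (void)truesize; /* deliberately ignored by this policy */
    return rx->rmem_alloc < rx->rcvbuf;
}
```

With a 2048-byte limit and a packet whose truesize is 4096, the strict policy refuses the packet on an empty queue while the relaxed policy accepts it and refuses only the next one.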

17 Dec, 2011

1 commit

36b77a520 net: fix sleeping while atomic problem in sock mem_cgroup.

We can't scan the proto_list to initialize sock cgroups, as it
holds a rwlock, and we also want to keep the code generic enough to
avoid calling the initialization functions of protocols directly.

Convert proto_list_lock into a mutex, so we can sleep and do the
necessary allocations. This lock is seldom taken, so there shouldn't
be any performance penalties associated with that.

Signed-off-by: Glauber Costa
CC: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric Dumazet
CC: Stephen Rothwell
CC: Randy Dunlap
Signed-off-by: David S. Miller

Glauber Costa
2011-12-17 04:35:17 +0800

13 Dec, 2011

3 commits

d1a4c0b37 tcp memory pressure controls

This patch introduces memory pressure controls for the tcp
protocol. It uses the generic socket memory pressure code
introduced in earlier patches, and fills in the
necessary data in cg_proto struct.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Eric W. Biederman
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800

e1aab161e socket: initial cgroup code.

The goal of this work is to move the memory pressure tcp
controls to a cgroup, instead of just relying on global
conditions.

To avoid excessive overhead in the network fast paths,
the code that accounts allocated memory to a cgroup is
hidden inside a static_branch(). This branch is patched out
until the first non-root cgroup is created. So when nobody
is using cgroups, even if it is mounted, no significant performance
penalty should be seen.

This patch handles the generic part of the code, and has nothing
tcp-specific.

Signed-off-by: Glauber Costa
Reviewed-by: KAMEZAWA Hiroyuki
CC: Kirill A. Shutemov
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800

180d8cd94 foundations of per-cgroup memory pressure controlling.

This patch replaces all uses of the struct sock fields memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem with accessor
macros. Those macros can receive either a socket argument or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing macro wrapping here, no performance impact at all is
expected in the case where cgroups are disabled.

Signed-off-by: Glauber Costa
Reviewed-by: Hiroyouki Kamezawa
CC: David S. Miller
CC: Eric W. Biederman
CC: Eric Dumazet
Signed-off-by: David S. Miller

Glauber Costa
2011-12-13 08:04:10 +0800
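
The accessor idea can be sketched in userspace as follows. Everything here is hypothetical (`struct group_counters`, `struct sock_model`, `memory_allocated_of()`, `charge()`), and a function is used where the patch describes macros; the point is only that callers stop touching the counter directly and go through one place that picks the right counter for the context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-group counters. */
struct group_counters {
    long memory_allocated;
};

/* Hypothetical socket: group is NULL when the socket belongs to no group. */
struct sock_model {
    struct group_counters *group;
};

/* Global counter, used when no group is attached. */
static long global_memory_allocated;

/* Accessor: select the counter that applies to this socket. */
static long *memory_allocated_of(struct sock_model *sk)
{
    assert(sk != NULL);
    return sk->group ? &sk->group->memory_allocated
                     : &global_memory_allocated;
}

/* Callers charge memory through the accessor, never through the field. */
static long charge(struct sock_model *sk, long pages)
{
    long *counter = memory_allocated_of(sk);

    *counter += pages;
    return *counter;
}
```

Because every user goes through the accessor, switching a socket from the global counter to a per-group one needs no change at the call sites.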

29 Nov, 2011

1 commit

08e29af3a net: optimize socket timestamping

We can test/set multiple bits from sk_flags at once, to shorten the
socket setup/dismantle phases a bit.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-29 13:27:11 +0800
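
Testing several flag bits at once amounts to combining them into one mask. The sketch below is a userspace illustration; the bit numbers and the names `FLAG_TS`, `FLAG_TS_RX_SW` and `TS_MASK` are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers standing in for two timestamp-related flags. */
enum { FLAG_TS = 3, FLAG_TS_RX_SW = 7 };

/* Both bits folded into one mask. */
#define TS_MASK ((1UL << FLAG_TS) | (1UL << FLAG_TS_RX_SW))

/* One AND answers "is any timestamp flag set?" instead of two tests. */
static bool any_timestamp_flag(unsigned long flags)
{
    return (flags & TS_MASK) != 0;
}

/* Both bits are cleared in one step instead of two. */
static unsigned long clear_timestamp_flags(unsigned long flags)
{
    return flags & ~TS_MASK;
}
```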

23 Nov, 2011

1 commit

5bc1421e3 net: add network priority cgroup infrastructure (v4)

This patch adds the infrastructure code to create the network priority
cgroup. The cgroup, in addition to the standard processes file, creates two
control files:

1) prioidx - This is a read-only file that exports the index of this cgroup.
This is a value that is both arbitrary and unique to a cgroup in this subsystem,
and is used to index the per-device priority map.

2) priomap - This is a writeable file. On read it reports a table of 2-tuples
(name, priority), where name is the name of a network interface and priority
indicates the priority assigned to frames egressing on the named interface and
originating from a pid in this cgroup.

This cgroup allows for skb priority to be set prior to a root qdisc getting
selected. This is beneficial for DCB-enabled systems, in that it allows any
application to use DCB-configured priorities without application modification.

Signed-off-by: Neil Horman
Signed-off-by: John Fastabend
CC: Robert Love
CC: "David S. Miller"
Signed-off-by: David S. Miller

Neil Horman
2011-11-23 04:22:23 +0800

18 Nov, 2011

1 commit

e11c259f7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem

Conflicts:
include/net/bluetooth/bluetooth.h

John W. Linville
2011-11-18 02:11:43 +0800

10 Nov, 2011

1 commit

6e3e939f3 net: add wireless TX status socket option

The 802.1X EAPOL handshake that hostapd performs requires
knowing whether the frame was ack'ed by the peer.
Currently, we fudge this pretty badly by not even
transmitting the frame as a normal data frame but
injecting it with radiotap and getting the status
out of radiotap monitor as well. This is rather
complex, confuses users (mon.wlan0 presence) and
doesn't work with all hardware.

To get rid of that hack, introduce a real wifi TX
status option for data frame transmissions.

This works similarly to the existing TX timestamping
in that it reflects the SKB back to the socket's
error queue with a SCM_WIFI_STATUS cmsg that has
an int indicating ACK status (0/1).

Since it is possible that at some point we will
want to have TX timestamping and wifi status in a
single errqueue SKB (there's little point in not
doing that), redefine SO_EE_ORIGIN_TIMESTAMPING
to SO_EE_ORIGIN_TXSTATUS which can collect more
than just the timestamp; keep the old constant
as an alias of course. Currently the internal APIs
don't make that possible, but it wouldn't be hard
to split them up in a way that makes it possible.

Thanks to Neil Horman for helping me figure out
the functions that add the control messages.

Signed-off-by: Johannes Berg
Signed-off-by: John W. Linville

Johannes Berg
2011-11-10 05:01:02 +0800

09 Nov, 2011

1 commit

e56c57d0d net: rename sk_clone to sk_clone_lock

Make clear that sk_clone() and inet_csk_clone() return a locked socket.

Add _lock() suffix and kerneldoc.

Suggested-by: Linus Torvalds
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-11-09 06:07:07 +0800

26 Oct, 2011

1 commit

b0691c8ee net: Unlock sock before calling sk_free()

Signed-off-by: Thomas Gleixner
Signed-off-by: David S. Miller

Thomas Gleixner
2011-10-26 07:17:25 +0800

14 Oct, 2011

1 commit

87fb4b7b5 net: more accurate skb truesize

skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.

Considering that skb_shared_info is larger than sk_buff, it's time to
take it into account for better memory accounting.

This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.

At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.

Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But it's a necessary step for
more accurate memory accounting.

Signed-off-by: Eric Dumazet
CC: Andi Kleen
CC: Ben Hutchings
Signed-off-by: David S. Miller

Eric Dumazet
2011-10-14 04:05:07 +0800
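
Why truesize can be large even for a tiny packet is easy to see in a rough model. The sketch below assumes that a packet is charged its payload plus two fixed overheads rounded up to a cache line; the sizes `HDR_SIZE` and `SHINFO_SIZE`, the 64-byte cache line and the helper names are all hypothetical, and the formula is an illustration rather than the definition of SKB_TRUESIZE().

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE  64u  /* assumed cache line size */
#define HDR_SIZE    232u /* hypothetical size of the buffer header struct */
#define SHINFO_SIZE 320u /* hypothetical size of the shared-info struct */

/* Round n up to a multiple of a, where a must be a power of two. */
static size_t align_up(size_t n, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* Modelled charge for one packet: payload plus both aligned overheads. */
static size_t truesize_model(size_t payload)
{
    return payload + align_up(HDR_SIZE, CACHE_LINE)
                   + align_up(SHINFO_SIZE, CACHE_LINE);
}
```

Under these assumed sizes a 1-byte payload is charged 577 bytes, which is the kind of gap between payload and accounted memory that the note about previously unreached limits refers to.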

08 Oct, 2011

1 commit

8083f0fc9 net: use sock_valbool_flag to set/clear SOCK_RXQ_OVFL

There's no point in open-coding sock_valbool_flag().

Signed-off-by: Johannes Berg
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Johannes Berg
2011-10-08 01:27:07 +0800
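
For readers unfamiliar with the helper, the pattern being replaced is an if/else that sets a flag when a value is nonzero and clears it otherwise. The sketch below models that on a plain bitmask; `set_flag()`, `reset_flag()` and `valbool_flag()` are hypothetical userspace stand-ins, not the kernel functions.

```c
#include <assert.h>

/* Set one bit in the flag word. */
static void set_flag(unsigned long *flags, int bit)
{
    *flags |= 1UL << bit;
}

/* Clear one bit in the flag word. */
static void reset_flag(unsigned long *flags, int bit)
{
    *flags &= ~(1UL << bit);
}

/* Set the bit when valbool is nonzero, clear it otherwise. Call sites
 * that open-code this if/else can call the helper instead. */
static void valbool_flag(unsigned long *flags, int bit, int valbool)
{
    if (valbool)
        set_flag(flags, bit);
    else
        reset_flag(flags, bit);
}
```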

25 Aug, 2011

1 commit

ea2ab6937 net: convert core to skb paged frag APIs

Signed-off-by: Ian Campbell
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: "Michał Mirosław"
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller

Ian Campbell
2011-08-25 08:52:11 +0800

02 Aug, 2011

1 commit

a9b3cd7f3 rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER

When assigning a NULL value to an RCU protected pointer, no barrier
is needed. rcu_assign_pointer() used to handle that special case, but
will soon change so that it no longer does.

Convert all rcu_assign_pointer of NULL value.

// <smpl>
@@ expression P; @@

- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)

// </smpl>

Signed-off-by: Stephen Hemminger
Acked-by: Paul E. McKenney
Signed-off-by: David S. Miller

Stephen Hemminger
2011-08-02 19:29:23 +0800
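
The reason a NULL store needs no barrier can be illustrated with C11 atomics as a loose analogy. This is not the kernel RCU API: `publish_config()`, `clear_config()` and `read_config()` are hypothetical, release ordering stands in for the publishing barrier, and the acquire load is a stronger stand-in for the dependency ordering that RCU readers rely on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int value;
};

/* Shared pointer that readers load without taking a lock. */
static _Atomic(struct config *) current_config;

/* Publishing a real object: the release store orders the object's
 * initialisation before the pointer becomes visible to readers. */
static void publish_config(struct config *c)
{
    assert(c != NULL);
    atomic_store_explicit(&current_config, c, memory_order_release);
}

/* Storing NULL: there is no pointed-to data whose initialisation must be
 * ordered first, so a relaxed store is enough. */
static void clear_config(void)
{
    atomic_store_explicit(&current_config, NULL, memory_order_relaxed);
}

/* Reader side: load the pointer before dereferencing it. */
static struct config *read_config(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}
```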

08 Jul, 2011

1 commit

204d1641d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem

John W. Linville
2011-07-08 23:03:36 +0800

06 Jul, 2011

1 commit

c7fe3b52c NFC: add NFC socket family

Signed-off-by: Lauro Ramos Venancio
Signed-off-by: Aloisio Almeida Jr
Signed-off-by: John W. Linville

Aloisio Almeida Jr
2011-07-06 03:26:58 +0800

22 Jun, 2011

1 commit

3847ce32a core: add tracepoints for queueing skb to rcvbuf

This patch adds two tracepoints that report the status of a socket receive
queue and related parameters.

One tracepoint is added to sock_queue_rcv_skb. It records the rcvbuf size
and its usage. The other tracepoint is added to __sk_mem_schedule and
records the memory limits for sockets and the current usage.

With these tracepoints we can find out in detail why the kernel
dropped a packet.

Signed-off-by: Satoru Moriya
Acked-by: Neil Horman
Signed-off-by: David S. Miller

Satoru Moriya
2011-06-22 07:06:10 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

14 Jan, 2011

1 commit

27d189c02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
hwrng: via_rng - Fix memory scribbling on some CPUs
crypto: padlock - Move padlock.h into include/crypto
hwrng: via_rng - Fix asm constraints
crypto: n2 - use __devexit not __exit in n2_unregister_algs
crypto: mark crypto workqueues CPU_INTENSIVE
crypto: mv_cesa - dont return PTR_ERR() of wrong pointer
crypto: ripemd - Set module author and update email address
crypto: omap-sham - backlog handling fix
crypto: gf128mul - Remove experimental tag
crypto: af_alg - fix af_alg memory_allocated data type
crypto: aesni-intel - Fixed build with binutils 2.16
crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets
net: Add missing lockdep class names for af_alg
include: Install linux/if_alg.h for user-space crypto API
crypto: omap-aes - checkpatch --file warning fixes
crypto: omap-aes - initialize aes module once per request
crypto: omap-aes - unnecessary code removed
crypto: omap-aes - error handling implementation improved
crypto: omap-aes - redundant locking is removed
crypto: omap-aes - DMA initialization fixes for OMAP off mode
...

Linus Torvalds
2011-01-14 02:25:58 +0800

07 Jan, 2011

1 commit

2c6607c61 net: add POLLPRI to sock_def_readable()

Leonardo Chiquitto found that poll() could block forever on TCP sockets even
after urgent data was received, if the requested events contain only POLLPRI.

He did a bisection and found commit 4938d7e0233 (poll: avoid extra
wakeups in select/poll) was the source of the problem.

The problem is that TCP sockets use the standard sock_def_readable() function
for their sk_data_ready() handler, and sock_def_readable() doesn't signal
POLLPRI.

Only TCP is affected by the problem. Adding POLLPRI to the list of flags
might trigger unnecessary schedules, but URGENT handling is such a
seldom-used feature that this seems a good compromise.

Thanks a lot to Leonardo for providing the bisection result and a test
program as well.

Reference : http://www.spinics.net/lists/netdev/msg151793.html

Reported-and-bisected-by: Leonardo Chiquitto
Signed-off-by: Eric Dumazet
Tested-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-01-07 02:54:29 +0800
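
The affected usage pattern, a caller that asks poll() for POLLPRI and nothing else, looks roughly like the sketch below. `wait_for_urgent()` is a hypothetical helper written for illustration; it is not the test program mentioned in the commit message.

```c
#include <assert.h>
#include <poll.h>

/* Wait up to timeout_ms for urgent (out-of-band) data on fd.
 * Returns 1 if POLLPRI was reported, 0 if poll() returned without it
 * (for example on timeout), and -1 if poll() itself failed. */
static int wait_for_urgent(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLPRI)) ? 1 : 0;
}
```

A caller that passes a negative timeout relies entirely on the wakeup carrying POLLPRI, which is why a handler that never signals it could leave such a caller blocked.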

18 Dec, 2010

1 commit

b4aa9e05a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/bnx2x/bnx2x.h
drivers/net/wireless/iwlwifi/iwl-1000.c
drivers/net/wireless/iwlwifi/iwl-6000.c
drivers/net/wireless/iwlwifi/iwl-core.h
drivers/vhost/vhost.c

David S. Miller
2010-12-18 04:27:22 +0800

17 Dec, 2010

1 commit

fcbdf09d9 net: fix nulls list corruptions in sk_prot_alloc

Special care is taken inside sk_prot_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.

The patch fixes the following crash:

BUG: unable to handle kernel paging request at fffffffffffffff0
IP: udp4_lib_lookup2+0xad/0x370
__udp4_lib_lookup+0x282/0x360
__udp4_lib_rcv+0x31e/0x700
? ip_local_deliver_finish+0x65/0x190
? ip_local_deliver+0x88/0xa0
udp_rcv+0x15/0x20
ip_local_deliver_finish+0x65/0x190
ip_local_deliver+0x88/0xa0
ip_rcv_finish+0x32d/0x6f0
? netif_receive_skb+0x99c/0x11c0
ip_rcv+0x2bb/0x350
netif_receive_skb+0x99c/0x11c0

Signed-off-by: Leonard Crestez
Signed-off-by: Octavian Purdila
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Octavian Purdila
2010-12-17 06:26:56 +0800
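
Clearing a recycled object while leaving one member untouched can be done with two memset() calls around that member. The sketch below shows the general technique on a hypothetical `struct obj_model`; which members the kernel actually preserves, and how, is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object; node_next may still be followed by lockless
 * readers while the object is being recycled, so it must survive. */
struct obj_model {
    int   family;
    void *node_next;   /* preserved across the clear */
    void *node_pprev;
    int   state;
    long  counters[4];
};

/* Zero everything before node_next and everything from node_pprev on,
 * leaving node_next exactly as it was. */
static void clear_keep_node_next(struct obj_model *o)
{
    assert(o != NULL);
    memset(o, 0, offsetof(struct obj_model, node_next));
    memset(&o->node_pprev, 0,
           sizeof(*o) - offsetof(struct obj_model, node_pprev));
}
```

A plain memset() over the whole object would be simpler, but it would also wipe the one pointer that a concurrent reader might be in the middle of following.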

10 Dec, 2010

1 commit

68835aba4 net: optimize INET input path further

Followup of commit b178bb3dfc30 (net: reorder struct sock fields)

Optimize the INET input path a bit further by:

1) moving sk_refcnt close to sk_lock.

This reduces the number of dirtied cache lines by one on 64-bit arches (with a
64-byte cache line size).

2) moving inet_daddr & inet_rcv_saddr to the beginning of sk

(same cache line as hash / family / bound_dev_if / nulls_node)

This reduces the number of cache lines accessed in lookups by one, and doesn't
increase the size of inet and timewait socks.
inet and tw sockets now share the same placeholder for these fields.

Before patch :

offsetof(struct sock, sk_refcnt) = 0x10
offsetof(struct sock, sk_lock) = 0x40
offsetof(struct sock, sk_receive_queue) = 0x60
offsetof(struct inet_sock, inet_daddr) = 0x270
offsetof(struct inet_sock, inet_rcv_saddr) = 0x274

After patch :

offsetof(struct sock, sk_refcnt) = 0x44
offsetof(struct sock, sk_lock) = 0x48
offsetof(struct sock, sk_receive_queue) = 0x68
offsetof(struct inet_sock, inet_daddr) = 0x0
offsetof(struct inet_sock, inet_rcv_saddr) = 0x4

compute_score() (udp or tcp) now uses a single cache line per ignored
item, instead of two.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-12-10 12:05:58 +0800
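
The offsets listed above can be checked against the cache-line claim with a few lines of arithmetic. The helpers `cache_line_of()` and `same_cache_line()` are hypothetical, the 64-byte line size is the one the message assumes, and only starting offsets are compared (field sizes are ignored).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64u /* line size assumed in the commit message */

/* Index of the cache line that contains the given byte offset. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE_SIZE;
}

/* True when two fields start in the same cache line. */
static bool same_cache_line(size_t a, size_t b)
{
    return cache_line_of(a) == cache_line_of(b);
}
```

With the numbers from the message, sk_refcnt (0x10) and sk_lock (0x40) start in different lines before the patch, while 0x44 and 0x48 share one line after it; inet_daddr moves from line 9 (0x270) to line 0.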

08 Dec, 2010

1 commit

6f107b586 net: Add missing lockdep class names for af_alg

Signed-off-by: Miloslav Trmač
Signed-off-by: Herbert Xu

Miloslav Trmač
2010-12-08 14:35:34 +0800

11 Nov, 2010

1 commit

8d987e5c7 net: avoid limits overflow

Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]

We can switch the infrastructure to use "long" instead of "int", now that
atomic_long_t primitives are available for free.

Signed-off-by: Eric Dumazet
Reported-by: Robin Holt
Reviewed-by: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller

Eric Dumazet
2010-11-11 04:12:00 +0800
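
The scale of the problem is simple arithmetic. The sketch below assumes 4 KiB pages and limits that are counted in pages; both are assumptions made for illustration, and `pages_in()` and `fits_in_int()` are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of whole pages in the given amount of memory. */
static uint64_t pages_in(uint64_t bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return bytes / page_size;
}

/* Whether a page count can be stored in a signed int without overflow. */
static bool fits_in_int(uint64_t value)
{
    return value <= (uint64_t)INT_MAX;
}
```

Under these assumptions 16 TB is 2^32 pages, which no longer fits in a 32-bit int, whereas 1 TB (2^28 pages) still does. A 64-bit long has room for either.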

26 Oct, 2010

1 commit

0d7da9ddd net: add __rcu annotation to sk_filter

Add __rcu annotation to :
(struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-10-26 05:18:28 +0800

24 Oct, 2010

1 commit

5f05647dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
vlan: Calling vlan_hwaccel_do_receive() is always valid.
tproxy: use the interface primary IP address as a default value for --on-ip
tproxy: added IPv6 support to the socket match
cxgb3: function namespace cleanup
tproxy: added IPv6 support to the TPROXY target
tproxy: added IPv6 socket lookup function to nf_tproxy_core
be2net: Changes to use only priority codes allowed by f/w
tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
tproxy: added tproxy sockopt interface in the IPV6 layer
tproxy: added udp6_lib_lookup function
tproxy: added const specifiers to udp lookup functions
tproxy: split off ipv6 defragmentation to a separate module
l2tp: small cleanup
nf_nat: restrict ICMP translation for embedded header
can: mcp251x: fix generation of error frames
can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
can-raw: add msg_flags to distinguish local traffic
9p: client code cleanup
rds: make local functions/variables static
...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David

Linus Torvalds
2010-10-24 02:47:02 +0800

08 Oct, 2010

1 commit

1144182a8 net: suppress RCU lockdep false positive in sock_update_classid

> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> include/linux/cgroup.h:542 invoked rcu_dereference_check() without protection!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 0
> 1 lock held by swapper/1:
> #0: (net_mutex){+.+.+.}, at: register_pernet_subsys+0x1f/0x47
>
> stack backtrace:
> Pid: 1, comm: swapper Not tainted 2.6.35.4-28.fc14.x86_64 #1
> Call Trace:
> lockdep_rcu_dereference+0xaa/0xb3
> sock_update_classid+0x7c/0xa2
> sk_alloc+0x6b/0x77
> __netlink_create+0x37/0xab
> ? rtnetlink_rcv+0x0/0x2d
> netlink_kernel_create+0x74/0x19d
> ? __mutex_lock_common+0x339/0x35b
> rtnetlink_net_init+0x2e/0x48
> ops_init+0xe9/0xff
> register_pernet_operations+0xab/0x130
> register_pernet_subsys+0x2e/0x47
> rtnetlink_init+0x53/0x102
> netlink_proto_init+0x126/0x143
> ? netlink_proto_init+0x0/0x143
> do_one_initcall+0x72/0x186
> kernel_init+0x23b/0x2c9
> kernel_thread_helper+0x4/0x10
> ? restore_args+0x0/0x30
> ? kernel_init+0x0/0x2c9
> ? kernel_thread_helper+0x0/0x10

The sock_update_classid() function calls task_cls_classid(current),
but the calling task cannot go away, so there is no danger of
the associated structures disappearing. Insert an RCU read-side
critical section to suppress the false positive.

Reported-by: Subrata Modak
Signed-off-by: Paul E. McKenney

Paul E. McKenney
2010-10-08 01:02:28 +0800

27 Sep, 2010

1 commit

e40051d13 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/qlcnic/qlcnic_init.c
net/ipv4/ip_output.c

David S. Miller
2010-09-27 16:03:03 +0800

25 Sep, 2010

1 commit

f064af1e5 net: fix a lockdep splat

We have for each socket :

One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A]                          CPU2 [C]
read_lock(callback_lock)
<BH>                              spin_lock_bh(slock)
<spins on slock>
                                  <waits in write_lock_bh(callback_lock)>

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa

It seems the only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing out a bug in my first attempt and suggesting
this solution.

Reported-by: Tetsuo Handa
Tested-by: Tetsuo Handa
Signed-off-by: Eric Dumazet
CC: Jarek Poplawski
Tested-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-09-25 13:26:10 +0800

10 Sep, 2010

1 commit

f39234d60 net/core: add lock context change annotations in net/core/sock.c

__lock_sock() and __release_sock() release and regrab the lock but
were missing proper annotations. Add them. This removes the following
warning from sparse. (Currently __lock_sock() does not emit any
warning about it, but I think it is better to annotate it as well.)

net/core/sock.c:1580:17: warning: context imbalance in '__release_sock' - unexpected unlock

Signed-off-by: Namhyung Kim
Signed-off-by: David S. Miller

Namhyung Kim
2010-09-10 06:02:39 +0800

20 Jul, 2010

1 commit

d6d9ca0fe net: this_cpu_xxx conversions

Use the modern this_cpu_xxx() API, saving a few bytes on x86.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-20 06:12:51 +0800

13 Jul, 2010

1 commit

d361fd599 net: sock_free() optimizations

Avoid two extra instructions in sock_free(), to reload
skb->truesize and skb->sk

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-07-13 11:21:46 +0800

17 Jun, 2010

2 commits

3924773a5 net: Export cred_to_ucred to modules.

AF_UNIX references this, and can be built as a module,
so...

Signed-off-by: David S. Miller

David S. Miller
2010-06-17 07:18:25 +0800

109f6e39f af_unix: Allow SO_PEERCRED to work across namespaces.

Use struct pid and struct cred to store the peer credentials on struct
sock. This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.

This removes nasty surprises when using SO_PEERCRED on socket
connections where the processes on either side are in different pid and
user namespaces.

Signed-off-by: Eric W. Biederman
Acked-by: Daniel Lezcano
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric W. Biederman
2010-06-17 05:55:55 +0800