Doug / smarc-fsl-linux-kernel | Embedian Git Server

21 Nov, 2013

1 commit

f3d334260 net: rework recvmsg handler msg_name and msg_namelen logic ... Browse Code »

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size
Suggested-by: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-11-21 10:52:30 +0800

26 Oct, 2013

1 commit

f84be2bd9 net: make net_get_random_once irq safe ... Browse Code »

I initial build non irq safe version of net_get_random_once because I
would liked to have the freedom to defer even the extraction process of
get_random_bytes until the nonblocking pool is fully seeded.

I don't think this is a good idea anymore and thus this patch makes
net_get_random_once irq safe. Now someone using net_get_random_once does
not need to care from where it is called.

Cc: David S. Miller
Cc: Eric Dumazet
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-10-26 07:03:39 +0800

22 Oct, 2013

1 commit

c68c7f5a8 net: fix build warnings because of net_get_random_once merge ... Browse Code »

This patch fixes the following warning:

In file included from include/linux/skbuff.h:27:0,
from include/linux/netfilter.h:5,
from include/net/netns/netfilter.h:5,
from include/net/net_namespace.h:20,
from include/linux/init_task.h:14,
from init/init_task.c:1:
include/linux/net.h:243:14: warning: 'struct static_key' declared inside parameter list [enabled by default]
struct static_key *done_key);

on x86_64 allnoconfig, um defconfig and ia64 allmodconfig and maybe others as well.

Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-10-22 04:27:03 +0800

20 Oct, 2013

1 commit

a48e42920 net: introduce new macro net_get_random_once ... Browse Code »

net_get_random_once is a new macro which handles the initialization
of secret keys. It is possible to call it in the fast path. Only the
initialization depends on the spinlock and is rather slow. Otherwise
it should get used just before the key is used to delay the entropy
extration as late as possible to get better randomness. It returns true
if the key got initialized.

The usage of static_keys for net_get_random_once is a bit uncommon so
it needs some further explanation why this actually works:

=== In the simple non-HAVE_JUMP_LABEL case we actually have ===
no constrains to use static_key_(true|false) on keys initialized with
STATIC_KEY_INIT_(FALSE|TRUE). So this path just expands in favor of
the likely case that the initialization is already done. The key is
initialized like this:

___done_key = { .enabled = ATOMIC_INIT(0) }

The check

if (!static_key_true(&___done_key)) \

expands into (pseudo code)

if (!likely(___done_key > 0))

, so we take the fast path as soon as ___done_key is increased from the
helper function.

=== If HAVE_JUMP_LABELs are available this depends ===
on patching of jumps into the prepared NOPs, which is done in
jump_label_init at boot-up time (from start_kernel). It is forbidden
and dangerous to use net_get_random_once in functions which are called
before that!

At compilation time NOPs are generated at the call sites of
net_get_random_once. E.g. net/ipv6/inet6_hashtable.c:inet6_ehashfn (we
need to call net_get_random_once two times in inet6_ehashfn, so two NOPs):

71: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
76: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

Both will be patched to the actual jumps to the end of the function to
call __net_get_random_once at boot time as explained above.

arch_static_branch is optimized and inlined for false as return value and
actually also returns false in case the NOP is placed in the instruction
stream. So in the fast case we get a "return false". But because we
initialize ___done_key with (enabled != (entries & 1)) this call-site
will get patched up at boot thus returning true. The final check looks
like this:

if (!static_key_true(&___done_key)) \
___ret = __net_get_random_once(buf, \

expands to

if (!!static_key_false(&___done_key)) \
___ret = __net_get_random_once(buf, \

So we get true at boot time and as soon as static_key_slow_inc is called
on the key it will invert the logic and return false for the fast path.
static_key_slow_inc will change the branch because it got initialized
with .enabled == 0. After static_key_slow_inc is called on the key the
branch is replaced with a nop again.

=== Misc: ===
The helper defers the increment into a workqueue so we don't
have problems calling this code from atomic sections. A seperate boolean
(___done) guards the case where we enter net_get_random_once again before
the increment happend.

Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Jason Baron
Cc: Peter Zijlstra
Cc: Eric Dumazet
Cc: "David S. Miller"
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
2013-10-20 07:45:35 +0800

27 Sep, 2013

1 commit

7965bd4d7 net.h/skbuff.h: Remove extern from function prototypes ... Browse Code »

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Signed-off-by: Joe Perches

Joe Perches
2013-09-27 05:53:19 +0800

05 Jun, 2013

1 commit

0e9649c14 net: do not manually initialize enumerators ... Browse Code »

Clean up unnecessary initialization of enumerators as the compiler takes
care of that.

Signed-off-by: Jean Sacren
Signed-off-by: David S. Miller

Jean Sacren
2013-06-05 06:17:38 +0800

30 Apr, 2013

1 commit

8d564368a net: rename random32 to prandom ... Browse Code »

Commit 496f2f93b1cc ("random32: rename random32 to prandom") renamed
random32() and srandom32() to prandom_u32() and prandom_seed()
respectively.

net_random() and net_srandom() need to be redefined with prandom_* in
order to finish the naming transition.

While I'm at it, enclose macro argument of net_srandom() with parenthesis.

Signed-off-by: Akinobu Mita
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2013-04-30 09:28:44 +0800

13 Oct, 2012

1 commit

607ca46e9 UAPI: (Scripted) Disintegrate include/linux ... Browse Code »

Signed-off-by: David Howells
Acked-by: Arnd Bergmann
Acked-by: Thomas Gleixner
Acked-by: Michael Kerrisk
Acked-by: Paul E. McKenney
Acked-by: Dave Jones

David Howells
2012-10-13 17:46:48 +0800

03 Oct, 2012

1 commit

aab174f0d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs update from Al Viro:

- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c

(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).

A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.

- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).

- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.

- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.

- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...

Linus Torvalds
2012-10-03 11:25:04 +0800

27 Sep, 2012

1 commit

56b31d1c9 unexport sock_map_fd(), switch to sock_alloc_file() ... Browse Code »

Both modular callers of sock_map_fd() had been buggy; sctp one leaks
descriptor and file if copy_to_user() fails, 9p one shouldn't be
exposing file in the descriptor table at all.

Switch both to sock_alloc_file(), export it, unexport sock_map_fd() and
make it static.

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:08:50 +0800

23 Jul, 2012

1 commit

406a3c638 net: netprio_cgroup: rework update socket logic ... Browse Code »

Instead of updating the sk_cgrp_prioidx struct field on every send
this only updates the field when a task is moved via cgroup
infrastructure.

This allows sockets that may be used by a kernel worker thread
to be managed. For example in the iscsi case today a user can
put iscsid in a netprio cgroup and control traffic will be sent
with the correct sk_cgrp_prioidx value set but as soon as data
is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
is updated with the kernel worker threads value which is the
default case.

It seems more correct to only update the field when the user
explicitly sets it via control group infrastructure. This allows
the users to manage sockets that may be used with other threads.

Signed-off-by: John Fastabend
Acked-by: Neil Horman
Signed-off-by: David S. Miller

John Fastabend
2012-07-23 03:44:01 +0800

21 Jul, 2012

1 commit

b09e786bd tun: fix a crash bug and a memory leak ... Browse Code »

This patch fixes a crash
tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
sock_release -> iput(SOCK_INODE(sock))
introduced by commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d

The problem is that this socket is embedded in struct tun_struct, it has
no inode, iput is called on invalid inode, which modifies invalid memory
and optionally causes a crash.

sock_release also decrements sockets_in_use, this causes a bug that
"sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
creating and closing tun devices.

This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
sock_release to not free the inode and not decrement sockets_in_use,
fixing both memory corruption and sockets_in_use underflow.

It should be backported to 3.3 an 3.4 stabke.

Signed-off-by: Mikulas Patocka
Cc: stable@kernel.org
Signed-off-by: David S. Miller

Mikulas Patocka
2012-07-21 02:21:06 +0800

30 May, 2012

1 commit

2033e9bf0 net: add MODULE_ALIAS_NET_PF_PROTO_NAME ... Browse Code »

The MODULE_ALAIS_NET_PF macro set is missing a variant that allows for the
appending of an arbitrary string to the net-pf--proto- base. while
MODULE_ALIAS_NET_PF_PROTO_NAME_TYPE allows an appending of a numerical type, we
need to be able to append a generic string to support generic netlink families
that have neither a fix numberical protocol nor type number

Signed-off-by: Neil Horman
CC: Eric Dumazet
CC: David Miller
Signed-off-by: David S. Miller

Neil Horman
2012-05-30 10:33:55 +0800

16 May, 2012

1 commit

3a3bfb61e net: Add net_ratelimited_function and net_<level>_ratelimited macros ... Browse Code »

__ratelimit() can be considered an inverted bool test because
it returns true when not ratelimited. Several tests in the
kernel tree use this __ratelimit() function incorrectly.

No net_ratelimit uses are incorrect currently though.

Most uses of net_ratelimit are to log something via printk or
pr_.

In order to minimize the uses of net_ratelimit, and to start
standardizing the code style used for __ratelimit() and net_ratelimit(),
add a net_ratelimited_function() macro and net__ratelimited()
logging macros similar to pr__ratelimited that use the global
net_ratelimit instead of a static per call site "struct ratelimit_state".

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
2012-05-16 01:45:02 +0800

22 Feb, 2012

1 commit

ef64a54f6 sock: Introduce the SO_PEEK_OFF sock option ... Browse Code »

This one specifies where to start MSG_PEEK-ing queue data from. When
set to negative value means that MSG_PEEK works as ususally -- peeks
from the head of the queue always.

When some bytes are peeked from queue and the peeking offset is non
negative it is moved forward so that the next peek will return next
portion of data.

When non-peeking recvmsg occurs and the peeking offset is non negative
is is moved backward so that the next peek will still peek the proper
data (i.e. the one that would have been picked if there were no non
peeking recv in between).

The offset is set using per-proto opteration to let the protocol handle
the locking issues and to check whether the peeking offset feature is
supported by the protocol the socket belongs to.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2012-02-22 04:03:48 +0800

28 May, 2011

1 commit

c5c177b4a net: Kill ratelimit.h dependency in linux/net.h ... Browse Code »

Ingo Molnar noticed that we have this unnecessary ratelimit.h
dependency in linux/net.h, which hid compilation problems from
people doing builds only with CONFIG_NET enabled.

Move this stuff out to a seperate net/net_ratelimit.h file and
include that in the only two places where this thing is needed.

Signed-off-by: David S. Miller
Acked-by: Ingo Molnar

David S. Miller
2011-05-28 01:41:33 +0800

06 May, 2011

1 commit

228e548e6 net: Add sendmmsg socket system call ... Browse Code »

This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.

I wrote a microbenchmark to test the performance gains of using
this new syscall:

http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.

64B UDP

batch pkts/sec
1 804570
2 872800 (+ 8 %)
4 916556 (+14 %)
8 939712 (+17 %)
16 952688 (+18 %)
32 956448 (+19 %)
64 964800 (+20 %)

64B raw socket

batch pkts/sec
1 1201449
2 1350028 (+12 %)
4 1461416 (+22 %)
8 1513080 (+26 %)
16 1541216 (+28 %)
32 1553440 (+29 %)
64 1557888 (+30 %)

We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.

[ Add sparc syscall entries. -DaveM ]

Signed-off-by: Anton Blanchard
Signed-off-by: David S. Miller

Anton Blanchard
2011-05-06 02:10:14 +0800

23 Feb, 2011

1 commit

eaefd1105 net: add __rcu annotations to sk_wq and wq ... Browse Code »

Add proper RCU annotations/verbs to sk_wq and wq members

Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)

Fix sunrpc sk_sleep() abuse too

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2011-02-23 02:19:31 +0800

02 Oct, 2010

1 commit

721db93a5 net: Export __sock_create ... Browse Code »

Signed-off-by: Pavel Emelyanov
Acked-by: David S. Miller
Signed-off-by: J. Bruce Fields

Pavel Emelyanov
2010-10-02 05:18:59 +0800

03 Jul, 2010

1 commit

e2aec372f linux/net.h: fix kernel-doc warnings ... Browse Code »

Fix kernel-doc warnings in linux/net.h:

Warning(include/linux/net.h:151): No description found for parameter 'wq'
Warning(include/linux/net.h:151): Excess struct/union/enum/typedef member 'fasync_list' description in 'socket'
Warning(include/linux/net.h:151): Excess struct/union/enum/typedef member 'wait' description in 'socket'

Signed-off-by: Randy Dunlap
Signed-off-by: David S. Miller

Randy Dunlap
2010-07-03 12:59:08 +0800

02 May, 2010

1 commit

438154823 net: sock_def_readable() and friends RCU conversion ... Browse Code »

sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
- Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
- Use wq_has_sleeper() to eventually wakeup tasks.
- Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2010-05-02 06:00:15 +0800

04 Feb, 2010

1 commit

1621e0940 net: CONFIG_COMPAT redux ... Browse Code »

Ifdef out
struct proto_ops::compat_ioctl
struct proto_ops::compat_setsockopt
struct proto_ops::compat_getsockopt
to make structures smaller on COMPAT=n kernels.

Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller

Alexey Dobriyan
2010-02-04 12:32:28 +0800

06 Dec, 2009

2 commits

28b4d5cc1 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ ... Browse Code »

Conflicts:
drivers/net/pcmcia/fmvj18x_cs.c
drivers/net/pcmcia/nmclan_cs.c
drivers/net/pcmcia/xirc2ps_cs.c
drivers/net/wireless/ray_cs.c

David S. Miller
2009-12-06 07:22:26 +0800
d0b093a8b Merge branch 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/tip/linux-2.6-tip

* 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ratelimit: Make suppressed output messages more useful
printk: Remove ratelimit.h from kernel.h
ratelimit: Fix/allow use in atomic contexts
ratelimit: Use per ratelimit context locking

Linus Torvalds
2009-12-06 01:50:22 +0800

07 Nov, 2009

1 commit

b215c57dc net: kill proto_ops wrapper ... Browse Code »

All users of wrapped proto_ops are now gone, so we can safely remove
the wrappers as well.

Cc: David S. Miller
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann
Signed-off-by: David S. Miller

Arnd Bergmann
2009-11-07 16:46:40 +0800

06 Nov, 2009

1 commit

3f378b684 net: pass kern to net_proto_family create function ... Browse Code »

The generic __sock_create function has a kern argument which allows the
security system to make decisions based on if a socket is being created by
the kernel or by userspace. This patch passes that flag to the
net_proto_family specific create function, so it can do the same thing.

Signed-off-by: Eric Paris
Acked-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Eric Paris
2009-11-06 14:18:14 +0800

29 Oct, 2009

1 commit

38bfd8f5b net,socket: introduce DECLARE_SOCKADDR helper to catch overflow at build time ... Browse Code »

proto_ops->getname implies copying protocol specific data
into storage unit (particulary to __kernel_sockaddr_storage).
So when we implement new protocol support we should keep such
a detail in mind (which is easy to forget about).

Lets introduce DECLARE_SOCKADDR helper which check if
storage unit is not overfowed at build time.

Eventually inet_getname is switched to use DECLARE_SOCKADDR
(to show example of usage).

Signed-off-by: Cyrill Gorcunov
Signed-off-by: David S. Miller

Cyrill Gorcunov
2009-10-29 18:00:06 +0800

13 Oct, 2009

1 commit

a2e272554 net: Introduce recvmmsg socket syscall ... Browse Code »

Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
works in the same fashion as the ppoll one.

If the underlying protocol returns a datagram with MSG_OOB set, this
will make recvmmsg return right away with as many datagrams (+ the OOB
one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
datagrams and then recvmsg returns an error, recvmmsg will return
the successfully received datagrams, store the error and return it
in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Arnaldo Carvalho de Melo
2009-10-13 14:40:10 +0800

01 Oct, 2009

1 commit

b7058842c net: Make setsockopt() optlen be unsigned. ... Browse Code »

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller

David S. Miller
2009-10-01 07:12:20 +0800

22 Sep, 2009

1 commit

3fff4c42b printk: Remove ratelimit.h from kernel.h ... Browse Code »

Decouple kernel.h from ratelimit.h: the global declaration of
printk's ratelimit_state is not needed, and it leads to messy
circular dependencies due to ratelimit.h's (new) adding of a
spinlock_types.h include.

Cc: Peter Zijlstra
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: David S. Miller
LKML-Reference:
Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-22 22:18:09 +0800

15 Sep, 2009

1 commit

29a020d35 [PATCH] net: kmemcheck annotation in struct socket ... Browse Code »

struct socket has a 16 bit hole that triggers kmemcheck warnings.

As suggested by Ingo, use kmemcheck annotations

Signed-off-by: Eric Dumazet
Acked-by: Ingo Molnar
Signed-off-by: David S. Miller

Eric Dumazet
2009-09-15 17:39:20 +0800

16 Mar, 2009

1 commit

8bdd663ab net: reorder fields of struct socket ... Browse Code »

On x86_64, its rather unfortunate that "wait_queue_head_t wait"
field of "struct socket" spans two cache lines (assuming a 64
bytes cache line in current cpus)

offsetof(struct socket, wait)=0x30
sizeof(wait_queue_head_t)=0x18

This might explain why Kenny Chang noticed that his multicast workload
was performing bad with 64 bit kernels, since more cache lines ping pongs
were involved.

This litle patch moves "wait" field next "fasync_list" so that both
fields share a single cache line, to speedup sock_def_readable()

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-03-16 10:59:13 +0800

20 Nov, 2008

1 commit

de11defeb reintroduce accept4 ... Browse Code »

Introduce a new accept4() system call. The addition of this system call
matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
inotify_init1(), epoll_create1(), pipe2()) which added new system calls
that differed from analogous traditional system calls in adding a flags
argument that can be used to access additional functionality.

The accept4() system call is exactly the same as accept(), except that
it adds a flags bit-mask argument. Two flags are initially implemented.
(Most of the new system calls in 2.6.27 also had both of these flags.)

SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
for the new file descriptor returned by accept4(). This is a useful
security feature to avoid leaking information in a multithreaded
program where one thread is doing an accept() at the same time as
another thread is doing a fork() plus exec(). More details here:
http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
Ulrich Drepper).

The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
to be enabled on the new open file description created by accept4().
(This flag is merely a convenience, saving the use of additional calls
fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.

Here's a test program. Works on x86-32. Should work on x86-64, but
I (mtk) don't have a system to hand to test with.

It tests accept4() with each of the four possible combinations of
SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
that the appropriate flags are set on the file descriptor/open file
description returned by accept4().

I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
and it passes according to my test program.

/* test_accept4.c

Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk

Licensed under the GNU GPLv2 or later.
*/
#define _GNU_SOURCE
#include
#include
#include
#include
#include
#include
#include
#include

#define PORT_NUM 33333

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

/**********************************************************************/

/* The following is what we need until glibc gets a wrapper for
accept4() */

/* Flags for socket(), socketpair(), accept4() */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC O_CLOEXEC
#endif
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK O_NONBLOCK
#endif

#ifdef __x86_64__
#define SYS_accept4 288
#elif __i386__
#define USE_SOCKETCALL 1
#define SYS_ACCEPT4 18
#else
#error "Sorry -- don't know the syscall # on this architecture"
#endif

static int
accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
{
printf("Calling accept4(): flags = %x", flags);
if (flags != 0) {
printf(" (");
if (flags & SOCK_CLOEXEC)
printf("SOCK_CLOEXEC");
if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
printf(" ");
if (flags & SOCK_NONBLOCK)
printf("SOCK_NONBLOCK");
printf(")");
}
printf("\n");

#if USE_SOCKETCALL
long args[6];

args[0] = fd;
args[1] = (long) sockaddr;
args[2] = (long) addrlen;
args[3] = flags;

return syscall(SYS_socketcall, SYS_ACCEPT4, args);
#else
return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
#endif
}

/**********************************************************************/

static int
do_test(int lfd, struct sockaddr_in *conn_addr,
int closeonexec_flag, int nonblock_flag)
{
int connfd, acceptfd;
int fdf, flf, fdf_pass, flf_pass;
struct sockaddr_in claddr;
socklen_t addrlen;

printf("=======================================\n");

connfd = socket(AF_INET, SOCK_STREAM, 0);
if (connfd == -1)
die("socket");
if (connect(connfd, (struct sockaddr *) conn_addr,
sizeof(struct sockaddr_in)) == -1)
die("connect");

addrlen = sizeof(struct sockaddr_in);
acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
closeonexec_flag | nonblock_flag);
if (acceptfd == -1) {
perror("accept4()");
close(connfd);
return 0;
}

fdf = fcntl(acceptfd, F_GETFD);
if (fdf == -1)
die("fcntl:F_GETFD");
fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
((closeonexec_flag & SOCK_CLOEXEC) != 0);
printf("Close-on-exec flag is %sset (%s); ",
(fdf & FD_CLOEXEC) ? "" : "not ",
fdf_pass ? "OK" : "failed");

flf = fcntl(acceptfd, F_GETFL);
if (flf == -1)
die("fcntl:F_GETFD");
flf_pass = ((flf & O_NONBLOCK) != 0) ==
((nonblock_flag & SOCK_NONBLOCK) !=0);
printf("nonblock flag is %sset (%s)\n",
(flf & O_NONBLOCK) ? "" : "not ",
flf_pass ? "OK" : "failed");

close(acceptfd);
close(connfd);

printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
return fdf_pass && flf_pass;
}

static int
create_listening_socket(int port_num)
{
struct sockaddr_in svaddr;
int lfd;
int optval;

memset(&svaddr, 0, sizeof(struct sockaddr_in));
svaddr.sin_family = AF_INET;
svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
svaddr.sin_port = htons(port_num);

lfd = socket(AF_INET, SOCK_STREAM, 0);
if (lfd == -1)
die("socket");

optval = 1;
if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
sizeof(optval)) == -1)
die("setsockopt");

if (bind(lfd, (struct sockaddr *) &svaddr,
sizeof(struct sockaddr_in)) == -1)
die("bind");

if (listen(lfd, 5) == -1)
die("listen");

return lfd;
}

int
main(int argc, char *argv[])
{
struct sockaddr_in conn_addr;
int lfd;
int port_num;
int passed;

passed = 1;

port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;

memset(&conn_addr, 0, sizeof(struct sockaddr_in));
conn_addr.sin_family = AF_INET;
conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
conn_addr.sin_port = htons(port_num);

lfd = create_listening_socket(port_num);

if (!do_test(lfd, &conn_addr, 0, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
passed = 0;

close(lfd);

exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
}

[mtk.manpages@gmail.com: rewrote changelog, updated test program]
Signed-off-by: Ulrich Drepper
Tested-by: Michael Kerrisk
Acked-by: Michael Kerrisk
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-11-20 10:49:57 +0800

27 Aug, 2008

1 commit

5770a3fb5 Fix userspace export of <linux/net.h> ... Browse Code »

Including in the user-visible part of this header has
caused build regressions with headers from 2.6.27-rc. Move it down to
the #ifdef __KERNEL__ part, which is the only place it's needed. Move
some other kernel-only things down there too, while we're at it.

Signed-off-by: David Woodhouse
Signed-off-by: Linus Torvalds

David Woodhouse
2008-08-27 01:37:20 +0800

26 Jul, 2008

1 commit

717115e1a printk ratelimiting rewrite ... Browse Code »

All ratelimit user use same jiffies and burst params, so some messages
(callbacks) will be lost.

For example:
a call printk_ratelimit(5 * HZ, 1)
b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will
will be supressed.

- rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for
hints from andrew.

- Add WARN_ON_RATELIMIT, update rcupreempt.h

- remove __printk_ratelimit

- use __ratelimit in net_ratelimit

Signed-off-by: Dave Young
Cc: "David S. Miller"
Cc: "Paul E. McKenney"
Cc: Dave Young
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Young
2008-07-26 01:53:29 +0800

25 Jul, 2008

4 commits

77d272005 flag parameters: NONBLOCK in socket and socketpair ... Browse Code »

This patch introduces support for the SOCK_NONBLOCK flag in socket,
socketpair, and paccept. To do this the internal function sock_attach_fd
gets an additional parameter which it uses to set the appropriate flag for
the file descriptor.

Given that in modern, scalable programs almost all socket connections are
non-blocking and the minimal additional cost for the new functionality
I see no reason not to add this code.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include
#include
#include
#include
#include
#include
#include

#ifndef __NR_paccept
# ifdef __x86_64__
# define __NR_paccept 288
# elif defined __i386__
# define SYS_PACCEPT 18
# define USE_SOCKETCALL 1
# else
# error "need __NR_paccept"
# endif
#endif

#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
({ long args[6] = { \
(long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif

#define PORT 57392

#define SOCK_NONBLOCK O_NONBLOCK

static pthread_barrier_t b;

static void *
tf (void *arg)
{
pthread_barrier_wait (&b);
int s = socket (AF_INET, SOCK_STREAM, 0);
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);

pthread_barrier_wait (&b);
s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);

return NULL;
}

int
main (void)
{
int fd;
fd = socket (PF_INET, SOCK_STREAM, 0);
if (fd == -1)
{
puts ("socket(0) failed");
return 1;
}
int fl = fcntl (fd, F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if (fl & O_NONBLOCK)
{
puts ("socket(0) set non-blocking mode");
return 1;
}
close (fd);

fd = socket (PF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
if (fd == -1)
{
puts ("socket(SOCK_NONBLOCK) failed");
return 1;
}
fl = fcntl (fd, F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if ((fl & O_NONBLOCK) == 0)
{
puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode");
return 1;
}
close (fd);

int fds[2];
if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
{
puts ("socketpair(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
fl = fcntl (fds[i], F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if (fl & O_NONBLOCK)
{
printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}

if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_NONBLOCK, 0, fds) == -1)
{
puts ("socketpair(SOCK_NONBLOCK) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
fl = fcntl (fds[i], F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if ((fl & O_NONBLOCK) == 0)
{
printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}

pthread_barrier_init (&b, NULL, 2);

struct sockaddr_in sin;
pthread_t th;
if (pthread_create (&th, NULL, tf, NULL) != 0)
{
puts ("pthread_create failed");
return 1;
}

int s = socket (AF_INET, SOCK_STREAM, 0);
int reuse = 1;
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);

pthread_barrier_wait (&b);

int s2 = paccept (s, NULL, 0, NULL, 0);
if (s2 < 0)
{
puts ("paccept(0) failed");
return 1;
}

fl = fcntl (s2, F_GETFL);
if (fl & O_NONBLOCK)
{
puts ("paccept(0) set non-blocking mode");
return 1;
}
close (s2);
close (s);

pthread_barrier_wait (&b);

s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);

pthread_barrier_wait (&b);

s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK);
if (s2 < 0)
{
puts ("paccept(SOCK_NONBLOCK) failed");
return 1;
}

fl = fcntl (s2, F_GETFL);
if ((fl & O_NONBLOCK) == 0)
{
puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode");
return 1;
}
close (s2);
close (s);

pthread_barrier_wait (&b);
puts ("OK");

return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:29 +0800
c019bbc61 flag parameters: paccept w/out set_restore_sigmask ... Browse Code »

Some platforms do not have support to restore the signal mask in the
return path from a syscall. For those platforms syscalls like pselect are
not defined at all. This is, I think, not a good choice for paccept()
since paccept() adds more value on top of accept() than just the signal
mask handling.

Therefore this patch defines a scaled down version of the sys_paccept
function for those platforms. It returns -EINVAL in case the signal mask
is non-NULL but behaves the same otherwise.

Note that I explicitly included . I saw that it is
currently included but indirectly two levels down. There is too much risk
in relying on this. The header might change and then suddenly the
function definition would change without anyone immediately noticing.

Signed-off-by: Ulrich Drepper
Cc: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:27 +0800
aaca0bdca flag parameters: paccept ... Browse Code »

This patch is by far the most complex in the series. It adds a new syscall
paccept. This syscall differs from accept in that it adds (at the userlevel)
two additional parameters:

- a signal mask
- a flags value

The flags parameter can be used to set flag like SOCK_CLOEXEC. This is
imlpemented here as well. Some people argued that this is a property which
should be inherited from the file desriptor for the server but this is against
POSIX. Additionally, we really want the signal mask parameter as well
(similar to pselect, ppoll, etc). So an interface change in inevitable.

The flag value is the same as for socket and socketpair. I think diverging
here will only create confusion. Similar to the filesystem interfaces where
the use of the O_* constants differs, it is acceptable here.

The signal mask is handled as for pselect etc. The mask is temporarily
installed for the thread and removed before the call returns. I modeled the
code after pselect. If there is a problem it's likely also in pselect.

For architectures which use socketcall I maintained this interface instead of
adding a system call. The symmetry shouldn't be broken.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include
#include
#include
#include
#include
#include
#include
#include
#include

#ifndef __NR_paccept
# ifdef __x86_64__
# define __NR_paccept 288
# elif defined __i386__
# define SYS_PACCEPT 18
# define USE_SOCKETCALL 1
# else
# error "need __NR_paccept"
# endif
#endif

#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
({ long args[6] = { \
(long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif

#define PORT 57392

#define SOCK_CLOEXEC O_CLOEXEC

static pthread_barrier_t b;

static void *
tf (void *arg)
{
pthread_barrier_wait (&b);
int s = socket (AF_INET, SOCK_STREAM, 0);
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);

pthread_barrier_wait (&b);
s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);

pthread_barrier_wait (&b);
sleep (2);
pthread_kill ((pthread_t) arg, SIGUSR1);

return NULL;
}

static void
handler (int s)
{
}

int
main (void)
{
pthread_barrier_init (&b, NULL, 2);

struct sockaddr_in sin;
pthread_t th;
if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0)
{
puts ("pthread_create failed");
return 1;
}

int s = socket (AF_INET, SOCK_STREAM, 0);
int reuse = 1;
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);

pthread_barrier_wait (&b);

int s2 = paccept (s, NULL, 0, NULL, 0);
if (s2 < 0)
{
puts ("paccept(0) failed");
return 1;
}

int coe = fcntl (s2, F_GETFD);
if (coe & FD_CLOEXEC)
{
puts ("paccept(0) set close-on-exec-flag");
return 1;
}
close (s2);

pthread_barrier_wait (&b);

s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC);
if (s2 < 0)
{
puts ("paccept(SOCK_CLOEXEC) failed");
return 1;
}

coe = fcntl (s2, F_GETFD);
if ((coe & FD_CLOEXEC) == 0)
{
puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (s2);

pthread_barrier_wait (&b);

struct sigaction sa;
sa.sa_handler = handler;
sa.sa_flags = 0;
sigemptyset (&sa.sa_mask);
sigaction (SIGUSR1, &sa, NULL);

sigset_t ss;
pthread_sigmask (SIG_SETMASK, NULL, &ss);
sigaddset (&ss, SIGUSR1);
pthread_sigmask (SIG_SETMASK, &ss, NULL);

sigdelset (&ss, SIGUSR1);
alarm (4);
pthread_barrier_wait (&b);

errno = 0 ;
s2 = paccept (s, NULL, 0, &ss, 0);
if (s2 != -1 || errno != EINTR)
{
puts ("paccept did not fail with EINTR");
return 1;
}

close (s);

puts ("OK");

return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[akpm@linux-foundation.org: make it compile]
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Cc:
Cc: "David S. Miller"
Cc: Roland McGrath
Cc: Kyle McMartin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:27 +0800
a677a039b flag parameters: socket and socketpair ... Browse Code »

This patch adds support for flag values which are ORed to the type passwd
to socket and socketpair. The additional code is minimal. The flag
values in this implementation can and must match the O_* flags. This
avoids overhead in the conversion.

The internal functions sock_alloc_fd and sock_map_fd get a new parameters
and all callers are changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include
#include
#include
#include
#include

#define PORT 57392

/* For Linux these must be the same. */
#define SOCK_CLOEXEC O_CLOEXEC

int
main (void)
{
int fd;
fd = socket (PF_INET, SOCK_STREAM, 0);
if (fd == -1)
{
puts ("socket(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("socket(0) set close-on-exec flag");
return 1;
}
close (fd);

fd = socket (PF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0);
if (fd == -1)
{
puts ("socket(SOCK_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (fd);

int fds[2];
if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
{
puts ("socketpair(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
coe = fcntl (fds[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}

if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, fds) == -1)
{
puts ("socketpair(SOCK_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
coe = fcntl (fds[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}

puts ("OK");

return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Cc: "David S. Miller"
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:27 +0800

08 Jul, 2008

1 commit

2c693610f net: remove padding from struct socket on 64bit & increase objects/cache ... Browse Code »

remove padding from struct socket reducing its size by 8 bytes.

This allows more objects/cache in sock_inode_cache
12 objects/cache when cacheline size is 128 (generic x86_64)

Signed-off-by: Richard Kennedy
Signed-off-by: David S. Miller

Richard Kennedy
2008-07-08 18:03:01 +0800