Doug / smarc-fsl-linux-kernel | Embedian Git Server

13 Oct, 2009

2 commits

a2e272554 net: Introduce recvmmsg socket syscall

Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
works in the same fashion as the ppoll one.

If the underlying protocol returns a datagram with MSG_OOB set, this
will make recvmmsg return right away with as many datagrams (+ the OOB
one) as it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
datagrams and then recvmsg returns an error, recvmmsg will return
the successfully received datagrams, store the error and return it
in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.
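
The deferred-error rule credited above to Rémi Denis-Courmont and Steven Whitehouse can be modeled in user space. The sketch below is not the kernel code: `model_sock`, `recv_one_fn`, `model_recvmmsg` and `fail_after` are hypothetical names, and a per-socket `pending_err` field is assumed to be where the error is stored between calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket: only the deferred error is modeled. */
struct model_sock {
    int pending_err;   /* 0, or a positive errno saved by a previous batch */
};

/* Hypothetical per-datagram receive callback: returns 0 on success or a
 * negative errno. */
typedef int (*recv_one_fn)(void *ctx);

/* Receive up to vlen datagrams. Returns the number received, or a negative
 * errno if nothing was received. An error that follows at least one
 * successful datagram is stored and reported by the next call. */
static int model_recvmmsg(struct model_sock *sk, recv_one_fn recv_one,
                          void *ctx, unsigned int vlen)
{
    unsigned int got = 0;
    int err;

    assert(sk != NULL && recv_one != NULL);

    /* Report an error left over from the previous batch first. */
    if (sk->pending_err) {
        err = -sk->pending_err;
        sk->pending_err = 0;
        return err;
    }

    while (got < vlen) {
        err = recv_one(ctx);
        if (err < 0) {
            if (got == 0)
                return err;          /* nothing to hand back: fail now */
            sk->pending_err = -err;  /* keep the datagrams, defer the error */
            break;
        }
        got++;
    }
    return (int)got;
}

/* Example callback: succeeds *remaining more times, then fails with -EIO. */
static int fail_after(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return -EIO;
    (*remaining)--;
    return 0;
}
```

With two datagrams available and vlen = 4, the first call returns 2 and the following call returns -EIO, which is the behavior the changelog describes.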

Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller

Arnaldo Carvalho de Melo
2009-10-13 14:40:10 +0800

3b885787e net: Generalize socket rx gap / receive queue overflow cmsg

Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames. This value was
exported via a SOL_PACKET level cmsg. After I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option. As such I've created this patch. It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
overflowed between any two given frames. It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count). Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if that's
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me. This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Neil Horman
2009-10-13 04:26:31 +0800

10 Oct, 2009

1 commit

8aa0f64ac Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6

David S. Miller
2009-10-10 05:40:09 +0800

08 Oct, 2009

1 commit

3d23e349d wext: refactor

Refactor wext to
* split out iwpriv handling
* split out iwspy handling
* split out procfs support
* allow cfg80211 to have wireless extensions compat code
w/o CONFIG_WIRELESS_EXT

After this, drivers need to
- select WIRELESS_EXT - for wext support
- select WEXT_PRIV - for iwpriv support
- select WEXT_SPY - for iwspy support

except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.

Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).

Signed-off-by: Johannes Berg
Signed-off-by: John W. Linville

Johannes Berg
2009-10-08 04:39:43 +0800

07 Oct, 2009

1 commit

bcdce7195 net: speedup sk_wake_async()

An incoming datagram must bring a *lot* of cache lines into the CPU cache,
in particular (other parts omitted: hash chains, IP route cache...):

On 32bit arches :

offsetof(struct sock, sk_rcvbuf)        = 0x30  (read)
offsetof(struct sock, sk_lock)          = 0x34  (rw)

offsetof(struct sock, sk_sleep)         = 0x50  (read)
offsetof(struct sock, sk_rmem_alloc)    = 0x64  (rw)
offsetof(struct sock, sk_receive_queue) = 0x74  (rw)

offsetof(struct sock, sk_forward_alloc) = 0x98  (rw)

offsetof(struct sock, sk_callback_lock) = 0xcc  (rw)
offsetof(struct sock, sk_drops)         = 0xd8  (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter)        = 0xf8  (read)

offsetof(struct sock, sk_socket)        = 0x138 (read)

offsetof(struct sock, sk_data_ready)    = 0x15c (read)

We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

This avoids one cache line load per incoming packet for common cases (no fasync())

We can leave (or even move in a future patch) sk->sk_socket in a cold location

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-10-07 08:28:29 +0800

01 Oct, 2009

1 commit

b7058842c net: Make setsockopt() optlen be unsigned.

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the place, in
each and every implementation.
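
A small user-space sketch of why the type matters. The buffer size and the function names (`OPT_BUF_SIZE`, `length_ok_signed`, `copy_option`) are hypothetical and only illustrate the class of check the changelog refers to; they are not taken from any handler in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define OPT_BUF_SIZE 16   /* hypothetical size of an option buffer */

/* Old style: a signed length. A negative optlen passes this upper-bound
 * check unless the handler also remembers a separate "optlen < 0" test. */
static int length_ok_signed(int optlen)
{
    return optlen <= OPT_BUF_SIZE;
}

/* New style: an unsigned length. A caller-supplied -1 arrives as a huge
 * value, so the single upper-bound check is enough to reject it. */
static int copy_option(char dst[OPT_BUF_SIZE], const char *src,
                       unsigned int optlen)
{
    assert(dst != NULL && src != NULL);

    if (optlen > OPT_BUF_SIZE)
        return -EINVAL;
    memcpy(dst, src, optlen);
    return 0;
}
```

Here `length_ok_signed(-1)` reports the length as acceptable, while `copy_option()` called with `(unsigned int)-1` fails with -EINVAL.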

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller

David S. Miller
2009-10-01 07:12:20 +0800

29 Sep, 2009

1 commit

47379052b net: Add explicit bound checks in net/socket.c

The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot deal with this in
terms of proving that the copy_from_user() is then always in bounds.
This is the last (well 9th of this series, but last in the kernel) such
case around.

With this patch, we can turn on code to make having the boundary provably
right for the whole kernel, and detect introduction of new security
accidents of this type early on.
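
The idea can be sketched outside the kernel as a table-driven copy with an explicit, locally visible bound. The table below is shortened and hypothetical, `fetch_args` is an invented name, and memcpy() stands in for copy_from_user(); this is an illustration of the pattern, not the code from net/socket.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define AL(x) ((x) * sizeof(unsigned long))
#define MAX_ARGS 6

/* Hypothetical, shortened table: bytes of arguments for each call number. */
static const unsigned char nargs[] = { AL(0), AL(3), AL(3), AL(3), AL(2) };

/* Copy the arguments for `call` into a[]. Returns the argument count or
 * -EINVAL. */
static int fetch_args(unsigned int call, const unsigned long *user_args,
                      unsigned long a[MAX_ARGS])
{
    size_t len;

    assert(user_args != NULL && a != NULL);

    if (call >= sizeof(nargs) / sizeof(nargs[0]))
        return -EINVAL;

    len = nargs[call];
    /* Redundant given the table contents, but this check lets a compiler
     * or static checker prove that the copy cannot overrun a[]. */
    if (len > MAX_ARGS * sizeof(unsigned long))
        return -EINVAL;

    memcpy(a, user_args, len);   /* stands in for copy_from_user() */
    return (int)(len / sizeof(unsigned long));
}
```

The explicit comparison against the destination size is what makes the bound provable without reasoning about the table.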

Signed-off-by: Arjan van de Ven
Signed-off-by: David S. Miller

Arjan van de Ven
2009-09-29 03:57:44 +0800

23 Sep, 2009

1 commit

1fd7317d0 Move magic numbers into magic.h

Move various magic-number definitions into magic.h.

Signed-off-by: Nick Black
Acked-by: Pekka Enberg
Cc: Al Viro
Cc: "David S. Miller"
Cc: Casey Schaufler
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Black
2009-09-23 22:39:28 +0800

22 Sep, 2009

1 commit

b87221de6 const: mark remaining super_operations const

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:24 +0800

15 Sep, 2009

1 commit

29a020d35 [PATCH] net: kmemcheck annotation in struct socket

struct socket has a 16 bit hole that triggers kmemcheck warnings.

As suggested by Ingo, use kmemcheck annotations

Signed-off-by: Eric Dumazet
Acked-by: Ingo Molnar
Signed-off-by: David S. Miller

Eric Dumazet
2009-09-15 17:39:20 +0800

14 Aug, 2009

1 commit

e69495838 Make sock_sendpage() use kernel_sendpage()

kernel_sendpage() does the proper default case handling for when the
socket doesn't have a native sendpage implementation.

Now, arguably this might be something that we could instead solve by
just specifying that all protocols should do it themselves at the
protocol level, but we really only care about the common protocols.
Does anybody really care about sendpage on something like Appletalk? Not
likely.

Acked-by: David S. Miller
Acked-by: Julien TINNES
Acked-by: Tavis Ormandy
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2009-08-14 01:57:26 +0800

07 Apr, 2009

1 commit

398920329 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
b44: Use kernel DMA addresses for the kernel DMA API
forcedeth: Fix resume from hibernation regression.
xfrm: fix fragmentation on inter family tunnels
ibm_newemac: Fix dangerous struct assumption
gigaset: documentation update
gigaset: in file ops, check for device disconnect before anything else
bas_gigaset: use tasklet_hi_schedule for timing critical tasklets
net/802/fddi.c: add MODULE_LICENSE
smsc911x: remove unused #include
axnet_cs: fix phy_id detection for bogus Asix chip.
bnx2: Use request_firmware()
b44: Fix sizes passed to b44_sync_dma_desc_for_{device,cpu}()
socket: use percpu_add() while updating sockets_in_use
virtio_net: Set the mac config only when VIRITO_NET_F_MAC
myri_sbus: use request_firmware
e1000: fix loss of multicast packets
vxge: should include tcp.h

Conflict in firmware/WHENCE (SCSI vs net firmware)

Linus Torvalds
2009-04-07 09:05:43 +0800

05 Apr, 2009

1 commit

4e69489a0 socket: use percpu_add() while updating sockets_in_use

sock_alloc() currently uses the following code to update sockets_in_use

get_cpu_var(sockets_in_use)++;
put_cpu_var(sockets_in_use);

This translates to :

c0436274: b8 01 00 00 00 mov $0x1,%eax
c0436279: e8 42 40 df ff call c022a2c0
c043627e: bb 20 4f 6a c0 mov $0xc06a4f20,%ebx
c0436283: e8 18 ca f0 ff call c0342ca0
c0436288: 03 1c 85 60 4a 65 c0 add -0x3f9ab5a0(,%eax,4),%ebx
c043628f: ff 03 incl (%ebx)
c0436291: b8 01 00 00 00 mov $0x1,%eax
c0436296: e8 75 3f df ff call c022a210
c043629b: 89 e0 mov %esp,%eax
c043629d: 25 00 e0 ff ff and $0xffffe000,%eax
c04362a2: f6 40 08 08 testb $0x8,0x8(%eax)
c04362a6: 75 07 jne c04362af
c04362a8: 8d 46 d8 lea -0x28(%esi),%eax
c04362ab: 5b pop %ebx
c04362ac: 5e pop %esi
c04362ad: c9 leave
c04362ae: c3 ret
c04362af: e8 cc 5d 09 00 call c04cc080
c04362b4: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
c04362b8: eb ee jmp c04362a8

While percpu_add(sockets_in_use, 1) translates to a single instruction :

c0436275: 64 83 05 20 5f 6a c0 addl $0x1,%fs:0xc06a5f20

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2009-04-05 07:41:09 +0800

28 Mar, 2009

3 commits

8651d5c0b lsm: Remove the socket_post_accept() hook

The socket_post_accept() hook is not currently used by any in-tree modules
and its existence continues to cause problems by confusing people about
what can be safely accomplished using this hook. If a legitimate need for
this hook arises in the future it can always be reintroduced.

Signed-off-by: Paul Moore
Signed-off-by: James Morris

Paul Moore
2009-03-28 12:01:37 +0800

3ae5080f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
fs: avoid I_NEW inodes
Merge code for single and multiple-instance mounts
Remove get_init_pts_sb()
Move common mknod_ptmx() calls into caller
Parse mount options just once and copy them to super block
Unroll essentials of do_remount_sb() into devpts
vfs: simple_set_mnt() should return void
fs: move bdev code out of buffer.c
constify dentry_operations: rest
constify dentry_operations: configfs
constify dentry_operations: sysfs
constify dentry_operations: JFS
constify dentry_operations: OCFS2
constify dentry_operations: GFS2
constify dentry_operations: FAT
constify dentry_operations: FUSE
constify dentry_operations: procfs
constify dentry_operations: ecryptfs
constify dentry_operations: CIFS
constify dentry_operations: AFS
...

Linus Torvalds
2009-03-28 07:23:12 +0800

3ba13d179 constify dentry_operations: rest

Signed-off-by: Al Viro

Al Viro
2009-03-28 02:44:03 +0800

27 Mar, 2009

1 commit

8e9d20897 Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6

* 'bkl-removal' of git://git.lwn.net/linux-2.6:
Rationalize fasync return values
Move FASYNC bit handling to f_op->fasync()
Use f_lock to protect f_flags
Rename struct file->f_ep_lock

Linus Torvalds
2009-03-27 07:14:02 +0800

16 Mar, 2009

1 commit

76398425b Move FASYNC bit handling to f_op->fasync()

Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
the underlying fasync() function. Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that. As a result, we
have a couple of internal API changes: fasync() must now manage the FASYNC
bit, and it will be called without the BKL held.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper(). So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct. The BKL had
already been pushed down into the rest.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Cc: Al Viro
Cc: David Miller
Reviewed-by: Christoph Hellwig
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2009-03-16 22:32:27 +0800

16 Feb, 2009

1 commit

20d494735 net: socket infrastructure for SO_TIMESTAMPING

The overlap with the old SO_TIMESTAMP[NS] options is handled so
that time stamping in software (net_enable_timestamp()) is
enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
is set. It's disabled if all of these are off.
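
The enable/disable rule can be modeled as a per-socket flag set plus a global user count. This is a user-space sketch only: `ts_sock`, `ts_enable`, `ts_disable` and the `TS_*` flags are hypothetical names, and `sw_timestamp_users` merely stands in for whatever state net_enable_timestamp() toggles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-socket flags for this model. */
enum {
    TS_LEGACY      = 1 << 0,  /* SO_TIMESTAMP or SO_TIMESTAMPNS */
    TS_RX_SOFTWARE = 1 << 1,  /* SO_TIMESTAMPING_RX_SOFTWARE */
    TS_ANY         = TS_LEGACY | TS_RX_SOFTWARE
};

struct ts_sock {
    unsigned int ts_flags;
};

/* Number of sockets that currently need software time stamps. */
static int sw_timestamp_users;

static void ts_enable(struct ts_sock *sk, unsigned int flag)
{
    assert(sk != NULL && flag != 0 && (flag & TS_ANY) == flag);

    if (!(sk->ts_flags & TS_ANY))
        sw_timestamp_users++;   /* first option set on this socket */
    sk->ts_flags |= flag;
}

static void ts_disable(struct ts_sock *sk, unsigned int flag)
{
    assert(sk != NULL && flag != 0 && (flag & TS_ANY) == flag);

    if (!(sk->ts_flags & flag))
        return;                 /* option was not set: nothing to undo */
    sk->ts_flags &= ~flag;
    if (!(sk->ts_flags & TS_ANY))
        sw_timestamp_users--;   /* last option cleared on this socket */
}
```

Software time stamping stays on while either option is set on the socket and is released only when the last one is cleared.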

Signed-off-by: Patrick Ohly
Signed-off-by: David S. Miller

Patrick Ohly
2009-02-16 14:43:35 +0800

14 Jan, 2009

3 commits

3e0fa65f8 [CVE-2009-0029] System call wrappers part 22

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:27 +0800

20f37034f [CVE-2009-0029] System call wrappers part 21

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:26 +0800

754fe8d29 [CVE-2009-0029] System call wrappers part 07

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:20 +0800

05 Jan, 2009

2 commits

157cf649a sanitize audit_fd_pair()

* no allocations
* return void

Signed-off-by: Al Viro

Al Viro
2009-01-05 04:14:41 +0800

f3298dc4f sanitize audit_socketcall

* don't bother with allocations
* now that it can't fail, make it return void

Signed-off-by: Al Viro

Al Viro
2009-01-05 04:14:39 +0800

29 Dec, 2008

1 commit

0191b625c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).

Linus Torvalds
2008-12-29 04:49:40 +0800

25 Dec, 2008

1 commit

cbacc2c7f Merge branch 'next' into for-linus

James Morris
2008-12-25 08:40:09 +0800

24 Dec, 2008

1 commit

6332178d9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/net/ppp_generic.c

David S. Miller
2008-12-24 09:56:23 +0800

19 Dec, 2008

1 commit

1b08534e5 net: Fix module refcount leak in kernel_accept()

kernel_accept() does not take a reference on the module that owns
newsock->ops (newsock->ops->owner), so every caller has to call
__module_get(newsock->ops->owner) by hand after kernel_accept().
sunrpc does not take this reference, which causes a kernel panic.

The following script reproduces the problem:

while [ 1 ];
do
	mount -t nfs4 192.168.0.19:/ /mnt
	touch /mnt/file
	umount /mnt
	lsmod | grep ipv6
done

This patch fixes the problem by adding __module_get(newsock->ops->owner) to
kernel_accept(), so callers no longer need to call
__module_get(newsock->ops->owner) themselves after every kernel_accept().

Signed-off-by: Wei Yongjun
Signed-off-by: David S. Miller

Wei Yongjun
2008-12-19 11:35:10 +0800

04 Dec, 2008

1 commit

ec98ce480 Merge branch 'master' into next

Conflicts:
fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris

James Morris
2008-12-04 14:16:36 +0800

21 Nov, 2008

1 commit

6ab33d517 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/net/ixgbe/ixgbe_main.c
include/net/mac80211.h
net/phonet/af_phonet.c

David S. Miller
2008-11-21 08:44:00 +0800

20 Nov, 2008

1 commit

de11defeb reintroduce accept4

Introduce a new accept4() system call. The addition of this system call
matches analogous changes in 2.6.27 (dup3(), eventfd2(), signalfd4(),
inotify_init1(), epoll_create1(), pipe2()) which added new system calls
that differed from analogous traditional system calls in adding a flags
argument that can be used to access additional functionality.

The accept4() system call is exactly the same as accept(), except that
it adds a flags bit-mask argument. Two flags are initially implemented.
(Most of the new system calls in 2.6.27 also had both of these flags.)

SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
for the new file descriptor returned by accept4(). This is a useful
security feature to avoid leaking information in a multithreaded
program where one thread is doing an accept() at the same time as
another thread is doing a fork() plus exec(). (More details here:
http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
Ulrich Drepper).

The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
to be enabled on the new open file description created by accept4().
(This flag is merely a convenience, saving the use of additional calls
fcntl(F_GETFL) and fcntl(F_SETFL) to achieve the same result.)
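
For comparison, here is a sketch of the traditional sequence that the two flags replace: after accept() returns, up to four fcntl() calls are needed, and until the F_SETFD call completes the descriptor can still be inherited across a concurrent fork() plus exec(). The helper name `set_cloexec_nonblock` is made up for this example.

```c
#include <assert.h>
#include <fcntl.h>

/* Set close-on-exec and non-blocking mode on an existing descriptor.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_cloexec_nonblock(int fd)
{
    int fdf, flf;

    assert(fd >= 0);

    /* File descriptor flags: FD_CLOEXEC lives here. */
    fdf = fcntl(fd, F_GETFD);
    if (fdf == -1 || fcntl(fd, F_SETFD, fdf | FD_CLOEXEC) == -1)
        return -1;

    /* Open file description flags: O_NONBLOCK lives here. */
    flf = fcntl(fd, F_GETFL);
    if (flf == -1 || fcntl(fd, F_SETFL, flf | O_NONBLOCK) == -1)
        return -1;

    return 0;
}
```

With accept4(), passing SOCK_CLOEXEC | SOCK_NONBLOCK sets both flags atomically with the creation of the descriptor, so no such window exists.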

Here's a test program. Works on x86-32. Should work on x86-64, but
I (mtk) don't have a system to hand to test with.

It tests accept4() with each of the four possible combinations of
SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
that the appropriate flags are set on the file descriptor/open file
description returned by accept4().

I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
and it passes according to my test program.

/* test_accept4.c

Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk

Licensed under the GNU GPLv2 or later.
*/
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define PORT_NUM 33333

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

/**********************************************************************/

/* The following is what we need until glibc gets a wrapper for
accept4() */

/* Flags for socket(), socketpair(), accept4() */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC O_CLOEXEC
#endif
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK O_NONBLOCK
#endif

#ifdef __x86_64__
#define SYS_accept4 288
#elif __i386__
#define USE_SOCKETCALL 1
#define SYS_ACCEPT4 18
#else
#error "Sorry -- don't know the syscall # on this architecture"
#endif

static int
accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
{
printf("Calling accept4(): flags = %x", flags);
if (flags != 0) {
printf(" (");
if (flags & SOCK_CLOEXEC)
printf("SOCK_CLOEXEC");
if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
printf(" ");
if (flags & SOCK_NONBLOCK)
printf("SOCK_NONBLOCK");
printf(")");
}
printf("\n");

#if USE_SOCKETCALL
long args[6];

args[0] = fd;
args[1] = (long) sockaddr;
args[2] = (long) addrlen;
args[3] = flags;

return syscall(SYS_socketcall, SYS_ACCEPT4, args);
#else
return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
#endif
}

/**********************************************************************/

static int
do_test(int lfd, struct sockaddr_in *conn_addr,
int closeonexec_flag, int nonblock_flag)
{
int connfd, acceptfd;
int fdf, flf, fdf_pass, flf_pass;
struct sockaddr_in claddr;
socklen_t addrlen;

printf("=======================================\n");

connfd = socket(AF_INET, SOCK_STREAM, 0);
if (connfd == -1)
die("socket");
if (connect(connfd, (struct sockaddr *) conn_addr,
sizeof(struct sockaddr_in)) == -1)
die("connect");

addrlen = sizeof(struct sockaddr_in);
acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
closeonexec_flag | nonblock_flag);
if (acceptfd == -1) {
perror("accept4()");
close(connfd);
return 0;
}

fdf = fcntl(acceptfd, F_GETFD);
if (fdf == -1)
die("fcntl:F_GETFD");
fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
((closeonexec_flag & SOCK_CLOEXEC) != 0);
printf("Close-on-exec flag is %sset (%s); ",
(fdf & FD_CLOEXEC) ? "" : "not ",
fdf_pass ? "OK" : "failed");

flf = fcntl(acceptfd, F_GETFL);
if (flf == -1)
die("fcntl:F_GETFL");
flf_pass = ((flf & O_NONBLOCK) != 0) ==
((nonblock_flag & SOCK_NONBLOCK) !=0);
printf("nonblock flag is %sset (%s)\n",
(flf & O_NONBLOCK) ? "" : "not ",
flf_pass ? "OK" : "failed");

close(acceptfd);
close(connfd);

printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
return fdf_pass && flf_pass;
}

static int
create_listening_socket(int port_num)
{
struct sockaddr_in svaddr;
int lfd;
int optval;

memset(&svaddr, 0, sizeof(struct sockaddr_in));
svaddr.sin_family = AF_INET;
svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
svaddr.sin_port = htons(port_num);

lfd = socket(AF_INET, SOCK_STREAM, 0);
if (lfd == -1)
die("socket");

optval = 1;
if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
sizeof(optval)) == -1)
die("setsockopt");

if (bind(lfd, (struct sockaddr *) &svaddr,
sizeof(struct sockaddr_in)) == -1)
die("bind");

if (listen(lfd, 5) == -1)
die("listen");

return lfd;
}

int
main(int argc, char *argv[])
{
struct sockaddr_in conn_addr;
int lfd;
int port_num;
int passed;

passed = 1;

port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;

memset(&conn_addr, 0, sizeof(struct sockaddr_in));
conn_addr.sin_family = AF_INET;
conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
conn_addr.sin_port = htons(port_num);

lfd = create_listening_socket(port_num);

if (!do_test(lfd, &conn_addr, 0, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
passed = 0;

close(lfd);

exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
}

[mtk.manpages@gmail.com: rewrote changelog, updated test program]
Signed-off-by: Ulrich Drepper
Tested-by: Michael Kerrisk
Acked-by: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-11-20 10:49:57 +0800

14 Nov, 2008

1 commit

8192b0c48 CRED: Wrap task credential accesses in the networking subsystem

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: netdev@vger.kernel.org
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:10 +0800

07 Nov, 2008

1 commit

9eeda9abd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/net/wireless/ath5k/base.c
net/8021q/vlan_core.c

David S. Miller
2008-11-07 14:43:03 +0800

04 Nov, 2008

1 commit

ab2910921 net: remove two duplicated #include

Removed duplicated #include in net/9p/trans_rdma.c
and #include in net/socket.c

Signed-off-by: Jianjun Kong
Signed-off-by: David S. Miller

Jianjun Kong
2008-11-04 10:23:09 +0800

02 Nov, 2008

1 commit

233e70f42 saner FASYNC handling on file close

As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2008-11-02 00:49:46 +0800

17 Oct, 2008

1 commit

95a5afca4 net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely)

Some code here depends on CONFIG_KMOD to not try to load
protocol modules or similar. Replace it with CONFIG_MODULES
where more than just request_module depends on CONFIG_KMOD,
and also use try_then_request_module in ebtables.

Signed-off-by: Johannes Berg
Signed-off-by: Rusty Russell
Signed-off-by: David S. Miller

Johannes Berg
2008-10-17 06:24:51 +0800

23 Sep, 2008

1 commit

2d4c82667 sys_paccept: disable paccept() until API design is resolved

The reasons for disabling paccept() are as follows:

* The API is more complex than needed. There is AFAICS no demonstrated
use case that the sigset argument of this syscall serves that couldn't
equally be served by the use of pselect/ppoll/epoll_pwait + traditional
accept(). Roland seems to concur with this opinion
(http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255). I
have (more than once) asked Ulrich to explain otherwise
(http://thread.gmane.org/gmane.linux.kernel/723952/focus=731018), but he
does not respond, so one is left to assume that he doesn't know of such
a case.

* The use of a sigset argument is not consistent with other I/O APIs
that can block on a single file descriptor (e.g., read(), recv(),
connect()).

* The behavior of paccept() when interrupted by a signal is IMO strange:
the kernel restarts the system call if SA_RESTART was set for the
handler. I think that it should not do this -- that it should behave
consistently with pselect()/ppoll()/epoll_pwait(), which never restart,
regardless of SA_RESTART. The reasoning here is that the very purpose
of paccept() is to wait for a connection or a signal, and that
restarting in the latter case is probably never useful. (Note: Roland
disagrees on this point, believing that rather paccept() should be
consistent with accept() in its behavior wrt EINTR
(http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).)
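
The alternative named in the first point, pselect() followed by a traditional accept(), might look like the sketch below. The function name `accept_with_sigmask` and the choice to map a timeout to EAGAIN are assumptions of this example. A robust caller would probably also make the listening socket non-blocking, since a pending connection can disappear between the two calls.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>

/* Wait for a pending connection with `mask` installed as the signal mask
 * for the duration of the wait, then accept it. Returns the new
 * descriptor, or -1 with errno set (EINTR if a signal interrupted the
 * wait, EAGAIN if the timeout expired first). */
static int accept_with_sigmask(int lfd, const sigset_t *mask,
                               const struct timespec *timeout)
{
    fd_set rfds;
    int n;

    assert(lfd >= 0 && lfd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_SET(lfd, &rfds);

    /* pselect() swaps in the signal mask atomically, which is the job
     * paccept()'s sigset argument was meant to do. */
    n = pselect(lfd + 1, &rfds, NULL, NULL, timeout, mask);
    if (n < 0)
        return -1;
    if (n == 0) {
        errno = EAGAIN;
        return -1;
    }
    return accept(lfd, NULL, NULL);
}
```

Because the wait happens in pselect(), an interrupting signal yields EINTR rather than a restart, matching the behavior argued for in the third point.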

I believe that instead, a simpler API, consistent with Ulrich's other
recent additions, is preferable:

accept4(int fd, struct sockaddr *sa, socklen_t *salen, int flags);

(This simpler API was originally proposed by Ulrich:
http://thread.gmane.org/gmane.linux.network/92072)

If this simpler API is added, then if we later decide that the sigset
argument really is required, then a suitable bit in 'flags' could be added
to indicate the presence of the sigset argument.

At this point, I am hoping we either will get a counter-argument from
Ulrich about why we really do need paccept()'s sigset argument, or that he
will resubmit the original accept4() patch.

Signed-off-by: Michael Kerrisk
Cc: David Miller
Cc: Davide Libenzi
Cc: Alan Cox
Cc: Ulrich Drepper
Cc: Jakub Jelinek
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michael Kerrisk
2008-09-23 23:09:14 +0800

27 Jul, 2008

1 commit

51cc50685 SL*B: drop kmem cache argument from constructor

The kmem cache passed to a constructor is only needed for constructors that are
themselves multiplexers. Nobody uses this "feature", nor does anybody use the
passed kmem cache in a non-trivial way, so pass only a pointer to the object.

Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan
Acked-by: Pekka Enberg
Acked-by: Christoph Lameter
Cc: Jon Tollefson
Cc: Nick Piggin
Cc: Matt Mackall
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-07-27 03:00:07 +0800

25 Jul, 2008

2 commits

e38b36f32 flag parameters: check magic constants

This patch adds tests that ensure the boundary conditions for the various
constants introduced in the previous patches are met. No code is generated.
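
A user-space analogue of such zero-cost checks, using C11 `_Static_assert`. The `TYPE_MASK` value and the `split_type` helper are assumptions made for this sketch and are not claimed to match the kernel's own constants or conditions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical mask for the bits that carry the socket type proper. */
#define TYPE_MASK 0xf

/* Compile-time checks: a violated condition stops the build, and a
 * satisfied one leaves nothing behind in the object code. */
_Static_assert((SOCK_CLOEXEC & TYPE_MASK) == 0,
               "SOCK_CLOEXEC overlaps the socket type bits");
_Static_assert((SOCK_NONBLOCK & TYPE_MASK) == 0,
               "SOCK_NONBLOCK overlaps the socket type bits");
_Static_assert((SOCK_CLOEXEC & SOCK_NONBLOCK) == 0,
               "SOCK_CLOEXEC and SOCK_NONBLOCK share a bit");

/* Split a socket() type argument into the type proper (returned) and the
 * flag bits (stored in *flags). Safe only because of the checks above. */
static int split_type(int type_and_flags, int *flags)
{
    assert(flags != NULL);

    *flags = type_and_flags & ~TYPE_MASK;
    return type_and_flags & TYPE_MASK;
}
```

If a future header change made one of the flags collide with the type bits, the build would fail instead of producing a silently wrong `split_type()`.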

[akpm@linux-foundation.org: fix alpha]
Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:29 +0800

77d272005 flag parameters: NONBLOCK in socket and socketpair

This patch introduces support for the SOCK_NONBLOCK flag in socket,
socketpair, and paccept. To do this the internal function sock_attach_fd
gets an additional parameter which it uses to set the appropriate flag for
the file descriptor.

Given that in modern, scalable programs almost all socket connections are
non-blocking and the minimal additional cost for the new functionality
I see no reason not to add this code.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>

#ifndef __NR_paccept
# ifdef __x86_64__
# define __NR_paccept 288
# elif defined __i386__
# define SYS_PACCEPT 18
# define USE_SOCKETCALL 1
# else
# error "need __NR_paccept"
# endif
#endif

#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
({ long args[6] = { \
(long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif

#define PORT 57392

#define SOCK_NONBLOCK O_NONBLOCK

static pthread_barrier_t b;

static void *
tf (void *arg)
{
pthread_barrier_wait (&b);
int s = socket (AF_INET, SOCK_STREAM, 0);
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);

pthread_barrier_wait (&b);
s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);

return NULL;
}

int
main (void)
{
int fd;
fd = socket (PF_INET, SOCK_STREAM, 0);
if (fd == -1)
{
puts ("socket(0) failed");
return 1;
}
int fl = fcntl (fd, F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if (fl & O_NONBLOCK)
{
puts ("socket(0) set non-blocking mode");
return 1;
}
close (fd);

fd = socket (PF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
if (fd == -1)
{
puts ("socket(SOCK_NONBLOCK) failed");
return 1;
}
fl = fcntl (fd, F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if ((fl & O_NONBLOCK) == 0)
{
puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode");
return 1;
}
close (fd);

int fds[2];
if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
{
puts ("socketpair(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
fl = fcntl (fds[i], F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if (fl & O_NONBLOCK)
{
printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}

if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_NONBLOCK, 0, fds) == -1)
{
puts ("socketpair(SOCK_NONBLOCK) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
fl = fcntl (fds[i], F_GETFL);
if (fl == -1)
{
puts ("fcntl failed");
return 1;
}
if ((fl & O_NONBLOCK) == 0)
{
printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
return 1;
}
close (fds[i]);
}

pthread_barrier_init (&b, NULL, 2);

struct sockaddr_in sin;
pthread_t th;
if (pthread_create (&th, NULL, tf, NULL) != 0)
{
puts ("pthread_create failed");
return 1;
}

int s = socket (AF_INET, SOCK_STREAM, 0);
int reuse = 1;
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);

pthread_barrier_wait (&b);

int s2 = paccept (s, NULL, 0, NULL, 0);
if (s2 < 0)
{
puts ("paccept(0) failed");
return 1;
}

fl = fcntl (s2, F_GETFL);
if (fl & O_NONBLOCK)
{
puts ("paccept(0) set non-blocking mode");
return 1;
}
close (s2);
close (s);

pthread_barrier_wait (&b);

s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);

pthread_barrier_wait (&b);

s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK);
if (s2 < 0)
{
puts ("paccept(SOCK_NONBLOCK) failed");
return 1;
}

fl = fcntl (s2, F_GETFL);
if ((fl & O_NONBLOCK) == 0)
{
puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode");
return 1;
}
close (s2);
close (s);

pthread_barrier_wait (&b);
puts ("OK");

return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:29 +0800