11 Oct, 2007
4 commits
-
If kernel_accept() returns an error, it may pass back a pointer to
freed memory (which the caller should ignore). Make it pass back NULL
instead for better safety.Signed-off-by: Tony Battersby
Signed-off-by: David S. Miller -
Fix a bunch of sparse warnings. Mostly about 0 used as
NULL pointer, and shadowed variable declarations.
One notable case was that hash size should have been unsigned.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctlwere modified to take a network namespace argument, and
deal with it.vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller -
This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting. By
virtue of this all socket create methods are touched. In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.Failing if we attempt to create a socket outside of the default
network namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
Allowing us to partially enable network namespaces before all of the
exotic protocols are supported.Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.[ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
28 Sep, 2007
1 commit
-
This fixes kernel bugzilla #5731
It should generate an empty packet for datagram protocols when the
socket is connected, for one.The check is doubly-wrong because all that a write() can be is a
sendmsg() call with a NULL msg_control and a single entry iovec. No
special semantics should be assigned to it, therefore the zero length
check should be removed entirely.This matches the behavior of BSD and several other systems.
Alan Cox notes that SuSv3 says the behavior of a zero length write on
non-files is "unspecified", but that's kind of useless since BSD has
defined this behavior for a quarter century and BSD is essentially
what application folks code to.Based upon a patch from Stephen Hemminger.
Signed-off-by: David S. Miller
16 Aug, 2007
1 commit
-
The recent RCU work created an unbalanced rcu_read_unlock
in __sock_create. This patch fixes that. Reported by
oleg 123.Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller
20 Jul, 2007
1 commit
-
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).Signed-off-by: Paul Mundt
17 Jul, 2007
1 commit
-
Part two in the O_CLOEXEC saga: adding support for file descriptors received
through Unix domain sockets.The patch is once again pretty minimal, it introduces a new flag for recvmsg
and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit
is not used otherwise but the networking people will know better.This new flag is not recognized by recvfrom and recv. These functions cannot
be used for that purpose and the asymmetry this introduces is not worse than
the already existing MSG_CMSG_COMPAT situations.The patch must be applied on the patch which introduced O_CLOEXEC. It has to
remove static from the new get_unused_fd_flags function but since scm.c cannot
live in a module the function still hasn't to be exported.Here's a test program to make sure the code works. It's so much longer than
the actual patch...#include
#include
#include
#include
#include
#include
#include
#include#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0x40000000
#endifint
main (int argc, char *argv[])
{
if (argc > 1)
{
int fd = atol (argv[1]);
printf ("child: fd = %d\n", fd);
if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
{
puts ("file descriptor valid in child");
return 1;
}
return 0;}
struct sockaddr_un sun;
strcpy (sun.sun_path, "./testsocket");
sun.sun_family = AF_UNIX;char databuf[] = "hello";
struct iovec iov[1];
iov[0].iov_base = databuf;
iov[0].iov_len = sizeof (databuf);union
{
struct cmsghdr hdr;
char bytes[CMSG_SPACE (sizeof (int))];
} buf;
struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
.msg_control = buf.bytes,
.msg_controllen = sizeof (buf) };
struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN (sizeof (int));msg.msg_controllen = cmsg->cmsg_len;
pid_t child = fork ();
if (child == -1)
error (1, errno, "fork");
if (child == 0)
{
int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "bind");
if (listen (sock, SOMAXCONN) < 0)
error (1, errno, "listen");int conn = accept (sock, NULL, NULL);
if (conn == -1)
error (1, errno, "accept");*(int *) CMSG_DATA (cmsg) = sock;
if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
error (1, errno, "sendmsg");return 0;
}/* For a test suite this should be more robust like a
barrier in shared memory. */
sleep (1);int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "connect");
unlink (sun.sun_path);*(int *) CMSG_DATA (cmsg) = -1;
if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
error (1, errno, "recvmsg");int fd = *(int *) CMSG_DATA (cmsg);
if (fd == -1)
error (1, 0, "no descriptor received");char fdname[20];
snprintf (fdname, sizeof (fdname), "%d", fd);
execl ("/proc/self/exe", argv[0], fdname, NULL);
puts ("execl failed");
return 1;
}[akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ulrich Drepper
Cc: Ingo Molnar
Cc: Michael Buesch
Cc: Michael Kerrisk
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 May, 2007
1 commit
-
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
1 commit
-
1) Introduces a new method in 'struct dentry_operations'. This method
called d_dname() might be called from d_path() to build a pathname for
special filesystems. It is called without locks.Future patches (if we succeed in having one common dentry for all
pipes/sockets) may need to change prototype of this method, but we now
use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);2) Adds a dynamic_dname() helper function that eases d_dname() implementations
3) Defines d_dname method for sockets : No more sprintf() at socket
creation. This is delayed up to the moment someone does an access to
/proc/pid/fd/...4) Defines d_dname method for pipes : No more sprintf() at pipe
creation. This is delayed up to the moment someone does an access to
/proc/pid/fd/...A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 Ghz :3.090 s instead of 3.450 s
Signed-off-by: Eric Dumazet
Acked-by: Christoph Hellwig
Acked-by: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 May, 2007
1 commit
-
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Apr, 2007
3 commits
-
Kernel: arch/x86_64/boot/bzImage is ready (#2)
MODPOST 1816 modules
WARNING: "__sock_recv_timestamp" [net/sctp/sctp.ko] undefined!
WARNING: "__sock_recv_timestamp" [net/packet/af_packet.ko] undefined!
WARNING: "__sock_recv_timestamp" [net/key/af_key.ko] undefined!
WARNING: "__sock_recv_timestamp" [net/ipv6/ipv6.ko] undefined!
WARNING: "__sock_recv_timestamp" [net/atm/atm.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
make: *** [_all] Error 2Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: David S. Miller -
Now that network timestamps use ktime_t infrastructure, we can add a new
SOL_SOCKET sockopt SO_TIMESTAMPNS.This command is similar to SO_TIMESTAMP, but permits transmission of
a 'timespec struct' instead of a 'timeval struct' control message.
(nanosecond resolution instead of microsecond)Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP
A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are
mutually exclusive.sock_recv_timestamp() became too big to be fully inlined so I added a
__sock_recv_timestamp() helper function.Signed-off-by: Eric Dumazet
CC: linux-arch@vger.kernel.org
Signed-off-by: David S. Miller -
Fix whitespace around keywords. Fix indentation especially of switch
statements.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
27 Mar, 2007
1 commit
-
* d_alloc() in sock_attach_fd() fails leaving ->f_dentry of new file NULL
* bail out to out_fd label, doing fput()/__fput() on new file
* but __fput() assumes valid ->f_dentry and dereferences itSigned-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
18 Feb, 2007
1 commit
-
Provide an audit record of the descriptor pair returned by pipe() and
socketpair(). Rewritten from the original posted to linux-audit by
John D. RamsdellSigned-off-by: Al Viro
13 Feb, 2007
1 commit
-
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Feb, 2007
1 commit
-
Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller
09 Feb, 2007
2 commits
-
GCC (correctly) says:
net/socket.c: In function ‘sys_sendto’:
net/socket.c:1510: warning: ‘err’ may be used uninitialized in this function
net/socket.c: In function ‘sys_recvfrom’:
net/socket.c:1571: warning: ‘err’ may be used uninitialized in this functionsock_from_file() either returns filp->private_data or it
sets *err and returns NULL.Callers return "err" on NULL, but filp->private_data could
be NULL.Some minor rearrangements of error handling in sys_sendto
and sys_recvfrom solves the issue.Signed-off-by: David S. Miller
-
I believe dead code from sock_from_file() can be cleaned up.
All sockets are now built using sock_attach_fd(), that puts the 'sock' pointer
into file->private_data and &socket_file_ops into file->f_opI could not find a place where file->private_data could be set to NULL,
keeping opened the file.So to get 'sock' from a 'file' pointer, either :
- This is a socket file (f_op == &socket_file_ops), and we can directly get
'sock' from private_data.
- This is not a socket, we return -ENOTSOCK and dont even try to find a socket
via dentry/inode :)Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
09 Dec, 2006
1 commit
-
Signed-off-by: Josef Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
3 commits
-
We currently insert socket dentries into the global dentry hashtable. This
is suboptimal because there is currently no way these entries can be used
for a lookup(). (/proc/xxx/fd/xxx uses a different mechanism). Inserting
them in dentry hashtable slows dcache lookups.To let __dpath() still work correctly (ie not adding a " (deleted)") after
dentry name, we do :- Right after d_alloc(), pretend they are hashed by clearing the
DCACHE_UNHASHED bit.- Call d_instantiate() instead of d_add() : dentry is not inserted in
hash table.__dpath() & friends work as intended during dentry lifetime.
- At dismantle time, once dput() must clear the dentry, setting again
DCACHE_UNHASHED bit inside the custom d_delete() function provided by
socket code, so that dput() can just kill_it.Signed-off-by: Eric Dumazet
Cc: Al Viro
Acked-by: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
doneThe script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Dec, 2006
1 commit
-
This patch contains the scheduled removal of the frame diverter.
Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller
02 Oct, 2006
1 commit
-
File handles can be requested to send sigio and sigurg to processes. By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Oct, 2006
2 commits
-
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Cc: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Sep, 2006
7 commits
-
Signed-off-by: Brian Haley
Signed-off-by: David S. Miller -
No need to set ei->socket.flags to zero twice.
Signed-off-by: David S. Miller
-
The sock_register() doesn't change the family, so the protocols can
define it read-only. No caller ever checks return value from
sock_unregister()Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
Replace the gross custom locking done in socket code for net_family[]
with simple RCU usage. Some reordering necessary to avoid sleep issues
with sock_alloc.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
Make socket.c conform to current style:
* run through Lindent
* get rid of unneeded casts
* split assignment and comparsion where possibleSigned-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
This patch implements wrapper functions that provide a convenient way
to access the sockets API for in-kernel users like sunrpc, cifs &
ocfs2 etc and any future users.Signed-off-by: Sridhar Samudrala
Acked-by: James Morris
Signed-off-by: David S. Miller -
Add NetLabel support to the SELinux LSM and modify the
socket_post_create() LSM hook to return an error code. The most
significant part of this patch is the addition of NetLabel hooks into
the following SELinux LSM hooks:* selinux_file_permission()
* selinux_socket_sendmsg()
* selinux_socket_post_create()
* selinux_socket_sock_rcv_skb()
* selinux_socket_getpeersec_stream()
* selinux_socket_getpeersec_dgram()
* selinux_sock_graft()
* selinux_inet_conn_request()The basic reasoning behind this patch is that outgoing packets are
"NetLabel'd" by labeling their socket and the NetLabel security
attributes are checked via the additional hook in
selinux_socket_sock_rcv_skb(). NetLabel itself is only a labeling
mechanism, similar to filesystem extended attributes, it is up to the
SELinux enforcement mechanism to perform the actual access checks.In addition to the changes outlined above this patch also includes
some changes to the extended bitmap (ebitmap) and multi-level security
(mls) code to import and export SELinux TE/MLS attributes into and out
of NetLabel.Signed-off-by: Paul Moore
Signed-off-by: David S. Miller
01 Sep, 2006
1 commit
-
This patch limits the warning messages when socket allocation failures
happen. It happens under memory pressure.Signed-off-by: Akinobu Mita
Signed-off-by: David S. Miller
01 Jul, 2006
1 commit
-
Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk
23 Jun, 2006
1 commit
-
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Cc: Roland Dreier
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 May, 2006
1 commit
-
On Thursday 23 March 2006 09:08, John D. Ramsdell wrote:
> I noticed that a socketcall(bind) and socketcall(connect) event contain a
> record of type=SOCKADDR, but I cannot see one for a system call event
> associated with socketcall(accept). Recording the sockaddr of an accepted
> socket is important for cross platform information flow analysThanks for pointing this out. The following patch should address this.
Signed-off-by: Steve Grubb
Signed-off-by: Al Viro
20 Apr, 2006
1 commit
-
This applies to 2.6.17-rc2.
There is a missing initialization of err in sockfd_lookup_light() that
could return random error for an invalid file handle.Signed-off-by: Hua Zhong
Signed-off-by: David S. Miller