10 Jan, 2012
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
igmp: Avoid zero delay when receiving odd mixture of IGMP queries
netdev: make net_device_ops const
bcm63xx: make ethtool_ops const
usbnet: make ethtool_ops const
net: Fix build with INET disabled.
net: introduce netif_addr_lock_nested() and call if when appropriate
net: correct lock name in dev_[uc/mc]_sync documentations.
net: sk_update_clone is only used in net/core/sock.c
8139cp: fix missing napi_gro_flush.
pktgen: set correct max and min in pktgen_setup_inject()
smsc911x: Unconditionally include linux/smscphy.h in smsc911x.h
asix: fix infinite loop in rx_fixup()
net: Default UDP and UNIX diag to 'n'.
r6040: fix typo in use of MCR0 register bits
net: fix sock_clone reference mismatch with tcp memcontrol
09 Jan, 2012
1 commit
-
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
reiserfs: Properly display mount options in /proc/mounts
vfs: prevent remount read-only if pending removes
vfs: count unlinked inodes
vfs: protect remounting superblock read-only
vfs: keep list of mounts for each superblock
vfs: switch ->show_options() to struct dentry *
vfs: switch ->show_path() to struct dentry *
vfs: switch ->show_devname() to struct dentry *
vfs: switch ->show_stats to struct dentry *
switch security_path_chmod() to struct path *
vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
vfs: trim includes a bit
switch mnt_namespace ->root to struct mount
vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
vfs: opencode mntget() mnt_set_mountpoint()
vfs: spread struct mount - remaining argument of next_mnt()
vfs: move fsnotify junk to struct mount
vfs: move mnt_devname
vfs: move mnt_list to struct mount
vfs: switch pnode.h macros to struct mount *
...
08 Jan, 2012
1 commit
-
Signed-off-by: David S. Miller
04 Jan, 2012
1 commit
-
Signed-off-by: Al Viro
31 Dec, 2011
3 commits
-
While it's not too late fix the recently added RQLEN diag extension
to report rqlen and wqlen in the same way as TCP does.I.e. for listening sockets the ack backlog length (which is the input
queue length for socket) in rqlen and the max ack backlog length in
wqlen, and what the CINQ/OUTQ ioctls do for established.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Currently tcp diag reports rqlen and wqlen values similar to how
the CINQ/COUTQ iotcls do. To make unix diag report these values
in the same way move the respective code into helpers.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
[ Fix indentation of sock_diag*() calls. -DaveM ]
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
27 Dec, 2011
2 commits
-
Otherwise we leave uninitialized kernel memory in there.
Reported-by: Eric Dumazet
Signed-off-by: David S. Miller -
The NLA_PUT macro should accept the actual attribute length, not
the amount of elements in array :(Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
21 Dec, 2011
1 commit
-
Otherwise getting
| net/unix/diag.c:312:16: error: expected declaration specifiers or ‘...’ before string constant
| net/unix/diag.c:313:1: error: expected declaration specifiers or ‘...’ before string constantSigned-off-by: Cyrill Gorcunov
Signed-off-by: David S. Miller
17 Dec, 2011
10 commits
-
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
When establishing a unix connection on stream sockets the
server end receives an skb with socket in its receive queue.Report who is waiting for these ends to be accepted for
listening sockets via NLA.There's a lokcing issue with this -- the unix sk state lock is
required to access the peer, and it is taken under the listening
sk's queue lock. Strictly speaking the queue lock should be taken
inside the state lock, but since in this case these two sockets
are different it shouldn't lead to deadlock.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Report the peer socket inode ID as NLA. With this it's finally
possible to find out the other end of an interesting unix connection.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Actually, the socket path if it's not anonymous doesn't give
a clue to which file the socket is bound to. Even if the path
is absolute, it can be unlinked and then new socket can be
bound to it.With this NLA it's possible to check which file a particular
socket is really bound to.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Report the sun_path when requested as NLA. With leading '\0' if
present but without the leading AF_UNIX bits.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
The socket inode is used as a key for lookup. This is effectively
the only really unique ID of a unix socket, but using this for
search currently has one problem -- it is O(number of sockets) :(Does it worth fixing this lookup or inventing some other ID for
unix sockets?Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Walk the unix sockets table and fill the core response structure,
which includes type, state and inode.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Includes basic module_init/_exit functionality, dump/get_exact stubs
and declares the basic API structures for request and response.Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller -
Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller
27 Nov, 2011
1 commit
-
poll() call may be blocked by concurrent reading from the same stream
socket.Signed-off-by: Alexey Moiseytsev
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller
29 Sep, 2011
1 commit
-
Since commit 7361c36c5224 (af_unix: Allow credentials to work across
user and pid namespaces) af_unix performance dropped a lot.This is because we now take a reference on pid and cred in each write(),
and release them in read(), usually done from another process,
eventually from another cpu. This triggers false sharing.# Events: 154K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... .................. .........................
#
10.40% hackbench [kernel.kallsyms] [k] put_pid
8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg
7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg
6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock
4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb
4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns
4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred
2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm
2.24% hackbench [kernel.kallsyms] [k] sub_preempt_count
1.75% hackbench [kernel.kallsyms] [k] fget_light
1.51% hackbench [kernel.kallsyms] [k]
__mutex_lock_interruptible_slowpath
1.42% hackbench [kernel.kallsyms] [k] sock_alloc_send_pskbThis patch includes SCM_CREDENTIALS information in a af_unix message/skb
only if requested by the sender, [man 7 unix for details how to include
ancillary data using sendmsg() system call]Note: This might break buggy applications that expected SCM_CREDENTIAL
from an unaware write() system call, and receiver not using SO_PASSCRED
socket option.If SOCK_PASSCRED is set on source or destination socket, we still
include credentials for mere write() syscalls.Performance boost in hackbench : more than 50% gain on a 16 thread
machine (2 quad-core cpus, 2 threads per core)hackbench 20 thread 2000
4.228 sec instead of 9.102 sec
Signed-off-by: Eric Dumazet
Acked-by: Tim Chen
Signed-off-by: David S. Miller
17 Sep, 2011
1 commit
-
This reverts commit 0856a304091b33a8e8f9f9c98e776f425af2b625.
As requested by Eric Dumazet, it has various ref-counting
problems and has introduced regressions. Eric will add
a more suitable version of this performance fix.Signed-off-by: David S. Miller
25 Aug, 2011
1 commit
-
Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
allow credentials to work across pid namespaces for packets sent via
UNIX sockets. However, the atomic reference counts on pid and
credentials caused plenty of cache bouncing when there are numerous
threads of the same pid sharing a UNIX socket. This patch mitigates the
problem by eliminating extraneous reference counts on pid and
credentials on both send and receive path of UNIX sockets. I found a 2x
improvement in hackbench's threaded case.On the receive path in unix_dgram_recvmsg, currently there is an
increment of reference count on pid and credentials in scm_set_cred.
Then there are two decrement of the reference counts. Once in scm_recv
and once when skb_free_datagram call skb->destructor function
unix_destruct_scm. One pair of increment and decrement of ref count on
pid and credentials can be eliminated from the receive path. Until we
destroy the skb, we already set a reference when we created the skb on
the send side.On the send path, there are two increments of ref count on pid and
credentials, once in scm_send and once in unix_scm_to_skb. Then there
is a decrement of the reference counts in scm_destroy's call to
scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair
of increment and decrement of the reference counts can be removed so we
only need to increment the ref counts once.By incorporating these changes, for hackbench running on a 4 socket
NHM-EX machine with 40 cores, the execution of hackbench on
50 groups of 20 threads sped up by factor of 2.Hackbench command used for testing:
./hackbench 50 thread 2000Signed-off-by: Tim Chen
Signed-off-by: David S. Miller
20 Jul, 2011
1 commit
-
combination of kern_path_parent() and lookup_create(). Does *not*
expose struct nameidata to caller. Syscalls converted to that...Signed-off-by: Al Viro
24 May, 2011
1 commit
-
The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".The supporting code for kptr_restrict and %pK are currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.Signed-off-by: Dan Rosenberg
Cc: James Morris
Cc: Eric Dumazet
Cc: Thomas Graf
Cc: Eugene Teo
Cc: Kees Cook
Cc: Ingo Molnar
Cc: David S. Miller
Cc: Peter Zijlstra
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller
02 May, 2011
1 commit
-
This fixes the following oops discovered by Dan Aloni:
> Anyway, the following is the output of the Oops that I got on the
> Ubuntu kernel on which I first detected the problem
> (2.6.37-12-generic). The Oops that followed will be more useful, I
> guess.>[ 5594.669852] BUG: unable to handle kernel NULL pointer dereference
> at (null)
> [ 5594.681606] IP: [] unix_dgram_recvmsg+0x1fb/0x420
> [ 5594.687576] PGD 2a05d067 PUD 2b951067 PMD 0
> [ 5594.693720] Oops: 0002 [#1] SMP
> [ 5594.699888] last sysfs file:The bug was that unix domain sockets use a pseduo packet for
connecting and accept uses that psudo packet to get the socket.
In the buggy seqpacket case we were allowing unconnected
sockets to call recvmsg and try to receive the pseudo packet.That is always wrong and as of commit 7361c36c5 the pseudo
packet had become enough different from a normal packet
that the kernel started oopsing.Do for seqpacket_recv what was done for seqpacket_send in 2.5
and only allow it on connected seqpacket sockets.Cc: stable@kernel.org
Tested-by: Dan Aloni
Signed-off-by: Eric W. Biederman
Signed-off-by: David S. Miller
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
17 Mar, 2011
1 commit
-
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
bonding: enable netpoll without checking link status
xfrm: Refcount destination entry on xfrm_lookup
net: introduce rx_handler results and logic around that
bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
bonding: wrap slave state work
net: get rid of multiple bond-related netdevice->priv_flags
bonding: register slave pointer for rx_handler
be2net: Bump up the version number
be2net: Copyright notice change. Update to Emulex instead of ServerEngines
e1000e: fix kconfig for crc32 dependency
netfilter ebtables: fix xt_AUDIT to work with ebtables
xen network backend driver
bonding: Improve syslog message at device creation time
bonding: Call netif_carrier_off after register_netdevice
bonding: Incorrect TX queue offset
net_sched: fix ip_tos2prio
xfrm: fix __xfrm_route_forward()
be2net: Fix UDP packet detected status in RX compl
Phonet: fix aligned-mode pipe socket buffer header reserve
netxen: support for GbE port settings
...Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
16 Mar, 2011
2 commits
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
tidy the trailing symlinks traversal up
Turn resolution of trailing symlinks iterative everywhere
simplify link_path_walk() tail
Make trailing symlink resolution in path_lookupat() iterative
update nd->inode in __do_follow_link() instead of after do_follow_link()
pull handling of one pathname component into a helper
fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
readlinkat(), fchownat() and fstatat() with empty relative pathnames
Allow O_PATH for symlinks
New kind of open files - "location only".
ext4: Copy fs UUID to superblock
ext3: Copy fs UUID to superblock.
vfs: Export file system uuid via /proc//mountinfo
unistd.h: Add new syscalls numbers to asm-generic
x86: Add new syscalls for x86_64
x86: Add new syscalls for x86_32
fs: Remove i_nlink check from file system link callback
fs: Don't allow to create hardlink for deleted file
vfs: Add open by file handle support
...
15 Mar, 2011
2 commits
-
Just need to make sure that AF_UNIX garbage collector won't
confuse O_PATHed socket on filesystem for real AF_UNIX opened
socket.Signed-off-by: Al Viro
-
We latch our state using a spinlock not a r/w kind of lock.
Signed-off-by: Daniel Baluta
Signed-off-by: David S. Miller
14 Mar, 2011
1 commit
-
all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()Signed-off-by: Al Viro
11 Mar, 2011
1 commit
-
Conflicts:
drivers/net/bnx2x/bnx2x_cmn.c
08 Mar, 2011
2 commits
-
Signed-off-by: Hagen Paul Pfeifer
Signed-off-by: David S. Miller -
The unix_dgram_recvmsg and unix_stream_recvmsg routines in
net/af_unix.c utilize mutex_lock(&u->readlock) calls in order to
serialize read operations of multiple threads on a single socket. This
implies that, if all n threads of a process block in an AF_UNIX recv
call trying to read data from the same socket, one of these threads
will be sleeping in state TASK_INTERRUPTIBLE and all others in state
TASK_UNINTERRUPTIBLE. Provided that a particular signal is supposed to
be handled by a signal handler defined by the process and that none of
this threads is blocking the signal, the complete_signal routine in
kernel/signal.c will select the 'first' such thread it happens to
encounter when deciding which thread to notify that a signal is
supposed to be handled and if this is one of the TASK_UNINTERRUPTIBLE
threads, the signal won't be handled until the one thread not blocking
on the u->readlock mutex is woken up because some data to process has
arrived (if this ever happens). The included patch fixes this by
changing mutex_lock to mutex_lock_interruptible and handling possible
error returns in the same way interruptions are handled by the actual
receive-code.Signed-off-by: Rainer Weikusat
Signed-off-by: David S. Miller
23 Feb, 2011
1 commit
-
Add proper RCU annotations/verbs to sk_wq and wq members
Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)
Fix sunrpc sk_sleep() abuse too
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller
20 Jan, 2011
1 commit
-
Signed-off-by: Alban Crequy
Reviewed-by: Ian Molton
Signed-off-by: David S. Miller
19 Jan, 2011
1 commit
-
Linux Socket Filters can already be successfully attached and detached on unix
sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
See: Documentation/networking/filter.txtBut the filter was never used in the unix socket code so it did not work. This
patch uses sk_filter() to filter buffers before delivery.This short program demonstrates the problem on SOCK_DGRAM.
int main(void) {
int i, j, ret;
int sv[2];
struct pollfd fds[2];
char *message = "Hello world!";
char buffer[64];
struct sock_filter ins[32] = {{0,},};
struct sock_fprog filter;socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);
for (i = 0 ; i < 2 ; i++) {
fds[i].fd = sv[i];
fds[i].events = POLLIN;
fds[i].revents = 0;
}for(j = 1 ; j < 13 ; j++) {
/* Set a socket filter to truncate the message */
memset(ins, 0, sizeof(ins));
ins[0].code = BPF_RET|BPF_K;
ins[0].k = j;
filter.len = 1;
filter.filter = ins;
setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));/* send a message */
send(sv[0], message, strlen(message) + 1, 0);/* The filter should let the message pass but truncated. */
poll(fds, 2, 0);/* Receive the truncated message*/
ret = recv(sv[1], buffer, 64, 0);
printf("received %d bytes, expected %d\n", ret, j);
}for (i = 0 ; i < 2 ; i++)
close(sv[i]);return 0;
}Signed-off-by: Alban Crequy
Reviewed-by: Ian Molton
Signed-off-by: David S. Miller