Doug / smarc-fsl-linux-kernel | Embedian Git Server

28 Feb, 2013

1 commit

b67bfe0d4 hlist: drop the node parameter from iterators ... Browse Code »

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sasha Levin
2013-02-28 11:10:24 +0800

27 Feb, 2013

1 commit

d895cb1af Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.

The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.

Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.

PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...

Linus Torvalds
2013-02-27 12:16:07 +0800

23 Feb, 2013

1 commit

496ad9aa8 new helper: file_inode(file) ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

19 Feb, 2013

2 commits

ece31ffd5 net: proc: change proc_net_remove to remove_proc_entry ... Browse Code »

proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.

this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2013-02-19 03:53:08 +0800
d4beaa66a net: proc: change proc_net_fops_create to proc_create ... Browse Code »

Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.

Signed-off-by: Gao feng
Signed-off-by: David S. Miller

Gao feng
2013-02-19 03:53:08 +0800

10 Jan, 2013

1 commit

b4fff5f8b unix: Use FIELD_SIZEOF() in af_unix_init(). ... Browse Code »

Signed-off-by: YOSHIFUJI Hideaki
Signed-off-by: David S. Miller

YOSHIFUJI Hideaki / 吉藤英明
2013-01-10 15:38:24 +0800

19 Nov, 2012

1 commit

464dc801c net: Don't export sysctls to unprivileged users ... Browse Code »

In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per net sysctl exports
and refuse to allow them to unprivileged users.

This makes it safe for unprivileged users in general to access
per net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
2012-11-19 09:30:55 +0800

24 Oct, 2012

1 commit

e4e541a84 sock-diag: Report shutdown for inet and unix sockets (v2) ... Browse Code »

Make it simple -- just put new nlattr with just sk->sk_shutdown bits.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2012-10-24 02:57:52 +0800

18 Sep, 2012

1 commit

e04dae840 af_unix: old_cred is surplus ... Browse Code »

Signed-off-by: Alan Cox
Signed-off-by: David S. Miller

Alan Cox
2012-09-18 01:00:13 +0800

11 Sep, 2012

1 commit

15e473046 netlink: Rename pid to portid to avoid confusion ... Browse Code »

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman"
Acked-by: Stephen Hemminger
Signed-off-by: David S. Miller

Eric W. Biederman
2012-09-11 03:30:41 +0800

01 Sep, 2012

1 commit

fc61b928d af_unix: fix shutdown parameter checking ... Browse Code »

Return -EINVAL rather than 0 given an invalid "mode" parameter.

Signed-off-by: Xi Wang
Signed-off-by: David S. Miller

Xi Wang
2012-09-01 03:55:37 +0800

22 Aug, 2012

1 commit

e0e3cea46 af_netlink: force credentials passing [CVE-2012-3520] ... Browse Code »

Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e572626961
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet
Cc: Petr Matousek
Cc: Florian Weimer
Cc: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Eric Dumazet
2012-08-22 05:53:01 +0800

02 Aug, 2012

1 commit

a0e881b7c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.

Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...

Linus Torvalds
2012-08-02 01:26:23 +0800

30 Jul, 2012

3 commits

faf020102 clean unix_bind() up a bit ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:15 +0800
a8104a9fc pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. ... Browse Code »

One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc.

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:15 +0800
921a1650d new helper: done_path_create() ... Browse Code »

releases what needs to be released after {kern,user}_path_create()

Signed-off-by: Al Viro

Al Viro
2012-07-30 01:24:13 +0800

17 Jul, 2012

1 commit

51d7cccf0 net: make sock diag per-namespace ... Browse Code »

Before this patch sock_diag works for init_net only and dumps
information about sockets from all namespaces.

This patch expands sock_diag for all name-spaces.
It creates a netlink kernel socket for each netns and filters
data during dumping.

v2: filter accoding with netns in all places
remove an unused variable.

Cc: "David S. Miller"
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Cc: Pavel Emelyanov
CC: Eric Dumazet
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Andrew Vagin
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Andrey Vagin
2012-07-17 13:31:34 +0800

28 Jun, 2012

1 commit

4245375db unix_diag: Do not use RTA_PUT() macros ... Browse Code »

Also, no need to trim on nlmsg_put() failure, nothing has been added
yet. We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free().

Signed-off-by: Thomas Graf
Signed-off-by: David S. Miller

Thomas Graf
2012-06-28 06:36:43 +0800

27 Jun, 2012

1 commit

b61bb0197 unix_diag: Move away from NLMSG_PUT(). ... Browse Code »

And use nlmsg_data() while we're here too and remove useless
casts.

Signed-off-by: David S. Miller

David S. Miller
2012-06-27 12:41:00 +0800

10 Jun, 2012

1 commit

8b51b064a af_unix: remove unix_iter_state ... Browse Code »

As pointed out by Michael Tokarev , struct unix_iter_state is no longer
needed.

Suggested-by: Michael Tokarev
Signed-off-by: Eric Dumazet
Cc: Steven Whitehouse
Cc: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric Dumazet
2012-06-10 10:06:21 +0800

09 Jun, 2012

1 commit

7123aaa3a af_unix: speedup /proc/net/unix ... Browse Code »

/proc/net/unix has quadratic behavior, and can hold unix_table_lock for
a while if high number of unix sockets are alive. (90 ms for 200k
sockets...)

We already have a hash table, so its quite easy to use it.

Problem is unbound sockets are still hashed in a single hash slot
(unix_socket_table[UNIX_HASH_TABLE])

This patch also spreads unbound sockets to 256 hash slots, to speedup
both /proc/net/unix and unix_diag.

Time to read /proc/net/unix with 200k unix sockets :
(time dd if=/proc/net/unix of=/dev/null bs=4k)

before : 520 secs
after : 2 secs

Signed-off-by: Eric Dumazet
Cc: Steven Whitehouse
Cc: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric Dumazet
2012-06-09 05:27:23 +0800

26 Apr, 2012

1 commit

8dcf01fc0 net: sock_diag_handler structs can be const ... Browse Code »

read only, so change it to const.

Signed-off-by: Shan Wei
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Shan Wei
2012-04-26 08:46:59 +0800

21 Apr, 2012

2 commits

ec8f23ce0 net: Convert all sysctl registrations to register_net_sysctl ... Browse Code »

This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric W. Biederman
2012-04-21 09:22:30 +0800
5dd3df105 net: Move all of the network sysctls without a namespace into init_net. ... Browse Code »

This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric W. Biederman
2012-04-21 09:21:17 +0800

16 Apr, 2012

1 commit

95c961747 net: cleanup unsigned to unsigned int ... Browse Code »

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-04-16 00:44:40 +0800

04 Apr, 2012

1 commit

eb6a24816 af_unix: reduce high order page allocations ... Browse Code »

unix_dgram_sendmsg() currently builds linear skbs, and this can stress
page allocator with high order page allocations. When memory gets
fragmented, this can eventually fail.

We can try to use order-2 allocations for skb head (SKB_MAX_ALLOC) plus
up to 16 page fragments to lower pressure on buddy allocator.

This patch has no effect on messages of less than 16064 bytes.
(on 64bit arches with PAGE_SIZE=4096)

For bigger messages (from 16065 to 81600 bytes), this patch brings
reliability at the expense of performance penalty because of extra pages
allocations.

netperf -t DG_STREAM -T 0,2 -- -m 16064 -s 200000
->4086040 Messages / 10s

netperf -t DG_STREAM -T 0,2 -- -m 16068 -s 200000
->3901747 Messages / 10s

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
2012-04-04 04:43:18 +0800

24 Mar, 2012

1 commit

626cf2366 poll: add poll_requested_events() and poll_does_not_wait() functions ... Browse Code »

In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for. An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested. This is something that can happen
in the video4linux subsystem among others.

Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably. The poll_table_struct does have it: it
has a key field with the event mask. But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table pointer.

Also, the eventpoll implementation always left the key field at ~0 instead
of using the requested events mask.

This was changed in eventpoll.c so the key field now contains the actual
events that should be polled for as set by the caller.

The solution to the NULL poll_table pointer is to set the qproc field to
NULL in poll_table once poll() matches the events, not the poll_table
pointer itself. That way drivers can obtain the mask through a new
poll_requested_events inline.

The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h). In
that case poll_requested_events() returns ~0 (i.e. all events).

Very rarely drivers might want to know whether poll_wait will actually
wait. If another earlier file descriptor in the set already matched the
events the caller wanted to wait for, then the kernel will return from the
select() call without waiting. This might be useful information in order
to avoid doing expensive work.

A new helper function poll_does_not_wait() is added that drivers can use
to detect this situation. This is now used in sock_poll_wait() in
include/net/sock.h. This was the only place in the kernel that needed
this information.

Drivers should no longer access any of the poll_table internals, but use
the poll_requested_events() and poll_does_not_wait() access functions
instead. In order to enforce that the poll_table fields are now prepended
with an underscore and a comment was added warning against using them
directly.

This required a change in unix_dgram_poll() in unix/af_unix.c which used
the key field to get the requested events. It's been replaced by a call
to poll_requested_events().

For qproc it was especially important to change its name since the
behavior of that field changes with this patch since this function pointer
can now be NULL when that wasn't possible in the past.

Any driver accessing the qproc or key fields directly will now fail to compile.

Some notes regarding the correctness of this patch: the driver's poll()
function is called with a 'struct poll_table_struct *wait' argument. This
pointer may or may not be NULL, drivers can never rely on it being one or
the other as that depends on whether or not an earlier file descriptor in
the select()'s fdset matched the requested events.

There are only three things a driver can do with the wait argument:

1) obtain the key field:

events = wait ? wait->key : ~0;

This will still work although it should be replaced with the new
poll_requested_events() function (which does exactly the same).
This will now even work better, since wait is no longer set to NULL
unnecessarily.

2) use the qproc callback. This could be deadly since qproc can now be
NULL. Renaming qproc should prevent this from happening. There are no
kernel drivers that actually access this callback directly, BTW.

3) test whether wait == NULL to determine whether poll would return without
waiting. This is no longer sufficient as the correct test is now
wait == NULL || wait->_qproc == NULL.

However, the worst that can happen here is a slight performance hit in
the case where wait != NULL and wait->_qproc == NULL. In that case the
driver will assume that poll_wait() will actually add the fd to the set
of waiting file descriptors. Of course, poll_wait() will not do that
since it tests for wait->_qproc. This will not break anything, though.

There is only one place in the whole kernel where this happens
(sock_poll_wait() in include/net/sock.h) and that code will be replaced
by a call to poll_does_not_wait() in the next patch.

Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
actually waiting. The next file descriptor from the set might match the
event mask and thus any possible waits will never happen.

Signed-off-by: Hans Verkuil
Reviewed-by: Jonathan Corbet
Reviewed-by: Al Viro
Cc: Davide Libenzi
Signed-off-by: Hans de Goede
Cc: Mauro Carvalho Chehab
Cc: David Miller
Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hans Verkuil
2012-03-24 07:58:38 +0800

22 Mar, 2012

1 commit

e2a0883e4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile 1 from Al Viro:
"This is _not_ all; in particular, Miklos' and Jan's stuff is not there
yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
ext4: initialization of ext4_li_mtx needs to be done earlier
debugfs-related mode_t whack-a-mole
hfsplus: add an ioctl to bless files
hfsplus: change finder_info to u32
hfsplus: initialise userflags
qnx4: new helper - try_extent()
qnx4: get rid of qnx4_bread/qnx4_getblk
take removal of PF_FORKNOEXEC to flush_old_exec()
trim includes in inode.c
um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
um: embed ->stub_pages[] into mmu_context
gadgetfs: list_for_each_safe() misuse
ocfs2: fix leaks on failure exits in module_init
ecryptfs: make register_filesystem() the last potential failure exit
ntfs: forgets to unregister sysctls on register_filesystem() failure
logfs: missing cleanup on register_filesystem() failure
jfs: mising cleanup on register_filesystem() failure
make configfs_pin_fs() return root dentry on success
configfs: configfs_create_dir() has parent dentry in dentry->d_parent
configfs: sanitize configfs_create()
...

Linus Torvalds
2012-03-22 04:36:41 +0800

21 Mar, 2012

2 commits

68ac1234f switch touch_atime to struct path ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-03-21 09:29:41 +0800
40ffe67d2 switch unix_sock to struct path ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-03-21 09:29:41 +0800

27 Feb, 2012

1 commit

80d326fab netlink: add netlink_dump_control structure for netlink_dump_start() ... Browse Code »

Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso
Signed-off-by: David S. Miller

Pablo Neira Ayuso
2012-02-27 03:10:06 +0800

23 Feb, 2012

1 commit

9f6f9af76 af_unix: MSG_TRUNC support for dgram sockets ... Browse Code »

Piergiorgio Beruto expressed the need to fetch size of first datagram in
queue for AF_UNIX sockets and suggested a patch against SIOCINQ ioctl.

I suggested instead to implement MSG_TRUNC support as a recv() input
flag, as already done for RAW, UDP & NETLINK sockets.

len = recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC);

MSG_TRUNC asks recv() to return the real length of the packet, even when
is was longer than the passed buffer.

There is risk that a userland application used MSG_TRUNC by accident
(since it had no effect on af_unix sockets) and this might break after
this patch.

Signed-off-by: Eric Dumazet
Tested-by: Piergiorgio Beruto
CC: Michael Kerrisk
Signed-off-by: David S. Miller

Eric Dumazet
2012-02-23 03:47:02 +0800

22 Feb, 2012

2 commits

fc0d75364 unix: Support peeking offset for stream sockets ... Browse Code »

The same here -- we can protect the sk_peek_off manipulations with
the unix_sk->readlock mutex.

The peeking of data from a stream socket is done in the datagram style,
i.e. even if there's enough room for more data in the user buffer, only
the head skb's data is copied in there. This feature is preserved when
peeking data from a given offset -- the data is read till the nearest
skb's boundary.

Signed-off-by: Pavel Emelyanov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Pavel Emelyanov
2012-02-22 04:03:58 +0800
f55bb7f9c unix: Support peeking offset for datagram and seqpacket sockets ... Browse Code »

The sk_peek_off manipulations are protected with the unix_sk->readlock mutex.
This mutex is enough since all we need is to syncronize setting the offset
vs reading the queue head. The latter is fully covered with the mentioned lock.

The recently added __skb_recv_datagram's offset is used to pick the skb to
read the data from.

Signed-off-by: Pavel Emelyanov
Acked-by: Eric Dumazet
Signed-off-by: David S. Miller

Pavel Emelyanov
2012-02-22 04:03:58 +0800

31 Jan, 2012

1 commit

6f01fd6e6 af_unix: fix EPOLLET regression for stream sockets ... Browse Code »

Commit 0884d7aa24 (AF_UNIX: Fix poll blocking problem when reading from
a stream socket) added a regression for epoll() in Edge Triggered mode
(EPOLLET)

Appropriate fix is to use skb_peek()/skb_unlink() instead of
skb_dequeue(), and only call skb_unlink() when skb is fully consumed.

This remove the need to requeue a partial skb into sk_receive_queue head
and the extra sk->sk_data_ready() calls that added the regression.

This is safe because once skb is given to sk_receive_queue, it is not
modified by a writer, and readers are serialized by u->readlock mutex.

This also reduce number of spinlock acquisition for small reads or
MSG_PEEK users so should improve overall performance.

Reported-by: Nick Mathewson
Signed-off-by: Eric Dumazet
Cc: Alexey Moiseytsev
Signed-off-by: David S. Miller

Eric Dumazet
2012-01-31 01:45:07 +0800

10 Jan, 2012

1 commit

38e5781bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
igmp: Avoid zero delay when receiving odd mixture of IGMP queries
netdev: make net_device_ops const
bcm63xx: make ethtool_ops const
usbnet: make ethtool_ops const
net: Fix build with INET disabled.
net: introduce netif_addr_lock_nested() and call if when appropriate
net: correct lock name in dev_[uc/mc]_sync documentations.
net: sk_update_clone is only used in net/core/sock.c
8139cp: fix missing napi_gro_flush.
pktgen: set correct max and min in pktgen_setup_inject()
smsc911x: Unconditionally include linux/smscphy.h in smsc911x.h
asix: fix infinite loop in rx_fixup()
net: Default UDP and UNIX diag to 'n'.
r6040: fix typo in use of MCR0 register bits
net: fix sock_clone reference mismatch with tcp memcontrol

Linus Torvalds
2012-01-10 06:46:52 +0800

09 Jan, 2012

1 commit

972b2c719 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
reiserfs: Properly display mount options in /proc/mounts
vfs: prevent remount read-only if pending removes
vfs: count unlinked inodes
vfs: protect remounting superblock read-only
vfs: keep list of mounts for each superblock
vfs: switch ->show_options() to struct dentry *
vfs: switch ->show_path() to struct dentry *
vfs: switch ->show_devname() to struct dentry *
vfs: switch ->show_stats to struct dentry *
switch security_path_chmod() to struct path *
vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
vfs: trim includes a bit
switch mnt_namespace ->root to struct mount
vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
vfs: opencode mntget() mnt_set_mountpoint()
vfs: spread struct mount - remaining argument of next_mnt()
vfs: move fsnotify junk to struct mount
vfs: move mnt_devname
vfs: move mnt_list to struct mount
vfs: switch pnode.h macros to struct mount *
...

Linus Torvalds
2012-01-09 04:19:57 +0800

08 Jan, 2012

1 commit

6d62a66e4 net: Default UDP and UNIX diag to 'n'. ... Browse Code »

Signed-off-by: David S. Miller

David S. Miller
2012-01-08 04:13:06 +0800

04 Jan, 2012

1 commit

04fc66e78 switch ->path_mknod() to umode_t ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:55:19 +0800

31 Dec, 2011

1 commit

c9da99e64 unix_diag: Fixup RQLEN extension report ... Browse Code »

While it's not too late fix the recently added RQLEN diag extension
to report rqlen and wqlen in the same way as TCP does.

I.e. for listening sockets the ack backlog length (which is the input
queue length for socket) in rqlen and the max ack backlog length in
wqlen, and what the CINQ/OUTQ ioctls do for established.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
2011-12-31 05:46:02 +0800