Eric Lee / smarc-fsl-linux-kernel

02 Nov, 2017

1 commit

b24413180 License cleanup: add SPDX GPL-2.0 license identifier to files with no license ... Browse Code »

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if
Reviewed-by: Philippe Ombredanne
Reviewed-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
8 years ago

21 Jul, 2017

1 commit

7a68ada6e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Browse Code »

David S. Miller
8 years ago

19 Jul, 2017

1 commit

e06fdaf40 Merge tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux ... Browse Code »

Pull structure randomization updates from Kees Cook:
"Now that IPC and other changes have landed, enable manual markings for
randstruct plugin, including the task_struct.

This is the rest of what was staged in -next for the gcc-plugins, and
comes in three patches, largest first:

- mark "easy" structs with __randomize_layout

- mark task_struct with an optional anonymous struct to isolate the
__randomize_layout section

- mark structs to opt _out_ of automated marking (which will come
later)

And, FWIW, this continues to pass allmodconfig (normal and patched to
enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and
s390 for me"

* tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randstruct: opt-out externally exposed function pointer structs
task_struct: Allow randomized layout
randstruct: Mark various structs for randomization

Linus Torvalds
8 years ago

17 Jul, 2017

1 commit

27eac47b0 net/unix: drop obsolete fd-recursion limits ... Browse Code »

All unix sockets now account inflight FDs to the respective sender.
This was introduced in:

commit 712f4aad406bb1ed67f3f98d04c044191f0ff593
Author: willy tarreau
Date: Sun Jan 10 07:54:56 2016 +0100

unix: properly account for FDs passed over unix sockets

and further refined in:

commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6
Author: Hannes Frederic Sowa
Date: Wed Feb 3 02:11:03 2016 +0100

unix: correctly track in-flight fds in sending process user_struct

Hence, regardless of the stacking depth of FDs, the total number of
inflight FDs is limited, and accounted. There is no known way for a
local user to exceed those limits or exploit the accounting.

Furthermore, the GC logic is independent of the recursion/stacking depth
as well. It solely depends on the total number of inflight FDs,
regardless of their layout.

Lastly, the current `recursion_level' suffers a TOCTOU race, since it
checks and inherits depths only at queue time. If we consider `A
Cc: Simon McVittie
Signed-off-by: David Herrmann
Reviewed-by: Tom Gundersen
Signed-off-by: David S. Miller

David Herrmann
8 years ago

06 Jul, 2017

1 commit

5518b69b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from David Miller:
"Reasonably busy this cycle, but perhaps not as busy as in the 4.12
merge window:

1) Several optimizations for UDP processing under high load from
Paolo Abeni.

2) Support pacing internally in TCP when using the sch_fq packet
scheduler for this is not practical. From Eric Dumazet.

3) Support mutliple filter chains per qdisc, from Jiri Pirko.

4) Move to 1ms TCP timestamp clock, from Eric Dumazet.

5) Add batch dequeueing to vhost_net, from Jason Wang.

6) Flesh out more completely SCTP checksum offload support, from
Davide Caratti.

7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
Neira Ayuso, and Matthias Schiffer.

8) Add devlink support to nfp driver, from Simon Horman.

9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
Prabhu.

10) Add stack depth tracking to BPF verifier and use this information
in the various eBPF JITs. From Alexei Starovoitov.

11) Support XDP on qed device VFs, from Yuval Mintz.

12) Introduce BPF PROG ID for better introspection of installed BPF
programs. From Martin KaFai Lau.

13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.

14) For loads, allow narrower accesses in bpf verifier checking, from
Yonghong Song.

15) Support MIPS in the BPF selftests and samples infrastructure, the
MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
Daney.

16) Support kernel based TLS, from Dave Watson and others.

17) Remove completely DST garbage collection, from Wei Wang.

18) Allow installing TCP MD5 rules using prefixes, from Ivan
Delalande.

19) Add XDP support to Intel i40e driver, from Björn Töpel

20) Add support for TC flower offload in nfp driver, from Simon
Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
Kicinski, and Bert van Leeuwen.

21) IPSEC offloading support in mlx5, from Ilan Tayari.

22) Add HW PTP support to macb driver, from Rafal Ozieblo.

23) Networking refcount_t conversions, From Elena Reshetova.

24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
for tuning the TCP sockopt settings of a group of applications,
currently via CGROUPs"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
cxgb4: Support for get_ts_info ethtool method
cxgb4: Add PTP Hardware Clock (PHC) support
cxgb4: time stamping interface for PTP
nfp: default to chained metadata prepend format
nfp: remove legacy MAC address lookup
nfp: improve order of interfaces in breakout mode
net: macb: remove extraneous return when MACB_EXT_DESC is defined
bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
bpf: fix return in load_bpf_file
mpls: fix rtm policy in mpls_getroute
net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
...

Linus Torvalds
8 years ago

01 Jul, 2017

2 commits

8c9814b97 net: convert unix_address.refcnt from atomic_t to refcount_t ... Browse Code »

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Signed-off-by: David S. Miller

Reshetova, Elena
8 years ago
3859a271a randstruct: Mark various structs for randomization ... Browse Code »

This marks many critical kernel structures for randomization. These are
structures that have been targeted in the past in security exploits, or
contain functions pointers, pointers to function pointer tables, lists,
workqueues, ref-counters, credentials, permissions, or are otherwise
sensitive. This initial list was extracted from Brad Spengler/PaX Team's
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

Left out of this list is task_struct, which requires special handling
and will be covered in a subsequent patch.

Signed-off-by: Kees Cook

Kees Cook
8 years ago

20 Jun, 2017

1 commit

ac6424b98 sched/wait: Rename wait_queue_t => wait_queue_entry_t ... Browse Code »

Rename:

wait_queue_t => wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
8 years ago

05 Sep, 2016

1 commit

6e1ce3c34 af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' ... Browse Code »

Right now we use the 'readlock' both for protecting some of the af_unix
IO path and for making the bind be single-threaded.

The two are independent, but using the same lock makes for a nasty
deadlock due to ordering with regards to filesystem locking. The bind
locking would want to nest outside the VSF pathname locking, but the IO
locking wants to nest inside some of those same locks.

We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
splice-bind deadlock") which moved the readlock inside the vfs locks,
but that caused problems with overlayfs that will then call back into
filesystem routines that take the lock in the wrong order anyway.

Splitting the locks means that we can go back to having the bind lock be
the outermost lock, and we don't have any deadlocks with lock ordering.

Acked-by: Rainer Weikusat
Acked-by: Al Viro
Signed-off-by: Linus Torvalds
Acked-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Linus Torvalds
9 years ago

08 Feb, 2016

1 commit

415e3d3e9 unix: correctly track in-flight fds in sending process user_struct ... Browse Code »

The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.

To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.

Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann
Cc: David Herrmann
Cc: Willy Tarreau
Cc: Linus Torvalds
Suggested-by: Linus Torvalds
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

Hannes Frederic Sowa
9 years ago

24 Nov, 2015

1 commit

7d267278a unix: avoid use-after-free in ep_remove_wait_queue ... Browse Code »

Rainer Weikusat writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron
Signed-off-by: David S. Miller

Rainer Weikusat
10 years ago

08 Oct, 2015

1 commit

c72eda060 af_unix: constify the sock parameter in unix_sk() ... Browse Code »

Make unix_sk() just like inet[6]_sk() by constify'ing the sock
parameter.

Signed-off-by: Paul Moore
Signed-off-by: David S. Miller

Paul Moore
10 years ago

30 Sep, 2015

1 commit

4613012db af_unix: Convert the unix_sk macro to an inline function for type safety ... Browse Code »

As suggested by Eric Dumazet this change replaces the
#define with a static inline function to enjoy
complaints by the compiler when misusing the API.

Signed-off-by: Aaron Conole
Signed-off-by: David S. Miller

Aaron Conole
10 years ago

11 Jun, 2015

1 commit

37a9a8df8 net/unix: support SCM_SECURITY for stream sockets ... Browse Code »

SCM_SECURITY was originally only implemented for datagram sockets,
not for stream sockets. However, SCM_CREDENTIALS is supported on
Unix stream sockets. For consistency, implement Unix stream support
for SCM_SECURITY as well. Also clean up the existing code and get
rid of the superfluous UNIXSID macro.

Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211,
where systemd was using SCM_CREDENTIALS and assumed wrongly that
SCM_SECURITY was also supported on Unix stream sockets.

Signed-off-by: Stephen Smalley
Acked-by: Paul Moore
Signed-off-by: David S. Miller

Stephen Smalley
10 years ago

10 Aug, 2013

1 commit

e370a7236 af_unix: improve STREAM behavior with fragmented memory ... Browse Code »

unix_stream_sendmsg() currently uses order-2 allocations,
and we had numerous reports this can fail.

The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
not helping.

This patch extends the work done in commit eb6a24816b247c
("af_unix: reduce high order page allocations) for
datagram sockets.

This opens the possibility of zero copy IO (splice() and
friends)

The trick is to not use skb_pull() anymore in recvmsg() path,
and instead add a @consumed field in UNIXCB() to track amount
of already read payload in the skb.

There is a performance regression for large sends
because of extra page allocations that will be addressed
in a follow-up patch, allowing sock_alloc_send_pskb()
to attempt high order page allocations.

Signed-off-by: Eric Dumazet
Cc: David Rientjes
Signed-off-by: David S. Miller

Eric Dumazet
12 years ago

01 Aug, 2013

1 commit

b60a8280b af_unix.h: Remove extern from function prototypes ... Browse Code »

There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.

Reflow modified prototypes to 80 columns.

Signed-off-by: Joe Perches
Signed-off-by: David S. Miller

Joe Perches
12 years ago

02 May, 2013

1 commit

60bc851ae af_unix: fix a fatal race with bit fields ... Browse Code »

Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.

This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.

This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.

A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.

As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().

recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.

[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Reported-by: Ambrose Feinstein
Signed-off-by: Eric Dumazet
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Signed-off-by: David S. Miller

Eric Dumazet
12 years ago

08 Apr, 2013

1 commit

6b0ee8c03 scm: Stop passing struct cred ... Browse Code »

Now that uids and gids are completely encapsulated in kuid_t
and kgid_t we no longer need to pass struct cred which allowed
us to test both the uid and the user namespace for equality.

Passing struct cred potentially allows us to pass the entire group
list as BSD does but I don't believe the cost of cache line misses
justifies retaining code for a future potential application.

Signed-off-by: "Eric W. Biederman"
Signed-off-by: David S. Miller

Eric W. Biederman
12 years ago

22 Oct, 2012

1 commit

0da5f7c6d unix: Remove unused field from unix_sock ... Browse Code »

The struct sock *other one seem to be unused. Grep and make do not object.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
13 years ago

09 Jun, 2012

1 commit

7123aaa3a af_unix: speedup /proc/net/unix ... Browse Code »

/proc/net/unix has quadratic behavior, and can hold unix_table_lock for
a while if high number of unix sockets are alive. (90 ms for 200k
sockets...)

We already have a hash table, so its quite easy to use it.

Problem is unbound sockets are still hashed in a single hash slot
(unix_socket_table[UNIX_HASH_TABLE])

This patch also spreads unbound sockets to 256 hash slots, to speedup
both /proc/net/unix and unix_diag.

Time to read /proc/net/unix with 200k unix sockets :
(time dd if=/proc/net/unix of=/dev/null bs=4k)

before : 520 secs
after : 2 secs

Signed-off-by: Eric Dumazet
Cc: Steven Whitehouse
Cc: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric Dumazet
13 years ago

16 Apr, 2012

1 commit

95c961747 net: cleanup unsigned to unsigned int ... Browse Code »

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
13 years ago

21 Mar, 2012

1 commit

40ffe67d2 switch unix_sock to struct path ... Browse Code »

Signed-off-by: Al Viro

Al Viro
13 years ago

31 Dec, 2011

1 commit

885ee74d5 af_unix: Move CINQ/COUTQ code to helpers ... Browse Code »

Currently tcp diag reports rqlen and wqlen values similar to how
the CINQ/COUTQ iotcls do. To make unix diag report these values
in the same way move the respective code into helpers.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
14 years ago

17 Dec, 2011

1 commit

fa7ff56f7 af_unix: Export stuff required for diag module ... Browse Code »

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
14 years ago

25 Apr, 2011

1 commit

2a9e95070 net: Remove __KERNEL__ cpp checks from include/net ... Browse Code »

These header files are never installed to user consumption, so any
__KERNEL__ cpp checks are superfluous.

Projects should also not copy these files into their userland utility
sources and try to use them there. If they insist on doing so, the
onus is on them to sanitize the headers as needed.

Signed-off-by: David S. Miller

David S. Miller
14 years ago

30 Nov, 2010

1 commit

25888e303 af_unix: limit recursion level ... Browse Code »

Its easy to eat all kernel memory and trigger NMI watchdog, using an
exploit program that queues unix sockets on top of others.

lkml ref : http://lkml.org/lkml/2010/11/25/8

This mechanism is used in applications, one choice we have is to have a
recursion limit.

Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.

Add a recursion_level to unix socket, allowing up to 4 levels.

Each time we send an unix socket through sendfd mechanism, we copy its
recursion level (plus one) to receiver. This recursion level is cleared
when socket receive queue is emptied.

Reported-by: Марк Коренберг
Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
15 years ago

17 Jun, 2010

1 commit

7361c36c5 af_unix: Allow credentials to work across user and pid namespaces. ... Browse Code »

In unix_skb_parms store pointers to struct pid and struct cred instead
of raw uid, gid, and pid values, then translate the credentials on
reception into values that are meaningful in the receiving processes
namespaces.

Signed-off-by: Eric W. Biederman
Acked-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Eric W. Biederman
15 years ago

02 May, 2010

1 commit

438154823 net: sock_def_readable() and friends RCU conversion ... Browse Code »

sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
- Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
- Use wq_has_sleeper() to eventually wakeup tasks.
- Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).

Signed-off-by: Eric Dumazet
Signed-off-by: David S. Miller

Eric Dumazet
15 years ago

27 Nov, 2008

1 commit

5f23b7349 net: Fix soft lockups/OOM issues w/ unix garbage collector ... Browse Code »

This is an implementation of David Miller's suggested fix in:
https://bugzilla.redhat.com/show_bug.cgi?id=470201

It has been updated to use wait_event() instead of
wait_event_interruptible().

Paraphrasing the description from the above report, it makes sendmsg()
block while UNIX garbage collection is in progress. This avoids a
situation where child processes continue to queue new FDs over a
AF_UNIX socket to a parent which is in the exit path and running
garbage collection on these FDs. This contention can result in soft
lockups and oom-killing of unrelated processes.

Signed-off-by: dann frazier
Signed-off-by: David S. Miller

dann frazier
17 years ago

10 Nov, 2008

1 commit

6209344f5 net: unix: fix inflight counting bug in garbage collector ... Browse Code »

Previously I assumed that the receive queues of candidates don't
change during the GC. This is only half true, nothing can be received
from the queues (see comment in unix_gc()), but buffers could be added
through the other half of the socket pair, which may still have file
descriptors referring to it.

This can result in inc_inflight_move_tail() erronously increasing the
"inflight" counter for a unix socket for which dec_inflight() wasn't
previously called. This in turn can trigger the "BUG_ON(total_refs <
inflight_refs)" in a later garbage collection run.

Fix this by only manipulating the "inflight" counter for sockets which
are candidates themselves. Duplicating the file references in
unix_attach_fds() is also needed to prevent a socket becoming a
candidate for GC while the skb that contains it is not yet queued.

Reported-by: Andrea Bittau
Signed-off-by: Miklos Szeredi
CC: stable@kernel.org
Signed-off-by: Linus Torvalds

Miklos Szeredi
17 years ago

27 Jul, 2008

1 commit

516e0cc56 [PATCH] f_count may wrap around ... Browse Code »

make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0.

Signed-off-by: Al Viro

Al Viro
17 years ago

29 Jan, 2008

2 commits

27147c9e6 [AF_UNIX]: Remove unused declaration of sysctl_unix_max_dgram_qlen. ... Browse Code »

Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller

Denis V. Lunev
18 years ago
97577e382 [UNIX]: Extend unix_sysctl_(un)register prototypes ... Browse Code »

Add the struct net * argument to both of them to use in
the future. Also make the register one return an error code.

It is useless right now, but will make the future patches
much simpler.

Signed-off-by: Pavel Emelyanov
Acked-by: Eric W. Biederman
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Pavel Emelyanov
18 years ago

11 Nov, 2007

1 commit

9305cfa44 [AF_UNIX]: Make unix_tot_inflight counter non-atomic ... Browse Code »

This counter is _always_ modified under the unix_gc_lock spinlock,
so its atomicity can be provided w/o additional efforts.

Signed-off-by: Pavel Emelyanov
Signed-off-by: David S. Miller

Pavel Emelyanov
18 years ago

31 Jul, 2007

1 commit

131116989 [AF_UNIX]: Make code static. ... Browse Code »

The following code can now become static:
- struct unix_socket_table
- unix_table_lock

Signed-off-by: Adrian Bunk
Signed-off-by: David S. Miller

Adrian Bunk
18 years ago

12 Jul, 2007

1 commit

1fd05ba5a [AF_UNIX]: Rewrite garbage collector, fixes race. ... Browse Code »

Throw out the old mark & sweep garbage collector and put in a
refcounting cycle detecting one.

The old one had a race with recvmsg, that resulted in false positives
and hence data loss. The old algorithm operated on all unix sockets
in the system, so any additional locking would have meant performance
problems for all users of these.

The new algorithm instead only operates on "in flight" sockets, which
are very rare, and the additional locking for these doesn't negatively
impact the vast majority of users.

In fact it's probable, that there weren't *any* heavy senders of
sockets over sockets, otherwise the above race would have been
discovered long ago.

The patch works OK with the app that exposed the race with the old
code. The garbage collection has also been verified to work in a few
simple cases.

Signed-off-by: Miklos Szeredi
Signed-off-by: David S. Miller

Miklos Szeredi
18 years ago

04 Jun, 2007

1 commit

1c92b4e50 [AF_UNIX]: Make socket locking much less confusing. ... Browse Code »

The unix_state_*() locking macros imply that there is some
rwlock kind of thing going on, but the implementation is
actually a spinlock which makes the code more confusing than
it needs to be.

So use plain unix_state_lock and unix_state_unlock.

Signed-off-by: David S. Miller

David S. Miller
18 years ago

03 Aug, 2006

1 commit

dc49c1f94 [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch ... Browse Code »

From: Catherine Zhang

This patch implements a cleaner fix for the memory leak problem of the
original unix datagram getpeersec patch. Instead of creating a
security context each time a unix datagram is sent, we only create the
security context when the receiver requests it.

This new design requires modification of the current
unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
secid_to_secctx and release_secctx. The former retrieves the security
context and the latter releases it. A hook is required for releasing
the security context because it is up to the security module to decide
how that's done. In the case of Selinux, it's a simple kfree
operation.

Acked-by: Stephen Smalley
Signed-off-by: David S. Miller

Catherine Zhang
19 years ago

04 Jul, 2006

1 commit

a09785a24 [PATCH] lockdep: annotate af_unix locking ... Browse Code »

Teach special (recursive) locking code to the lock validator. Also splits
af_unix's sk_receive_queue.lock class from the other networking skb-queue
locks. Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
19 years ago

30 Jun, 2006

1 commit

877ce7c1b [AF_UNIX]: Datagram getpeersec ... Browse Code »

This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket. The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets. Basically we build upon the existing Unix domain socket API for
retrieving user credentials. Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message). To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via getsockopt. Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
if (cmsg_hdr->cmsg_len cmsg_level == SOL_SOCKET &&
cmsg_hdr->cmsg_type == SCM_SECURITY) {
memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
}
}

sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications. We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang
Acked-by: Acked-by: James Morris
Signed-off-by: David S. Miller

Catherine Zhang
19 years ago