24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    released under the gpl version 2 or later

    and 1 additional normalized pattern(s):

    this program is free software you can redistribute it and or
    modify it under the terms of the gnu general public license
    as published by the free software foundation either version
    2 of the license or at your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520071858.828691433@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

28 Feb, 2019

2 commits

  • We need this functionality for the io_uring file registration, but
    we cannot rely on it since CONFIG_UNIX can be modular. Move the helpers
    to a separate file, that's always builtin to the kernel if CONFIG_UNIX is
    m/y.

    No functional changes in this patch, just moving code around.

    Reviewed-by: Hannes Reinecke
    Acked-by: David S. Miller
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The submission queue (SQ) and completion queue (CQ) rings are shared
    between the application and the kernel. This eliminates the need to
    copy data back and forth to submit and complete IO.

    IO submissions use the io_uring_sqe data structure, and completions
    are generated in the form of io_uring_cqe data structures. The SQ
    ring is an index into the io_uring_sqe array, which makes it possible
    to submit a batch of IOs without them being contiguous in the ring.
    The CQ ring is always contiguous, as completion events are inherently
    unordered, and hence any io_uring_cqe entry can point back to an
    arbitrary submission.

    Two new system calls are added for this:

    io_uring_setup(entries, params)
    Sets up an io_uring instance for doing async IO. On success,
    returns a file descriptor that the application can mmap to
    gain access to the SQ ring, CQ ring, and io_uring_sqes.

    io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
    Initiates IO against the rings mapped to this fd, or waits for
    them to complete, or both. The behavior is controlled by the
    parameters passed in. If 'to_submit' is non-zero, then we'll
    try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
    kernel will wait for 'min_complete' events, if they aren't
    already available. It's valid to set IORING_ENTER_GETEVENTS
    and 'min_complete' == 0 at the same time, this allows the
    kernel to return already completed events without waiting
    for them. This is useful only for polling, as for IRQ
    driven IO, the application can just check the CQ ring
    without entering the kernel.

    With this setup, it's possible to do async IO with a single system
    call. Future developments will enable polled IO with this interface,
    and polled submission as well. The latter will enable an application
    to do IO without doing ANY system calls at all.

    For IRQ driven IO, an application only needs to enter the kernel for
    completions if it wants to wait for them to occur.

    Each io_uring is backed by a workqueue, to support buffered async IO
    as well. We will only punt to an async context if the command would
    need to wait for IO on the device side. Any data that can be accessed
    directly in the page cache is done inline. This avoids the slowness
    issue of usual threadpools, since cached data is accessed as quickly
    as a sync interface.

    Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Jens Axboe
     

22 Mar, 2017

1 commit

  • Dmitry has reported that a BUG_ON() condition in unix_notinflight()
    may be triggered by a simple code that forwards unix socket in an
    SCM_RIGHTS message.
    That is caused by incorrect unix socket GC implementation in unix_gc().

    The GC first collects list of candidates, then (a) decrements their
    "children's" inflight counter, (b) checks which inflight counters are
    now 0, and then (c) increments all inflight counters back.
    (a) and (c) are done by calling scan_children() with inc_inflight or
    dec_inflight as the second argument.

    Commit 6209344f5a37 ("net: unix: fix inflight counting bug in garbage
    collector") changed scan_children() such that it no longer considers
    sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block
    of code that that unsets this flag _before_ invoking
    scan_children(, dec_iflight, ). This may lead to incorrect inflight
    counters for some sockets.

    This change fixes this bug by changing order of operations:
    UNIX_GC_CANDIDATE is now unset only after all inflight counters are
    restored to the original state.

    kernel BUG at net/unix/garbage.c:149!
    RIP: 0010:[] []
    unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
    Call Trace:
    [] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487
    [] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
    [] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655
    [] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
    [] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
    [] kfree_skb+0x184/0x570 net/core/skbuff.c:705
    [] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
    [] unix_release+0x49/0x90 net/unix/af_unix.c:836
    [] sock_release+0x92/0x1f0 net/socket.c:570
    [] sock_close+0x1b/0x20 net/socket.c:1017
    [] __fput+0x34e/0x910 fs/file_table.c:208
    [] ____fput+0x1a/0x20 fs/file_table.c:244
    [] task_work_run+0x1a0/0x280 kernel/task_work.c:116
    [< inline >] exit_task_work include/linux/task_work.h:21
    [] do_exit+0x183a/0x2640 kernel/exit.c:828
    [] do_group_exit+0x14e/0x420 kernel/exit.c:931
    [] get_signal+0x663/0x1880 kernel/signal.c:2307
    [] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
    [] exit_to_usermode_loop+0x1ea/0x2d0
    arch/x86/entry/common.c:156
    [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
    [] syscall_return_slowpath+0x4d3/0x570
    arch/x86/entry/common.c:259
    [] entry_SYSCALL_64_fastpath+0xc4/0xc6

    Link: https://lkml.org/lkml/2017/3/6/252
    Signed-off-by: Andrey Ulanov
    Reported-by: Dmitry Vyukov
    Fixes: 6209344 ("net: unix: fix inflight counting bug in garbage collector")
    Signed-off-by: David S. Miller

    Andrey Ulanov
     

08 Feb, 2016

1 commit

  • The commit referenced in the Fixes tag incorrectly accounted the number
    of in-flight fds over a unix domain socket to the original opener
    of the file-descriptor. This allows another process to arbitrary
    deplete the original file-openers resource limit for the maximum of
    open files. Instead the sending processes and its struct cred should
    be credited.

    To do so, we add a reference counted struct user_struct pointer to the
    scm_fp_list and use it to account for the number of inflight unix fds.

    Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
    Reported-by: David Herrmann
    Cc: David Herrmann
    Cc: Willy Tarreau
    Cc: Linus Torvalds
    Suggested-by: Linus Torvalds
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

11 Jan, 2016

1 commit

  • It is possible for a process to allocate and accumulate far more FDs than
    the process' limit by sending them over a unix socket then closing them
    to keep the process' fd count low.

    This change addresses this problem by keeping track of the number of FDs
    in flight per user and preventing non-privileged processes from having
    more FDs in flight than their configured FD limit.

    Reported-by: socketpair@gmail.com
    Reported-by: Tetsuo Handa
    Mitigates: CVE-2013-4312 (Linux 2.0+)
    Suggested-by: Linus Torvalds
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: Willy Tarreau
    Signed-off-by: David S. Miller

    willy tarreau
     

24 Apr, 2015

1 commit


08 Oct, 2014

1 commit


02 May, 2013

1 commit

  • Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
    uses 64bit instructions to manipulate them.
    If the 64bit word includes any atomic_t or spinlock_t, we can lose
    critical concurrent changes.

    This is happening in af_unix, where unix_sk(sk)->gc_candidate/
    gc_maybe_cycle/lock share the same 64bit word.

    This leads to fatal deadlock, as one/several cpus spin forever
    on a spinlock that will never be available again.

    A safer way would be to use a long to store flags.
    This way we are sure compiler/arch wont do bad things.

    As we own unix_gc_lock spinlock when clearing or setting bits,
    we can use the non atomic __set_bit()/__clear_bit().

    recursion_level can share the same 64bit location with the spinlock,
    as it is set only with this spinlock held.

    [1] bug fixed in gcc-4.8.0 :
    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

    Reported-by: Ambrose Feinstein
    Signed-off-by: Eric Dumazet
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Feb, 2013

1 commit


15 Mar, 2011

1 commit


30 Nov, 2010

1 commit

  • Its easy to eat all kernel memory and trigger NMI watchdog, using an
    exploit program that queues unix sockets on top of others.

    lkml ref : http://lkml.org/lkml/2010/11/25/8

    This mechanism is used in applications, one choice we have is to have a
    recursion limit.

    Other limits might be needed as well (if we queue other types of files),
    since the passfd mechanism is currently limited by socket receive queue
    sizes only.

    Add a recursion_level to unix socket, allowing up to 4 levels.

    Each time we send an unix socket through sendfd mechanism, we copy its
    recursion level (plus one) to receiver. This recursion level is cleared
    when socket receive queue is emptied.

    Reported-by: Марк Коренберг
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Nov, 2010

1 commit

  • Vegard Nossum found a unix socket OOM was possible, posting an exploit
    program.

    My analysis is we can eat all LOWMEM memory before unix_gc() being
    called from unix_release_sock(). Moreover, the thread blocked in
    unix_gc() can consume huge amount of time to perform cleanup because of
    huge working set.

    One way to handle this is to have a sensible limit on unix_tot_inflight,
    tested from wait_for_unix_gc() and to force a call to unix_gc() if this
    limit is hit.

    This solves the OOM and also reduce overall latencies, and should not
    slowdown normal workloads.

    Reported-by: Vegard Nossum
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

03 Dec, 2008

1 commit


27 Nov, 2008

1 commit

  • This is an implementation of David Miller's suggested fix in:
    https://bugzilla.redhat.com/show_bug.cgi?id=470201

    It has been updated to use wait_event() instead of
    wait_event_interruptible().

    Paraphrasing the description from the above report, it makes sendmsg()
    block while UNIX garbage collection is in progress. This avoids a
    situation where child processes continue to queue new FDs over a
    AF_UNIX socket to a parent which is in the exit path and running
    garbage collection on these FDs. This contention can result in soft
    lockups and oom-killing of unrelated processes.

    Signed-off-by: dann frazier
    Signed-off-by: David S. Miller

    dann frazier
     

12 Nov, 2008

1 commit


10 Nov, 2008

1 commit

  • Previously I assumed that the receive queues of candidates don't
    change during the GC. This is only half true, nothing can be received
    from the queues (see comment in unix_gc()), but buffers could be added
    through the other half of the socket pair, which may still have file
    descriptors referring to it.

    This can result in inc_inflight_move_tail() erronously increasing the
    "inflight" counter for a unix socket for which dec_inflight() wasn't
    previously called. This in turn can trigger the "BUG_ON(total_refs <
    inflight_refs)" in a later garbage collection run.

    Fix this by only manipulating the "inflight" counter for sockets which
    are candidates themselves. Duplicating the file references in
    unix_attach_fds() is also needed to prevent a socket becoming a
    candidate for GC while the skb that contains it is not yet queued.

    Reported-by: Andrea Bittau
    Signed-off-by: Miklos Szeredi
    CC: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

02 Nov, 2008

1 commit


27 Jul, 2008

1 commit


11 Nov, 2007

2 commits


12 Jul, 2007

1 commit

  • Throw out the old mark & sweep garbage collector and put in a
    refcounting cycle detecting one.

    The old one had a race with recvmsg, that resulted in false positives
    and hence data loss. The old algorithm operated on all unix sockets
    in the system, so any additional locking would have meant performance
    problems for all users of these.

    The new algorithm instead only operates on "in flight" sockets, which
    are very rare, and the additional locking for these doesn't negatively
    impact the vast majority of users.

    In fact it's probable, that there weren't *any* heavy senders of
    sockets over sockets, otherwise the above race would have been
    discovered long ago.

    The patch works OK with the app that exposed the race with the old
    code. The garbage collection has also been verified to work in a few
    simple cases.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: David S. Miller

    Miklos Szeredi
     

15 Feb, 2007

1 commit

  • After Al Viro (finally) succeeded in removing the sched.h #include in module.h
    recently, it makes sense again to remove other superfluous sched.h includes.
    There are quite a lot of files which include it but don't actually need
    anything defined in there. Presumably these includes were once needed for
    macros that used to live in sched.h, but moved to other header files in the
    course of cleaning it up.

    To ease the pain, this time I did not fiddle with any header files and only
    removed #includes from .c-files, which tend to cause less trouble.

    Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
    arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
    allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
    configs in arch/arm/configs on arm. I also checked that no new warnings were
    introduced by the patch (actually, some warnings are removed that were emitted
    by unnecessarily included header files).

    Signed-off-by: Tim Schmielau
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Schmielau
     

11 Feb, 2007

1 commit


09 Dec, 2006

1 commit


21 Mar, 2006

1 commit

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Arjan van de Ven
     

04 Jan, 2006

1 commit


30 Aug, 2005

2 commits

  • Lots of places just needs the states, not even linux/tcp.h, where this
    enum was, needs it.

    This speeds up development of the refactorings as less sources are
    rebuilt when things get moved from net/tcp.h.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Remove the "list" member of struct sk_buff, as it is entirely
    redundant. All SKB list removal callers know which list the
    SKB is on, so storing this in sk_buff does nothing other than
    taking up some space.

    Two tricky bits were SCTP, which I took care of, and two ATM
    drivers which Francois Romieu fixed
    up.

    Signed-off-by: David S. Miller
    Signed-off-by: Francois Romieu

    David S. Miller
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds