18 Dec, 2019

1 commit

  • commit 3253d9d093376d62b4a56e609f15d2ec5085ac73 upstream.

    Andreas Grünbacher reports that on the two filesystems that support
    iomap directio, it's possible for splice() to return -EAGAIN (instead of
    a short splice) if the pipe being written to has less space available in
    its pipe buffers than the length supplied by the calling process.

    Months ago we fixed splice_direct_to_actor to clamp the length of the
    read request to the size of the splice pipe. Do the same to do_splice.
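
    A minimal sketch of the clamp being described (a hypothetical
    pipe_free_bytes() helper is used for illustration; field names follow
    the pipe ring of that era, not necessarily the exact upstream patch):

        static size_t pipe_free_bytes(const struct pipe_inode_info *pipe)
        {
                /* unused ring slots, expressed in bytes */
                return (size_t)(pipe->buffers - pipe->nrbufs) << PAGE_SHIFT;
        }

        /* in do_splice(), before reading into the output pipe */
        len = min_t(size_t, len, pipe_free_bytes(opipe));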

    Fixes: 17614445576b6 ("splice: don't read more than available pipe space")
    Reported-by: syzbot+3c01db6025f26530cf8d@syzkaller.appspotmail.com
    Reported-by: Andreas Grünbacher
    Reviewed-by: Andreas Grünbacher
    Signed-off-by: Darrick J. Wong
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong
     

01 Jun, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

27 Apr, 2019

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "Three tracing fixes:

    - Use "nosteal" for ring buffer splice pages

    - Memory leak fix in error path of trace_pid_write()

    - Fix preempt_enable_no_resched() (use preempt_enable()) in ring
    buffer code"

    * tag 'trace-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    trace: Fix preempt_enable_no_resched() abuse
    tracing: Fix a memory leak by early error exit in trace_pid_write()
    tracing: Fix buffer_ref pipe ops

    Linus Torvalds
     

26 Apr, 2019

1 commit

  • This fixes multiple issues in buffer_pipe_buf_ops:

    - The ->steal() handler must not return zero unless the pipe buffer has
      the only reference to the page. But generic_pipe_buf_steal() assumes
      that every reference to the pipe is tracked by the page's refcount,
      which isn't true for these buffers - buffer_pipe_buf_get(), which
      duplicates a buffer, doesn't touch the page's refcount.
      Fix it by using generic_pipe_buf_nosteal(), which refuses every
      attempted theft. It should be easy to actually support ->steal, but the
      only current users of pipe_buf_steal() are the virtio console and FUSE,
      and they also only use it as an optimization. So it's probably not worth
      the effort.

    - The ->get() and ->release() handlers can be invoked concurrently on pipe
      buffers backed by the same struct buffer_ref. Make them safe against
      concurrency by using refcount_t (see the sketch after this list).

    - The pointers stored in ->private were only zeroed out when the last
      reference to the buffer_ref was dropped. As far as I know, this
      shouldn't be necessary anyway, but if we do it, let's always do it.
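
    A rough sketch of the refcount_t conversion mentioned in the second
    item (struct layout and helper names are illustrative assumptions, not
    the exact patch):

        struct buffer_ref {
                struct ring_buffer      *buffer;
                void                    *page;
                int                     cpu;
                refcount_t              refcount;       /* was a plain int */
        };

        static void buffer_ref_get(struct buffer_ref *ref)
        {
                refcount_inc(&ref->refcount);   /* safe against concurrent ->get() */
        }

        static void buffer_ref_release(struct buffer_ref *ref)
        {
                if (!refcount_dec_and_test(&ref->refcount))
                        return;
                /* last reference: return the page to the ring buffer */
                ring_buffer_free_read_page(ref->buffer, ref->cpu, ref->page);
                kfree(ref);
        }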

    Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com

    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: Al Viro
    Cc: stable@vger.kernel.org
    Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
    Signed-off-by: Jann Horn
    Signed-off-by: Steven Rostedt (VMware)

    Jann Horn
     

15 Apr, 2019

2 commits

  • Merge page ref overflow branch.

    Jann Horn reported that he can overflow the page ref count with
    sufficient memory (and a filesystem that is intentionally extremely
    slow).

    Admittedly it's not exactly easy. To have more than four billion
    references to a page requires a minimum of 32GB of kernel memory just
    for the pointers to the pages, much less any metadata to keep track of
    those pointers. Jann needed a total of 140GB of memory and a specially
    crafted filesystem that leaves all reads pending (in order to not ever
    free the page references and just keep adding more).

    Still, we have a fairly straightforward way to limit the two obvious
    user-controllable sources of page references: direct-IO like page
    references gotten through get_user_pages(), and the splice pipe page
    duplication. So let's just do that.

    * branch page-refs:
    fs: prevent page refcount overflow in pipe_buf_get
    mm: prevent get_user_pages() from overflowing page refcount
    mm: add 'try_get_page()' helper function
    mm: make page ref count overflow check tighter and more explicit
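
    A hedged sketch of what the 'try_get_page()' helper named in the
    shortlog might look like (the exact check is an assumption):

        static inline bool try_get_page(struct page *page)
        {
                /* refuse if the page is free or the refcount already wrapped negative */
                if (WARN_ON_ONCE(page_ref_count(page) <= 0))
                        return false;
                page_ref_inc(page);
                return true;
        }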

    Linus Torvalds
     
  • Change pipe_buf_get() to return a bool indicating whether it succeeded
    in raising the refcount of the page (if the thing in the pipe is a page).
    This removes another mechanism for overflowing the page refcount. All
    callers converted to handle a failure.
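
    A minimal sketch of a converted caller duplicating a pipe buffer (the
    surrounding context and the chosen error code are assumptions):

        *obuf = *ibuf;
        if (!pipe_buf_get(ipipe, ibuf)) {
                /* the page refcount could not be raised safely */
                if (ret == 0)
                        ret = -EFAULT;
                break;
        }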

    Reported-by: Jann Horn
    Signed-off-by: Matthew Wilcox
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

13 Mar, 2019

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted fixes (really no common topic here)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Make __vfs_write() static
    vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
    pipe: stop using ->can_merge
    splice: don't merge into linked buffers
    fs: move generic stat response attr handling to vfs_getattr_nosec
    orangefs: don't reinitialize result_mask in ->getattr
    fs/devpts: always delete dcache dentry-s in dput()

    Linus Torvalds
     

05 Mar, 2019

2 commits

  • The current implementation of splice() and tee() ignores O_NONBLOCK set
    on pipe file descriptors and checks only the SPLICE_F_NONBLOCK flag for
    blocking on pipe arguments. This is inconsistent since splice()-ing
    from/to non-pipe file descriptors does take O_NONBLOCK into
    consideration.

    Fix this by promoting O_NONBLOCK, when set on a pipe, to
    SPLICE_F_NONBLOCK.
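
    A sketch of the promotion described above (placement inside the
    splice()/tee() entry points is assumed):

        /* treat O_NONBLOCK on either file like SPLICE_F_NONBLOCK */
        if ((in->f_flags | out->f_flags) & O_NONBLOCK)
                flags |= SPLICE_F_NONBLOCK;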

    Some context for how the current implementation of splice() leads to
    inconsistent behavior. In the ongoing work[1] to add VM tracing
    capability to trace-cmd we stream tracing data over named FIFOs or
    vsockets from guests back to the host.

    When we receive SIGINT from the user to stop tracing, we set O_NONBLOCK on
    the input file descriptor and set SPLICE_F_NONBLOCK for the next call to
    splice(). If splice() was blocked waiting on data from the input FIFO,
    after SIGINT splice() restarts with the same arguments (no
    SPLICE_F_NONBLOCK) and blocks again instead of returning -EAGAIN when no
    data is available.

    This differs from the splice() behavior when reading from a vsocket or
    when we're doing a traditional read()/write() loop (trace-cmd's
    --nosplice argument).

    With this patch applied we get the same behavior in all situations after
    setting O_NONBLOCK which also matches the behavior of doing a
    read()/write() loop instead of splice().

    This change does have the potential to break users who don't expect
    EAGAIN from splice() when SPLICE_F_NONBLOCK is not set. OTOH, programs
    that set O_NONBLOCK and don't anticipate EAGAIN are arguably buggy[2].

    [1] https://github.com/skaslev/trace-cmd/tree/vsock
    [2] https://github.com/torvalds/linux/blob/d47e3da1759230e394096fd742aad423c291ba48/fs/read_write.c#L1425

    Signed-off-by: Slavomir Kaslev
    Reviewed-by: Steven Rostedt (VMware)
    Signed-off-by: Linus Torvalds

    Slavomir Kaslev
     
  • Every in-kernel use of this function defined it to KERNEL_DS (either as
    an actual define, or as an inline function). It's an entirely
    historical artifact, and long long long ago used to actually read the
    segment selector value of '%ds' on x86.

    Which in the kernel is always KERNEL_DS.

    Inspired by a patch from Jann Horn that just did this for a very small
    subset of users (the ones in fs/), along with Al who suggested a script.
    I then just took it to the logical extreme and removed all the remaining
    gunk.

    Roughly scripted with

    git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
    git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'

    plus manual fixups to remove a few unusual usage patterns, the couple of
    inline function cases and to fix up a comment that had become stale.

    The 'get_ds()' function remains in an x86 kvm selftest, since in user
    space it actually does something relevant.

    Inspired-by: Jann Horn
    Inspired-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Feb, 2019

2 commits

  • Al Viro pointed out that since there is only one pipe buffer type to which
    new data can be appended, it isn't necessary to have a ->can_merge field in
    struct pipe_buf_operations, we can just check for a magic type.
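
    A sketch of the 'magic type' check (helper name assumed; presumably the
    comparison is against the anonymous pipe buffer ops):

        static bool pipe_buf_can_merge(struct pipe_buffer *buf)
        {
                /* only plain anonymous pipe buffers may have data appended */
                return buf->ops == &anon_pipe_buf_ops;
        }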

    Suggested-by: Al Viro
    Signed-off-by: Jann Horn
    Signed-off-by: Al Viro

    Jann Horn
     
  • Before this patch, it was possible for two pipes to affect each other after
    data had been transferred between them with tee():

    ============
    $ cat tee_test.c

    #define _GNU_SOURCE
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int pipe_a[2];
        if (pipe(pipe_a)) err(1, "pipe");
        int pipe_b[2];
        if (pipe(pipe_b)) err(1, "pipe");
        if (write(pipe_a[1], "abcd", 4) != 4) err(1, "write");
        if (tee(pipe_a[0], pipe_b[1], 2, 0) != 2) err(1, "tee");
        if (write(pipe_b[1], "xx", 2) != 2) err(1, "write");

        char buf[5];
        if (read(pipe_a[0], buf, 4) != 4) err(1, "read");
        buf[4] = 0;
        printf("got back: '%s'\n", buf);
    }
    $ gcc -o tee_test tee_test.c
    $ ./tee_test
    got back: 'abxx'
    $
    ============

    As suggested by Al Viro, fix it by creating a separate type for
    non-mergeable pipe buffers, then changing the types of buffers in
    splice_pipe_to_pipe() and link_pipe().

    Cc:
    Fixes: 7c77f0b3f920 ("splice: implement pipe to pipe splicing")
    Fixes: 70524490ee2e ("[PATCH] splice: add support for sys_tee()")
    Suggested-by: Al Viro
    Signed-off-by: Jann Horn
    Signed-off-by: Al Viro

    Jann Horn
     

05 Dec, 2018

1 commit

  • In commit 4721a601099, we tried to fix a problem wherein directio reads
    into a splice pipe will bounce EFAULT/EAGAIN all the way out to
    userspace by simulating a zero-byte short read. This happens because
    some directio read implementations (xfs) will call
    bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
    reads, but as soon as we run out of pipe buffers that _get_pages call
    returns EFAULT, which the splice code translates to EAGAIN and bounces
    out to userspace.

    In that commit, the iomap code catches the EFAULT and simulates a
    zero-byte read, but that causes assertion errors on regular splice reads
    because xfs doesn't allow short directio reads.

    The brokenness is compounded by splice_direct_to_actor immediately
    bailing on do_splice_to returning <= 0 without ever calling the actor
    (which empties out the pipe), so if userspace calls back we'll EFAULT
    again on the full pipe, and nothing ever gets copied.

    Therefore, teach splice_direct_to_actor to clamp its requests to the
    amount of free space in the pipe and remove the simulated short read
    from the iomap directio code.

    Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
    Reported-by: Murphy Zhou
    Ranted-by: Amir Goldstein
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Darrick J. Wong

    Darrick J. Wong
     

24 Oct, 2018

1 commit

  • In the iov_iter struct, separate the iterator type from the iterator
    direction and use accessor functions to access them in most places.

    Convert a bunch of places to use switch statements to access them rather
    than chains of bitwise-AND statements. This makes it easier to add further
    iterator types. It can also be more efficient: to implement a switch over
    small contiguous integers, the compiler can use roughly 50% fewer compare
    instructions than it would need for the equivalent bitwise-AND tests.

    Further, cease passing the iterator type into the iterator setup function.
    The iterator function can set that itself. Only the direction is required.
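
    A hedged sketch of the accessor split and the switch-based dispatch
    (macro and field details are simplified assumptions):

        static inline unsigned iov_iter_type(const struct iov_iter *i)
        {
                return i->type & ~(READ | WRITE);       /* what the iterator walks */
        }

        static inline unsigned iov_iter_rw(const struct iov_iter *i)
        {
                return i->type & (READ | WRITE);        /* data direction */
        }

        static const char *iov_iter_type_name(const struct iov_iter *i)
        {
                switch (iov_iter_type(i)) {             /* one compare per case */
                case ITER_IOVEC:        return "iovec";
                case ITER_KVEC:         return "kvec";
                case ITER_BVEC:         return "bvec";
                case ITER_PIPE:         return "pipe";
                default:                return "unknown";
                }
        }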

    Signed-off-by: David Howells

    David Howells
     

16 Jun, 2018

1 commit


13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:

    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().
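
    For a concrete (made-up) call site, the transformation looks like:

        /* before */
        buf = kmalloc(nr_items * sizeof(struct item), GFP_KERNEL);

        /* after: kmalloc_array() checks the multiplication for overflow */
        buf = kmalloc_array(nr_items, sizeof(struct item), GFP_KERNEL);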

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

11 Jun, 2018

1 commit


03 Apr, 2018

1 commit

  • Using the fs-internal do_vmsplice() helper allows us to get rid of the
    fs-internal call to the sys_vmsplice() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: Al Viro
    Cc: Andrew Morton
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

25 Oct, 2017

1 commit

  • …READ_ONCE()/WRITE_ONCE()

    Please do not apply this to mainline directly, instead please re-run the
    coccinelle script shown below and apply its output.

    For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
    preference to ACCESS_ONCE(), and new code is expected to use one of the
    former. So far, there's been no reason to change most existing uses of
    ACCESS_ONCE(), as these aren't harmful, and changing them results in
    churn.

    However, for some features, the read/write distinction is critical to
    correct operation. To distinguish these cases, separate read/write
    accessors must be used. This patch migrates (most) remaining
    ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
    coccinelle script:

    ----
    // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
    // WRITE_ONCE()

    // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

    virtual patch

    @ depends on patch @
    expression E1, E2;
    @@

    - ACCESS_ONCE(E1) = E2
    + WRITE_ONCE(E1, E2)

    @ depends on patch @
    expression E;
    @@

    - ACCESS_ONCE(E)
    + READ_ONCE(E)
    ----
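
    For a (made-up) call site, the conversion looks like:

        /* before */
        if (ACCESS_ONCE(p->state) == IDLE)
                ACCESS_ONCE(p->state) = BUSY;

        /* after: the read/write distinction is explicit */
        if (READ_ONCE(p->state) == IDLE)
                WRITE_ONCE(p->state, BUSY);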

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: davem@davemloft.net
    Cc: linux-arch@vger.kernel.org
    Cc: mpe@ellerman.id.au
    Cc: shuah@kernel.org
    Cc: snitzer@redhat.com
    Cc: thor.thayer@linux.intel.com
    Cc: tj@kernel.org
    Cc: viro@zeniv.linux.org.uk
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Mark Rutland
     

05 Sep, 2017

1 commit


30 Jun, 2017

1 commit


03 May, 2017

1 commit


04 Mar, 2017

1 commit

  • Pull sched.h split-up from Ingo Molnar:
    "The point of these changes is to significantly reduce the
    header footprint, to speed up the kernel build and to
    have a cleaner header structure.

    After these changes the new <linux/sched.h>'s typical preprocessed size
    goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
    lines), which is around 40% faster to build on typical configs.

    Not much changed from the last version (-v2) posted three weeks ago: I
    eliminated quirks, backmerged fixes plus I rebased it to an upstream
    SHA1 from yesterday that includes most changes queued up in -next plus
    all sched.h changes that were pending from Andrew.

    I've re-tested the series both on x86 and on cross-arch defconfigs,
    and did a bisectability test at a number of random points.

    I tried to test as many build configurations as possible, but some
    build breakage is probably still left - but it should be mostly
    limited to architectures that have no cross-compiler binaries
    available on kernel.org, and non-default configurations"

    * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
    sched/headers: Clean up
    sched/headers: Remove #ifdefs from
    sched/headers: Remove the include from
    sched/headers, hrtimer: Remove the include from
    sched/headers, x86/apic: Remove the header inclusion from
    sched/headers, timers: Remove the include from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/core: Remove unused prefetch_stack()
    sched/headers: Remove from
    sched/headers: Remove the 'init_pid_ns' prototype from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the runqueue_is_locked() prototype
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove from
    sched/headers: Remove the include from
    sched/headers: Remove from
    ...

    Linus Torvalds
     

02 Mar, 2017

2 commits


20 Feb, 2017

1 commit


17 Feb, 2017

1 commit

  • Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
    unused part of the pipe ring buffer. Previously splice_to_pipe() left
    the flags value alone, which could result in incorrect behavior.

    The uninitialized flags appear to have been there since the introduction
    of the splice syscall.
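
    A minimal sketch of the fix (exact placement inside splice_to_pipe() is
    an assumption):

        /* when filling a ring slot from the splice descriptor */
        buf->page   = spd->pages[page_nr];
        buf->offset = spd->partial[page_nr].offset;
        buf->len    = spd->partial[page_nr].len;
        buf->ops    = spd->ops;
        buf->flags  = 0;        /* don't inherit PACKET/GIFT from a stale slot */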

    Signed-off-by: Miklos Szeredi
    Cc: # 2.6.17+
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

27 Dec, 2016

3 commits


22 Dec, 2016

1 commit

  • Commit 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()")
    caused a regression when there were no more readers left on a pipe that
    was being spliced into: rather than the expected SIGPIPE and -EPIPE
    return value, the writer would end up waiting forever for space to free
    up (which obviously was not going to happen with no readers around).
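
    A sketch of the kind of check the pipe-wait path needs (the surrounding
    loop and variable names are assumed):

        /* while waiting for room in the destination pipe */
        if (unlikely(!pipe->readers)) {
                send_sig(SIGPIPE, current, 0);
                ret = -EPIPE;
                break;
        }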

    Fixes: 8924feff66f3 ("splice: lift pipe_lock out of splice_to_pipe()")
    Reported-and-tested-by: Andreas Schwab
    Debugged-by: Al Viro
    Cc: stable@kernel.org # v4.9
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Dec, 2016

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the main block pull request this series. Contrary to previous
    release, I've kept the core and driver changes in the same branch. We
    always ended up having dependencies between the two for obvious
    reasons, so makes more sense to keep them together. That said, I'll
    probably try and keep more topical branches going forward, especially
    for cycles that end up being as busy as this one.

    The major parts of this pull request is:

    - Improved support for O_DIRECT on block devices, with a small
    private implementation instead of using the pig that is
    fs/direct-io.c. From Christoph.

    - Request completion tracking in a scalable fashion. This is utilized
    by two components in this pull, the new hybrid polling and the
    writeback queue throttling code.

    - Improved support for polling with O_DIRECT, adding a hybrid mode
    that combines pure polling with an initial sleep. From me.

    - Support for automatic throttling of writeback queues on the block
    side. This uses feedback from the device completion latencies to
    scale the queue on the block side up or down. From me.

    - Support for SMR drives in the block layer and for SD. From Hannes
    and Shaun.

    - Multi-connection support for nbd. From Josef.

    - Cleanup of request and bio flags, so we have a clear split between
    which are bio (or rq) private, and which ones are shared. From
    Christoph.

    - A set of patches from Bart, that improve how we handle queue
    stopping and starting in blk-mq.

    - Support for WRITE_ZEROES from Chaitanya.

    - Lightnvm updates from Javier/Matias.

    - Support for FC for the nvme-over-fabrics code. From James Smart.

    - A bunch of fixes from a whole slew of people, too many to name
    here"

    * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
    blk-stat: fix a few cases of missing batch flushing
    blk-flush: run the queue when inserting blk-mq flush
    elevator: make the rqhash helpers exported
    blk-mq: abstract out blk_mq_dispatch_rq_list() helper
    blk-mq: add blk_mq_start_stopped_hw_queue()
    block: improve handling of the magic discard payload
    blk-wbt: don't throttle discard or write zeroes
    nbd: use dev_err_ratelimited in io path
    nbd: reset the setup task for NBD_CLEAR_SOCK
    nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
    nvme-fabrics: Add target support for FC transport
    nvme-fabrics: Add host support for FC transport
    nvme-fabrics: Add FC transport LLDD api definitions
    nvme-fabrics: Add FC transport FC-NVME definitions
    nvme-fabrics: Add FC transport error codes to nvme.h
    Add type 0x28 NVME type code to scsi fc headers
    nvme-fabrics: patch target code in prep for FC transport support
    nvme-fabrics: set sqe.command_id in core not transports
    parser: add u64 number parser
    nvme-rdma: align to generic ib_event logging helper
    ...

    Linus Torvalds
     

27 Nov, 2016

1 commit

    Botched calculation of the number of pages. As a result,
    we were dropping pieces when doing splice to pipe from
    e.g. 9p.

    Reported-by: Alexei Starovoitov
    Tested-by: Alexei Starovoitov
    Signed-off-by: Al Viro

    Al Viro
     

11 Nov, 2016

1 commit

  • i_size check is a leftover from the horrors that used to play with
    the page cache in that function. With the switch to ->read_iter(),
    it's neither needed nor correct - for gfs2 it ends up being buggy,
    since i_size is not guaranteed to be correct until later (inside
    ->read_iter()).

    Spotted-by: Abhi Das
    Signed-off-by: Al Viro

    Al Viro
     

01 Nov, 2016

1 commit


11 Oct, 2016

1 commit

  • by making sure we call iov_iter_advance() on original
    iov_iter even if direct_IO (done on its copy) has returned 0.
    It's a no-op for old iov_iter flavours and does the right thing
    (== truncation of the stuff we'd allocated, but not filled) in
    ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with
    by cleanup in generic_file_read_iter().

    Signed-off-by: Al Viro

    Al Viro
     

06 Oct, 2016

4 commits