13 Feb, 2019

3 commits

  • commit 97e1532ef81acb31c30f9e75bf00306c33a77812 upstream.

    Dereferencing req->page_descs[0] will Oops if req->max_pages is zero.

    Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
    Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
    Fixes: b2430d7567a3 ("fuse: add per-page descriptor to fuse_req")
    Cc: # v3.9
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
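    A minimal userspace sketch of the kind of guard this fix needs: the first descriptor is touched only when the array has at least one slot. struct page_desc, struct req and init_first_desc() are hypothetical stand-ins that borrow names from the commit text. They are not the real struct fuse_req and not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-page descriptor. */
struct page_desc {
	unsigned int offset;
	unsigned int length;
};

/* Hypothetical stand-in for a request with an optional descriptor array. */
struct req {
	unsigned int max_pages;       /* number of slots in page_descs */
	struct page_desc *page_descs; /* may be NULL when max_pages == 0 */
};

/*
 * Initialise the first descriptor only if it exists. Indexing
 * page_descs[0] unconditionally dereferences a missing array when
 * max_pages is zero, which is the Oops the commit describes.
 * Returns 1 if a descriptor was initialised, 0 otherwise.
 */
static int init_first_desc(struct req *r, unsigned int offset)
{
	if (r->max_pages == 0)
		return 0;
	assert(r->page_descs != NULL);
	r->page_descs[0].offset = offset;
	r->page_descs[0].length = 0;
	return 1;
}
```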
     
  • commit a2ebba824106dabe79937a9f29a875f837e1b6d4 upstream.

    NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not
    the page cache page.

    Fixes: 8b284dc47291 ("fuse: writepages: handle same page rewrites")
    Cc: # v3.13
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 9509941e9c534920ccc4771ae70bd6cbbe79df1c upstream.

    Some of the pipe_buf_release() handlers seem to assume that the pipe is
    locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page
    without taking any extra locks. From a glance through the callers of
    pipe_buf_release(), it looks like FUSE is the only one that calls
    pipe_buf_release() without having the pipe locked.

    This bug should only lead to a memory leak, nothing terrible.

    Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jann Horn
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     

20 Dec, 2018

1 commit

  • commit 2e64ff154ce6ce9a8dc0f9556463916efa6ff460 upstream.

    When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection.

    Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this
    incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent
    to userspace.

    Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR
    inside of fuse_file_put.

    Fixes: 7678ac50615d ("fuse: support clients that don't implement 'open'")
    Cc: # v3.14
    Signed-off-by: Chad Austin
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Chad Austin
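    A sketch of the decision fuse_file_put() has to make once it is told whether the file is a directory. The enum values, struct conn and release_opcode() are hypothetical and only model the logic in the commit message. no_open records that the server lacks FUSE_OPEN, so it may suppress FUSE_RELEASE, but it must not suppress FUSE_RELEASEDIR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode values used only in this sketch. */
enum opcode { OP_NONE = 0, OP_RELEASE, OP_RELEASEDIR };

/* Hypothetical connection flags. */
struct conn {
	bool no_open;   /* server answered FUSE_OPEN with ENOSYS */
};

/*
 * Pick the request to send when the last reference to an open file or
 * directory goes away. Without the isdir argument the shared path could
 * not tell the two cases apart and dropped RELEASEDIR as well.
 */
static enum opcode release_opcode(const struct conn *fc, bool isdir)
{
	if (isdir)
		return OP_RELEASEDIR;   /* no_open says nothing about dirs */
	if (fc->no_open)
		return OP_NONE;         /* server never saw an open: drop */
	return OP_RELEASE;
}
```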
     

21 Nov, 2018

7 commits

  • commit 2d84a2d19b6150c6dbac1e6ebad9c82e4c123772 upstream.

    In the current fuse_drop_waiting() implementation it's possible that
    fuse_wait_aborted() will not be woken up in the unlikely case that
    fuse_abort_conn() + fuse_wait_aborted() runs in between checking
    fc->connected and calling atomic_dec(&fc->num_waiting).

    Do the atomic_dec_and_test() unconditionally, which also provides the
    necessary barrier against reordering with the fc->connected check.

    The explicit smp_mb() in fuse_wait_aborted() is not actually needed, since
    the spin_unlock() in fuse_abort_conn() provides the necessary RELEASE
    barrier after resetting fc->connected. However, this is not a performance
    sensitive path, and adding the explicit barrier makes it easier to
    document.

    Signed-off-by: Miklos Szeredi
    Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests")
    Cc: #v4.19
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
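    A userspace model of the fixed ordering, using C11 atomics in place of the kernel's atomic_t and READ_ONCE(). struct conn, wake_aborters() and drop_waiting() are hypothetical. The point is only that the decrement-and-test now happens unconditionally and that connected is read after it. The abort side, which is not modelled here, still needs its matching full barrier after clearing connected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection state for this sketch. */
struct conn {
	atomic_bool connected;
	atomic_int num_waiting;
	int wakeups;            /* counts wake_aborters() calls; not atomic */
};

/* Stand-in for wake_up_all() on the queue fuse_wait_aborted() sleeps on. */
static void wake_aborters(struct conn *fc)
{
	fc->wakeups++;
}

/*
 * Fixed version: always decrement, then check connected. The seq_cst
 * read-modify-write orders the load of connected after the decrement,
 * which the racy "check connected first, then maybe decrement" order
 * did not guarantee.
 */
static void drop_waiting(struct conn *fc)
{
	if (atomic_fetch_sub(&fc->num_waiting, 1) == 1 &&
	    !atomic_load(&fc->connected))
		wake_aborters(fc);
}
```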
     
  • commit 7fabaf303458fcabb694999d6fa772cc13d4e217 upstream.

    fuse_request_send_notify_reply() may fail if the connection was reset for
    some reason (e.g. the fs was unmounted). Don't leak the request reference in this
    case. Besides leaking memory, this resulted in fc->num_waiting not being
    decremented and hence fuse_wait_aborted() left in a hanging and unkillable
    state.

    Fixes: 2d45ba381a74 ("fuse: add retrieve request")
    Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests")
    Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com
    Signed-off-by: Miklos Szeredi
    Cc: #v2.6.36
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit ebacb81273599555a7a19f7754a1451206a5fc4f upstream.

    In the blocking async IO case, an additional reference to io is taken
    so that it survives fuse_aio_complete(). In the non-blocking case this
    additional reference is not needed, but we still dereference io
    afterwards to decide whether to wait for completion. This is wrong and
    leads to a use-after-free. Fix it by storing the blocking information
    in a separate variable.

    This was spotted by KASAN when running generic/208 fstest.

    Signed-off-by: Lukas Czerner
    Reported-by: Zorro Lang
    Signed-off-by: Miklos Szeredi
    Fixes: 744742d692e3 ("fuse: Add reference counting for fuse_io_priv")
    Cc: # v4.6
    Signed-off-by: Greg Kroah-Hartman

    Lukas Czerner
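    A minimal sketch of the "store blocking information in a separate variable" fix, assuming a simple integer refcount. struct io_priv, io_put() and put_and_should_wait() are hypothetical stand-ins, not the fuse_io_priv API. In the non-blocking case the put may drop the last reference, so the flag has to be read before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted stand-in for struct fuse_io_priv. */
struct io_priv {
	int refcnt;
	bool blocking;
};

static void io_put(struct io_priv *io)
{
	if (--io->refcnt == 0)
		free(io);
}

/*
 * Drop the submitter's reference and report whether the caller must wait.
 * In the non-blocking case nobody holds an extra reference, so io_put()
 * can free io; reading io->blocking after it would be a use-after-free.
 * The flag is therefore copied into a local first.
 */
static bool put_and_should_wait(struct io_priv *io)
{
	bool blocking = io->blocking;   /* snapshot while io is alive */

	io_put(io);                     /* io may be freed here */
	return blocking;                /* no access to *io after the put */
}
```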
     
  • commit 4c316f2f3ff315cb48efb7435621e5bfb81df96d upstream.

    Otherwise fuse_dev_do_write() could come in and finish off the request, and
    the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...))
    in request_end().

    Signed-off-by: Miklos Szeredi
    Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai
    Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts")
    Cc: # v4.2
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit 908a572b80f6e9577b45e81b3dfe2e22111286b8 upstream.

    Using waitqueue_active() is racy. Make sure we issue a wake_up()
    unconditionally after storing into fc->blocked. After that it's okay to
    optimize with waitqueue_active() since the first wake up provides the
    necessary barrier for all waiters, not just the woken one.

    Signed-off-by: Miklos Szeredi
    Fixes: 3c18ef8117f0 ("fuse: optimize wake_up")
    Cc: # v3.10
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit d2d2d4fb1f54eff0f3faa9762d84f6446a4bc5d0 upstream.

    After we found req in request_find() and released the lock,
    anything may happen to the req in parallel:

    cpu0                                  cpu1
    fuse_dev_do_write()                   fuse_dev_do_write()
      req = request_find(fpq, ...)        ...
      spin_unlock(&fpq->lock)             ...
      ...                                 req = request_find(fpq, oh.unique)
      ...                                 spin_unlock(&fpq->lock)
      queue_interrupt(&fc->iq, req);      ...
      ...                                 ...
      ...                                 ...
      request_end(fc, req);
      fuse_put_request(fc, req);
      ...                                 queue_interrupt(&fc->iq, req);

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Miklos Szeredi
    Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts")
    Cc: # v4.2
    Signed-off-by: Greg Kroah-Hartman

    Kirill Tkhai
     
  • commit bc78abbd55dd28e2287ec6d6502b842321a17c87 upstream.

    We may pick up a freed req this way:

    [cpu0]                                [cpu1]
    fuse_dev_do_read()                    fuse_dev_do_write()
      list_move_tail(&req->list, ...);    ...
      spin_unlock(&fpq->lock);            ...
      ...                                 request_end(fc, req);
      ...                                 fuse_put_request(fc, req);
      if (test_bit(FR_INTERRUPTED, ...))
              queue_interrupt(fiq, req);

    Fix that by keeping req alive until we finish all manipulations.

    Reported-by: syzbot+4e975615ca01f2277bdd@syzkaller.appspotmail.com
    Signed-off-by: Kirill Tkhai
    Signed-off-by: Miklos Szeredi
    Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts")
    Cc: # v4.2
    Signed-off-by: Greg Kroah-Hartman

    Kirill Tkhai
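    A userspace sketch of "keeping req alive until we finish all manipulations", assuming a plain integer refcount. struct req, req_get(), req_put() and reader_path() are hypothetical. The middle req_put() stands in for the other CPU's request_end() plus fuse_put_request(), simulated inline so that the example stays single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted request for this sketch. */
struct req {
	int refcnt;
	bool interrupted;
	int interrupts_queued;  /* stands in for queue_interrupt() */
};

static void req_get(struct req *r)
{
	r->refcnt++;
}

static void req_put(struct req *r)
{
	if (--r->refcnt == 0)
		free(r);
}

/*
 * Reader side. The request is pinned before the point where the real code
 * drops fpq->lock, so the writer's final put cannot free it while it is
 * still being inspected.
 */
static void reader_path(struct req *r)
{
	req_get(r);                 /* keep req alive across the unlock */
	/* ... spin_unlock(&fpq->lock) in the real code ... */
	req_put(r);                 /* the other CPU's fuse_put_request() */
	if (r->interrupted)         /* safe: we still hold a reference */
		r->interrupts_queued++;
	req_put(r);                 /* done with req; this may free it */
}
```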
     

22 Aug, 2018

2 commits

  • Pull fuse update from Miklos Szeredi:
    "Various bug fixes and cleanups"

    * tag 'fuse-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: reduce allocation size for splice_write
    fuse: use kvmalloc to allocate array of pipe_buffer structs.
    fuse: convert last timespec use to timespec64
    fs: fuse: Adding new return type vm_fault_t
    fuse: simplify fuse_abort_conn()
    fuse: Add missed unlock_page() to fuse_readpages_fill()
    fuse: Don't access pipe->buffers without pipe_lock()
    fuse: fix initial parallel dirops
    fuse: Fix oops at process_init_reply()
    fuse: umount should wait for all requests
    fuse: fix unlocked access to processing queue
    fuse: fix double request_end()

    Linus Torvalds
     
  • …iederm/user-namespace

    Pull core signal handling updates from Eric Biederman:
    "It was observed that a periodic timer in combination with a
    sufficiently expensive fork could prevent fork from ever completing.
    This contains the changes to remove the need for that restart.

    This set of changes is split into several parts:

    - The first part makes PIDTYPE_TGID a proper pid type instead of
    something only for very special cases. The part starts using
    PIDTYPE_TGID enough so that in __send_signal, where signals are
    actually delivered, we know if the signal is being sent to a group
    of processes or just a single process.

    - With that prep work out of the way the logic in fork is modified so
    that fork logically makes signals received while it is running
    appear to be received after the fork completes"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
    signal: Don't send signals to tasks that don't exist
    signal: Don't restart fork when signals come in.
    fork: Have new threads join on-going signal group stops
    fork: Skip setting TIF_SIGPENDING in ptrace_init_task
    signal: Add calculate_sigpending()
    fork: Unconditionally exit if a fatal signal is pending
    fork: Move and describe why the code examines PIDNS_ADDING
    signal: Push pid type down into complete_signal.
    signal: Push pid type down into __send_signal
    signal: Push pid type down into send_signal
    signal: Pass pid type into do_send_sig_info
    signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
    signal: Pass pid type into group_send_sig_info
    signal: Pass pid and pid type into send_sigqueue
    posix-timers: Noralize good_sigevent
    signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
    pid: Implement PIDTYPE_TGID
    pids: Move the pgrp and session pid pointers from task_struct to signal_struct
    kvm: Don't open code task_pid in kvm_vcpu_ioctl
    pids: Compute task_tgid using signal->leader_pid
    ...

    Linus Torvalds
     

14 Aug, 2018

1 commit

  • Pull vfs icache updates from Al Viro:

    - NFS mkdir/open_by_handle race fix

    - analogous solution for FUSE, replacing the one currently in mainline

    - new primitive to be used when discarding halfway set up inodes on
    failed object creation; gives sane guarantees that icache lookups
    won't return such doomed but not yet freed inodes. A bunch of
    filesystems switched to that animal.

    - Miklos' fix for last cycle regression in iget5_locked(); -stable will
    need a slightly different variant, unfortunately.

    - misc bits and pieces around things icache-related (in adfs and jfs).

    * 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    jfs: don't bother with make_bad_inode() in ialloc()
    adfs: don't put inodes into icache
    new helper: inode_fake_hash()
    vfs: don't evict uninitialized inode
    jfs: switch to discard_new_inode()
    ext2: make sure that partially set up inodes won't be returned by ext2_iget()
    udf: switch to discard_new_inode()
    ufs: switch to discard_new_inode()
    btrfs: switch to discard_new_inode()
    new primitive: discard_new_inode()
    kill d_instantiate_no_diralias()
    nfs_instantiate(): prevent multiple aliases for directory inode

    Linus Torvalds
     

02 Aug, 2018

1 commit

  • The only user is fuse_create_new_entry(), and there it's used to
    mitigate the same mkdir/open-by-handle race as in nfs_mkdir().
    The same solution applies - unhash the mkdir argument, then
    call d_splice_alias() and if that returns a reference to preexisting
    alias, dput() and report success. ->mkdir() argument left unhashed
    negative with the preexisting alias moved in the right place is just
    fine from the ->mkdir() callers' point of view.

    Cc: Miklos Szeredi
    Signed-off-by: Al Viro

    Al Viro
     

26 Jul, 2018

12 commits

    The 'bufs' array contains 'pipe->buffers' elements, but
    fuse_dev_splice_write() uses only 'pipe->nrbufs' elements.

    So reduce the allocation size to 'pipe->nrbufs' elements.

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Miklos Szeredi

    Andrey Ryabinin
     
    The number of pipe->buffers is basically controlled by userspace via
    fcntl(... F_SETPIPE_SZ ...), so it could be large. High-order allocations
    could be slow (if memory is heavily fragmented) or may fail if the order
    is larger than PAGE_ALLOC_COSTLY_ORDER.

    Since 'bufs' doesn't need to be physically contiguous, use
    kvmalloc_array() to allocate the memory. If a high-order
    page isn't available, kvmalloc*() will fall back to 0-order pages.

    Signed-off-by: Andrey Ryabinin
    Signed-off-by: Miklos Szeredi

    Andrey Ryabinin
     
    All of fuse uses 64-bit timestamps with the exception of
    fuse_change_attributes(), so let's convert this one as well.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Miklos Szeredi

    Arnd Bergmann
     
  • Use new return type vm_fault_t for fault handler in struct
    vm_operations_struct. For now, this is just documenting that the function
    returns a VM_FAULT value rather than an errno. Once all instances are
    converted, vm_fault_t will become a distinct type.

    commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Signed-off-by: Miklos Szeredi

    Souptick Joarder
     
  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
    The error path above returns with the page unlocked, so this path
    should behave the same way.

    Fixes: f8dbdf81821b ("fuse: rework fuse_readpages()")
    Signed-off-by: Kirill Tkhai
    Signed-off-by: Miklos Szeredi

    Kirill Tkhai
     
  • fuse_dev_splice_write() reads pipe->buffers to determine the size of
    'bufs' array before taking the pipe_lock(). This is not safe, as
    another thread might change 'pipe->buffers' between the allocation
    and taking the pipe_lock(), so we could end up with a 'bufs' array
    that is too small.

    Move the bufs allocations inside pipe_lock()/pipe_unlock() to fix this.

    Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
    Signed-off-by: Andrey Ryabinin
    Cc: # v2.6.35
    Signed-off-by: Miklos Szeredi

    Andrey Ryabinin
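    A userspace analog of moving the allocation inside the lock, with a pthread mutex playing the role of pipe_lock(). struct pipe, struct buf and snapshot_bufs() are hypothetical and are not the kernel's pipe_inode_info. The sketch also sizes the copy by the number of occupied slots, as the "reduce allocation size" entry in this log does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical pipe slot and pipe; not the kernel's pipe_inode_info. */
struct buf {
	int len;
};

struct pipe {
	pthread_mutex_t lock;   /* plays the role of pipe_lock() */
	unsigned int nrbufs;    /* occupied slots; may change when unlocked */
	struct buf bufs[16];
};

/*
 * Copy the occupied slots out of the pipe. The count that sizes the
 * allocation is read under the same lock that protects the copy, so
 * the array cannot be too small for what is copied into it. Returns
 * NULL on allocation failure; *n receives the number of slots copied.
 */
static struct buf *snapshot_bufs(struct pipe *p, unsigned int *n)
{
	struct buf *out;

	pthread_mutex_lock(&p->lock);
	*n = p->nrbufs;
	assert(*n <= sizeof(p->bufs) / sizeof(p->bufs[0]));
	/* Size by nrbufs (slots in use), not by total capacity. */
	out = calloc(*n ? *n : 1, sizeof(*out));
	if (out) {
		for (unsigned int i = 0; i < *n; i++)
			out[i] = p->bufs[i];
	}
	pthread_mutex_unlock(&p->lock);
	return out;
}
```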
     
    If parallel dirops are enabled in the FUSE_INIT reply, then the first
    operation may leave fi->mutex held.

    Reported-by: syzbot
    Fixes: 5c672ab3f0ee ("fuse: serialize dirops by default")
    Cc: # v4.7
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • syzbot is hitting NULL pointer dereference at process_init_reply().
    This is because deactivate_locked_super() is called before the response
    to the initial request is processed.

    Fix this by aborting and waiting for all requests (including FUSE_INIT)
    before resetting fc->sb.

    Original patch by Tetsuo Handa.

    Reported-by: syzbot
    Fixes: e27c9d3877a0 ("fuse: fuse: add time_gran to INIT_OUT")
    Cc: # v3.19
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
    fuse_abort_conn() does not guarantee that all async requests have actually
    finished aborting (i.e. that their ->end() function has been called). This
    could result in inodes still being in use after umount.

    Add a helper to wait until all requests are fully done. This is done by
    looking at the "num_waiting" counter. When this counter drops to zero, we
    can be sure that no more requests are outstanding.

    Fixes: 0d8e84b0432b ("fuse: simplify request abort")
    Cc: # v4.2
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
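    A userspace analog of "wait until num_waiting drops to zero", using a mutex and condition variable where the kernel uses an atomic counter and a wait queue. struct conn, req_start(), req_done(), wait_all_done() and finish_one() are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical connection with a count of in-flight requests. */
struct conn {
	pthread_mutex_t lock;
	pthread_cond_t idle;    /* broadcast when num_waiting reaches zero */
	int num_waiting;
};

static void req_start(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	c->num_waiting++;
	pthread_mutex_unlock(&c->lock);
}

static void req_done(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	if (--c->num_waiting == 0)
		pthread_cond_broadcast(&c->idle);
	pthread_mutex_unlock(&c->lock);
}

/* Block until every started request has also called req_done(). */
static void wait_all_done(struct conn *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->num_waiting > 0)
		pthread_cond_wait(&c->idle, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Example worker: finishes one request from another thread. */
static void *finish_one(void *arg)
{
	req_done(arg);
	return NULL;
}
```

    The while loop around pthread_cond_wait() re-checks the counter, so spurious wake-ups and a count that is already zero are both handled.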
     
  • fuse_dev_release() assumes that it's the only one referencing the
    fpq->processing list, but that's not true, since fuse_abort_conn() can be
    doing the same without any serialization between the two.

    Fixes: c3696046beb3 ("fuse: separate pqueue for clones")
    Cc: # v4.2
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Refcounting of request is broken when fuse_abort_conn() is called and
    request is on the fpq->io list:

    - ref is taken too late
    - then it is not dropped

    Fixes: 0d8e84b0432b ("fuse: simplify request abort")
    Cc: # v4.2
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

21 Jul, 2018

1 commit


12 Jul, 2018

4 commits


15 Jun, 2018

1 commit

  • Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
    "This is a late set of changes from Deepa Dinamani doing an automated
    treewide conversion of the inode and iattr structures from 'timespec'
    to 'timespec64', to push the conversion from the VFS layer into the
    individual file systems.

    As Deepa writes:

    'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timespec64
    timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
    becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
    This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
    timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

    Thomas Gleixner adds:

    'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

    * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
    pstore: Remove bogus format string definition
    vfs: change inode times to use struct timespec64
    pstore: Convert internal records to timespec64
    udf: Simplify calls to udf_disk_stamp_to_time
    fs: nfs: get rid of memcpys for inode times
    ceph: make inode time prints to be long long
    lustre: Use long long type to print inode time
    fs: add timespec64_truncate()

    Linus Torvalds
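    A small illustration of why a 32-bit seconds field is not y2038 safe. On 32-bit architectures the tv_sec of struct timespec was 32 bits, and 2^31 - 1 seconds after the epoch is 2038-01-19 03:14:07 UTC. struct ts64, fits_time32() and ts64_add_sec() are hypothetical helpers written in the spirit of struct timespec64. They are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Last second a signed 32-bit field can hold: 2038-01-19 03:14:07 UTC. */
#define TIME32_LAST INT32_MAX

/* Hypothetical 64-bit timestamp in the spirit of struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Does this second count survive a round trip through a 32-bit field? */
static int fits_time32(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Advance a timestamp by whole seconds with no 32-bit truncation. */
static struct ts64 ts64_add_sec(struct ts64 t, int64_t secs)
{
	t.tv_sec += secs;
	return t;
}
```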
     

13 Jun, 2018

1 commit

  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a, b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
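    A userspace sketch of what the conversion buys. In the kernel, array_size() and array3_size() saturate to SIZE_MAX on overflow, so a following kmalloc() fails, and kmalloc_array() returns NULL instead of allocating a wrapped-around size. sat_mul(), sat_mul3() and alloc_array() are hypothetical analogs built on the GCC/Clang __builtin_mul_overflow() builtin, with malloc() standing in for kmalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Analog of array_size(): a * b, or SIZE_MAX if the product overflows. */
static size_t sat_mul(size_t a, size_t b)
{
	size_t r;

	if (__builtin_mul_overflow(a, b, &r))
		return SIZE_MAX;
	return r;
}

/* Analog of array3_size(): saturate if either multiplication overflows. */
static size_t sat_mul3(size_t a, size_t b, size_t c)
{
	size_t r;

	if (__builtin_mul_overflow(a, b, &r) ||
	    __builtin_mul_overflow(r, c, &r))
		return SIZE_MAX;
	return r;
}

/* Analog of kmalloc_array(): refuse rather than allocate a wrapped size. */
static void *alloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}
```

    With a plain kmalloc(a * b, gfp), an overflowing product silently wraps to a small size and the caller later writes past the end of the buffer. Both helpers turn that case into an allocation failure instead.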
     

07 Jun, 2018

1 commit

  • Pull fuse updates from Miklos Szeredi:
    "The most interesting part of this update is user namespace support,
    mostly done by Eric Biederman. This enables safe unprivileged fuse
    mounts within a user namespace.

    There are also a couple of fixes for bugs found by syzbot and
    miscellaneous fixes and cleanups"

    * tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: don't keep dead fuse_conn at fuse_fill_super().
    fuse: fix control dir setup and teardown
    fuse: fix congested state leak on aborted connections
    fuse: Allow fully unprivileged mounts
    fuse: Ensure posix acls are translated outside of init_user_ns
    fuse: add writeback documentation
    fuse: honor AT_STATX_FORCE_SYNC
    fuse: honor AT_STATX_DONT_SYNC
    fuse: Restrict allow_other to the superblock's namespace or a descendant
    fuse: Support fuse filesystems outside of init_user_ns
    fuse: Fail all requests with invalid uids or gids
    fuse: Remove the buggy retranslation of pids in fuse_dev_do_read
    fuse: return -ECONNABORTED on /dev/fuse read after abort
    fuse: atomic_o_trunc should truncate pagecache

    Linus Torvalds
     

06 Jun, 2018

1 commit

  • struct timespec is not y2038 safe. Transition vfs to use
    y2038 safe struct timespec64 instead.

    The change was made with the help of the following Coccinelle
    script. This catches about 80% of the changes.
    All the header file and logic changes are included in the
    first 5 rules. The rest are trivial substitutions.
    I avoid changing any of the function signatures or any other
    filesystem specific data structures to keep the patch simple
    for review.

    The script could be shorter by combining different cases, but this
    version was sufficient for my use case.

    virtual patch

    @ depends on patch @
    identifier now;
    @@
    - struct timespec
    + struct timespec64
    current_time ( ... )
    {
    - struct timespec now = current_kernel_time();
    + struct timespec64 now = current_kernel_time64();
    ...
    - return timespec_trunc(
    + return timespec64_trunc(
    ... );
    }

    @ depends on patch @
    identifier xtime;
    @@
    struct \( iattr \| inode \| kstat \) {
    ...
    - struct timespec xtime;
    + struct timespec64 xtime;
    ...
    }

    @ depends on patch @
    identifier t;
    @@
    struct inode_operations {
    ...
    int (*update_time) (...,
    - struct timespec t,
    + struct timespec64 t,
    ...);
    ...
    }

    @ depends on patch @
    identifier t;
    identifier fn_update_time =~ "update_time$";
    @@
    fn_update_time (...,
    - struct timespec *t,
    + struct timespec64 *t,
    ...) { ... }

    @ depends on patch @
    identifier t;
    @@
    lease_get_mtime( ... ,
    - struct timespec *t
    + struct timespec64 *t
    ) { ... }

    @te depends on patch forall@
    identifier ts;
    local idexpression struct inode *inode_node;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn_update_time =~ "update_time$";
    identifier fn;
    expression e, E3;
    local idexpression struct inode *node1;
    local idexpression struct inode *node2;
    local idexpression struct iattr *attr1;
    local idexpression struct iattr *attr2;
    local idexpression struct iattr attr;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    @@
    (
    (
    - struct timespec ts;
    + struct timespec64 ts;
    |
    - struct timespec ts = current_time(inode_node);
    + struct timespec64 ts = current_time(inode_node);
    )

    <+... when != ts
    (
    - timespec_equal(&inode_node->i_xtime, &ts)
    + timespec64_equal(&inode_node->i_xtime, &ts)
    |
    - timespec_equal(&ts, &inode_node->i_xtime)
    + timespec64_equal(&ts, &inode_node->i_xtime)
    |
    - timespec_compare(&inode_node->i_xtime, &ts)
    + timespec64_compare(&inode_node->i_xtime, &ts)
    |
    - timespec_compare(&ts, &inode_node->i_xtime)
    + timespec64_compare(&ts, &inode_node->i_xtime)
    |
    ts = current_time(e)
    |
    fn_update_time(..., &ts,...)
    |
    inode_node->i_xtime = ts
    |
    node1->i_xtime = ts
    |
    ts = inode_node->i_xtime
    |
    <+... attr1->ia_xtime ...+> = ts
    |
    ts = attr1->ia_xtime
    |
    ts.tv_sec
    |
    ts.tv_nsec
    |
    btrfs_set_stack_timespec_sec(..., ts.tv_sec)
    |
    btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
    |
    - ts = timespec64_to_timespec(
    + ts =
    ...
    -)
    |
    - ts = ktime_to_timespec(
    + ts = ktime_to_timespec64(
    ...)
    |
    - ts = E3
    + ts = timespec_to_timespec64(E3)
    |
    - ktime_get_real_ts(&ts)
    + ktime_get_real_ts64(&ts)
    |
    fn(...,
    - ts
    + timespec64_to_timespec(ts)
    ,...)
    )
    ...+>
    (
    <... when != ts
    - return ts;
    + return timespec64_to_timespec(ts);
    ...>
    )
    |
    - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_equal(&node1->i_xtime1, &node2->i_xtime2)
    |
    - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
    + timespec64_equal(&node1->i_xtime1, &attr2->ia_xtime2)
    |
    - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
    + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
    |
    node1->i_xtime1 =
    - timespec_trunc(attr1->ia_xtime1,
    + timespec64_trunc(attr1->ia_xtime1,
    ...)
    |
    - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
    + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
    ...)
    |
    - ktime_get_real_ts(&attr1->ia_xtime1)
    + ktime_get_real_ts64(&attr1->ia_xtime1)
    |
    - ktime_get_real_ts(&attr.ia_xtime1)
    + ktime_get_real_ts64(&attr.ia_xtime1)
    )

    @ depends on patch @
    struct inode *node;
    struct iattr *attr;
    identifier fn;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    expression e;
    @@
    (
    - fn(node->i_xtime);
    + fn(timespec64_to_timespec(node->i_xtime));
    |
    fn(...,
    - node->i_xtime);
    + timespec64_to_timespec(node->i_xtime));
    |
    - e = fn(attr->ia_xtime);
    + e = fn(timespec64_to_timespec(attr->ia_xtime));
    )

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    identifier i_xtime =~ "^i_[acm]time$";
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier fn;
    @@
    {
    + struct timespec ts;
    <+...
    (
    + ts = timespec64_to_timespec(node->i_xtime);
    fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    )
    ...+>
    }

    @ depends on patch forall @
    struct inode *node;
    struct iattr *attr;
    struct kstat *stat;
    identifier ia_xtime =~ "^ia_[acm]time$";
    identifier i_xtime =~ "^i_[acm]time$";
    identifier xtime =~ "^[acm]time$";
    identifier fn, ret;
    @@
    {
    + struct timespec ts;
    <+...
    (
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(node->i_xtime);
    ret = fn (...,
    - &node->i_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime,
    + &ts,
    ...);
    |
    + ts = timespec64_to_timespec(attr->ia_xtime);
    ret = fn (...,
    - &attr->ia_xtime);
    + &ts);
    |
    + ts = timespec64_to_timespec(stat->xtime);
    ret = fn (...,
    - &stat->xtime);
    + &ts);
    )
    ...+>
    }

    @ depends on patch @
    struct inode *node;
    struct inode *node2;
    identifier i_xtime1 =~ "^i_[acm]time$";
    identifier i_xtime2 =~ "^i_[acm]time$";
    identifier i_xtime3 =~ "^i_[acm]time$";
    struct iattr *attrp;
    struct iattr *attrp2;
    struct iattr attr ;
    identifier ia_xtime1 =~ "^ia_[acm]time$";
    identifier ia_xtime2 =~ "^ia_[acm]time$";
    struct kstat *stat;
    struct kstat stat1;
    struct timespec64 ts;
    identifier xtime =~ "^[acmb]time$";
    expression e;
    @@
    (
    \( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
    |
    node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
    |
    stat->xtime = node2->i_xtime1;
    |
    stat1.xtime = node2->i_xtime1;
    |
    \( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
    |
    \( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
    |
    - e = node->i_xtime1;
    + e = timespec64_to_timespec( node->i_xtime1 );
    |
    - e = attrp->ia_xtime1;
    + e = timespec64_to_timespec( attrp->ia_xtime1 );
    |
    node->i_xtime1 = current_time(...);
    |
    node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    node->i_xtime1 = node->i_xtime3 =
    - e;
    + timespec_to_timespec64(e);
    |
    - node->i_xtime1 = e;
    + node->i_xtime1 = timespec_to_timespec64(e);
    )

    Signed-off-by: Deepa Dinamani

    Deepa Dinamani
     

31 May, 2018

4 commits

  • syzbot is reporting a use-after-free in fuse_kill_sb_blk() [1].
    Because the sb->s_fs_info field is not cleared after fc is released by
    fuse_conn_put() when initialization fails, fuse_kill_sb_blk() finds the
    already-released fc and tries to take its lock. Fix this by clearing the
    sb->s_fs_info field after calling fuse_conn_put().

    [1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db

    Signed-off-by: Tetsuo Handa
    Reported-by: syzbot
    Fixes: 3b463ae0c626 ("fuse: invalidation reverse calls")
    Cc: John Muir
    Cc: Csaba Henk
    Cc: Anand Avati
    Cc: # v2.6.31
    Signed-off-by: Miklos Szeredi

    Tetsuo Handa
     
  • syzbot is reporting a NULL pointer dereference in fuse_ctl_remove_conn() [1].
    Because fc->ctl_ndents is incremented by fuse_ctl_add_conn() even when
    new_inode() fails, fuse_ctl_remove_conn() reaches an inode-less dentry and
    tries to clear the d_inode(dentry)->i_private field.

    Fix this by adding the dentry to the array only after it has been fully set up.

    When tearing down the control directory, do d_invalidate() on it to get rid
    of any mounts that might have been added.

    [1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6
    Reported-by: syzbot
    Fixes: bafa96541b25 ("[PATCH] fuse: add control filesystem")
    Cc: # v2.6.18
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • If a connection is aborted while congested, FUSE can leave
    nr_wb_congested[] stuck until reboot, causing wait_iff_congested() to
    wait spuriously, which can lead to severe performance degradation.

    The leak is caused by gating the clearing of congestion state on an
    fc->connected test in request_end(). This test was added back in 2009
    by 26c3679101db ("fuse: destroy bdi on umount"). While that commit's
    description doesn't explain why the test was added, it was most likely
    meant to avoid dereferencing the bdi after it had been destroyed.

    Since then, the bdi lifetime rules have changed many times, and we are now
    always guaranteed to have access to the bdi while the superblock (fc->sb)
    is alive.

    Drop the fc->connected conditional to avoid leaking congestion state.

    Signed-off-by: Tejun Heo
    Reported-by: Joshua Miller
    Cc: Johannes Weiner
    Cc: stable@vger.kernel.org # v2.6.29+
    Acked-by: Jan Kara
    Signed-off-by: Miklos Szeredi

    Tejun Heo
     
  • Now that the fuse and vfs work is complete, allow the fuse filesystem
    to be mounted by the root user in a user namespace.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Miklos Szeredi

    Eric W. Biederman