12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But the keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.
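
The "almost" above can be checked on whatever machine builds the code. The following is a minimal userspace sketch, not part of the commit: it assumes a Linux libc that ships <poll.h> and <sys/epoll.h>, and the names flag_pair, flag_pairs and count_mismatches are made up for illustration. NVAL is left out because a libc header may not expose an EPOLLNVAL name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* asks glibc to expose POLLRDHUP and POLLMSG */
#endif
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/epoll.h>

/* One POLL* flag and its EPOLL* counterpart, as seen by this build. */
struct flag_pair {
    const char *name;
    unsigned int poll_val;
    unsigned int epoll_val;
};

/* Hypothetical table; optional flags are included only if the headers define them. */
const struct flag_pair flag_pairs[] = {
    { "IN",  POLLIN,  EPOLLIN  },
    { "OUT", POLLOUT, EPOLLOUT },
    { "PRI", POLLPRI, EPOLLPRI },
    { "ERR", POLLERR, EPOLLERR },
    { "HUP", POLLHUP, EPOLLHUP },
#ifdef POLLRDNORM
    { "RDNORM", POLLRDNORM, EPOLLRDNORM },
    { "RDBAND", POLLRDBAND, EPOLLRDBAND },
    { "WRNORM", POLLWRNORM, EPOLLWRNORM },
    { "WRBAND", POLLWRBAND, EPOLLWRBAND },
#endif
#ifdef POLLRDHUP
    { "RDHUP", POLLRDHUP, EPOLLRDHUP },
#endif
#ifdef POLLMSG
    { "MSG", POLLMSG, EPOLLMSG },
#endif
};
const size_t flag_pair_count = sizeof(flag_pairs) / sizeof(flag_pairs[0]);

/* Count the entries whose POLL* and EPOLL* values differ. */
size_t count_mismatches(const struct flag_pair *pairs, size_t n)
{
    size_t bad = 0;

    assert(pairs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pairs[i].poll_val != pairs[i].epoll_val)
            bad++;
    return bad;
}
```

Calling count_mismatches(flag_pairs, flag_pair_count) reports how many of the listed flags disagree for the architecture the program was compiled for.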

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Nov, 2017

1 commit


28 Nov, 2017

1 commit


14 Sep, 2017

1 commit


12 Sep, 2017

3 commits

  • The refreshed argument isn't used by any caller, get rid of it.

    Use a helper for just updating the inode (no need to fill in a kstat).

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • If the IOCB_DSYNC flag is set a sync is not being performed by
    fuse_file_write_iter.

    Honor IOCB_DSYNC/IOCB_SYNC by setting O_DSYNC/O_SYNC respectively in the
    flags field of the write request.

    We don't need to sync data or metadata, since fuse_perform_write() does
    write-through and the filesystem is responsible for updating file times.

    Original patch by Vitaly Zolotusky.

    Reported-by: Nate Clark
    Cc: Vitaly Zolotusky
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Commit 0b6e9ea041e6 ("fuse: Add support for pid namespaces") broke
    Sandstorm.io development tools, which have been sending FUSE file
    descriptors across PID namespace boundaries since early 2014.

    The above patch added a check that prevented I/O on the fuse device file
    descriptor if the pid namespace of the reader/writer was different from the
    pid namespace of the mounter. With this change passing the device file
    descriptor to a different pid namespace simply doesn't work. The check was
    added because pids are transferred to/from the fuse userspace server in the
    namespace registered at mount time.

    To fix this regression, remove the checks and do the following:

    1) the pid in the request header (the pid of the task that initiated the
    filesystem operation) is translated to the reader's pid namespace. If a
    mapping doesn't exist for this pid, then a zero pid is used. Note: even if
    a mapping would exist between the initiator task's pid namespace and the
    reader's pid namespace the pid will be zero if either mapping from
    initiator's to mounter's namespace or mapping from mounter's to reader's
    namespace doesn't exist.

    2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone.
    Userspace should not interpret this value anyway. Also allow the
    setlk/setlkw operations if the pid of the task cannot be represented in the
    mounter's namespace (pid being zero in that case).
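
The "zero pid when no mapping exists" rule from point 1 can be modelled with a toy lookup. This is a rough sketch, assuming a task is described by a small table of (namespace, pid number) pairs; struct sketch_task and sketch_pid_in_ns() are hypothetical names, and the sketch covers a single translation step only, not the initiator-to-mounter-to-reader chain.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_LEVELS 4

/*
 * Hypothetical model: one (namespace id, pid number) pair for every pid
 * namespace in which the task is visible.
 */
struct sketch_task {
    size_t nr_levels;
    int ns_id[SKETCH_MAX_LEVELS];
    int pid_nr[SKETCH_MAX_LEVELS];
};

/* Return the task's pid as seen from ns_id, or 0 if it has no mapping there. */
int sketch_pid_in_ns(const struct sketch_task *task, int ns_id)
{
    assert(task != NULL && task->nr_levels <= SKETCH_MAX_LEVELS);
    for (size_t i = 0; i < task->nr_levels; i++)
        if (task->ns_id[i] == ns_id)
            return task->pid_nr[i];
    return 0; /* not visible from that namespace: report pid 0 */
}
```

A request header filled in this way carries either a pid that is meaningful to the reader or zero, never a number from a foreign namespace.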

    Reported-by: Kenton Varda
    Signed-off-by: Miklos Szeredi
    Fixes: 0b6e9ea041e6 ("fuse: Add support for pid namespaces")
    Cc: # v4.12+
    Cc: Eric W. Biederman
    Cc: Seth Forshee

    Miklos Szeredi
     

07 Sep, 2017

2 commits

  • Pull writeback error handling updates from Jeff Layton:
    "This pile continues the work from last cycle on better tracking
    writeback errors. In v4.13 we added some basic errseq_t infrastructure
    and converted a few filesystems to use it.

    This set continues refining that infrastructure, adds documentation,
    and converts most of the other filesystems to use it. The main
    exception at this point is the NFS client"

    * tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    ecryptfs: convert to file_write_and_wait in ->fsync
    mm: remove optimizations based on i_size in mapping writeback waits
    fs: convert a pile of fsync routines to errseq_t based reporting
    gfs2: convert to errseq_t based writeback error reporting for fsync
    fs: convert sync_file_range to use errseq_t based error-tracking
    mm: add file_fdatawait_range and file_write_and_wait
    fuse: convert to errseq_t based error tracking for fsync
    mm: consolidate dax / non-dax checks for writeback
    Documentation: add some docs for errseq_t
    errseq: rename __errseq_set to errseq_set

    Linus Torvalds
     
  • Pull file locking updates from Jeff Layton:
    "This pile just has a few file locking fixes from Ben Coddington. There
    are a couple of cleanup patches + an attempt to bring sanity to the
    l_pid value that is reported back to userland on an F_GETLK request.

    After a few gyrations, he came up with a way for filesystems to
    communicate to the VFS layer code whether the pid should be translated
    according to the namespace or presented as-is to userland"

    * tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
    locks: restore a warn for leaked locks on close
    fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks
    fs/locks: Use allocation rather than the stack in fcntl_getlk()

    Linus Torvalds
     

11 Aug, 2017

1 commit


03 Aug, 2017

1 commit

  • Commit 8fba54aebbdf ("fuse: direct-io: don't dirty ITER_BVEC pages") fixes
    the ITER_BVEC page deadlock for direct io in fuse by checking in
    fuse_direct_io(), whether the page is a bvec page or not, before locking
    it. However, this check is missed when the "async_dio" mount option is
    enabled. In this case, set_page_dirty_lock() is called from the req->end
    callback in request_end(), when the fuse thread is returning from userspace
    to respond to the read request. This will cause the same deadlock because
    the bvec condition is not checked in this path.

    Here is the stack of the deadlocked thread, while returning from userspace:

    [13706.656686] INFO: task glusterfs:3006 blocked for more than 120 seconds.
    [13706.657808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [13706.658788] glusterfs D ffffffff816c80f0 0 3006 1 0x00000080
    [13706.658797] ffff8800d6713a58 0000000000000086 ffff8800d9ad7000 ffff8800d9ad5400
    [13706.658799] ffff88011ffd5cc0 ffff8800d6710008 ffff88011fd176c0 7fffffffffffffff
    [13706.658801] 0000000000000002 ffffffff816c80f0 ffff8800d6713a78 ffffffff816c790e
    [13706.658803] Call Trace:
    [13706.658809] [] ? bit_wait_io_timeout+0x80/0x80
    [13706.658811] [] schedule+0x3e/0x90
    [13706.658813] [] schedule_timeout+0x1b5/0x210
    [13706.658816] [] ? gup_pud_range+0x1db/0x1f0
    [13706.658817] [] ? kvm_clock_read+0x1e/0x20
    [13706.658819] [] ? kvm_clock_get_cycles+0x9/0x10
    [13706.658822] [] ? ktime_get+0x52/0xc0
    [13706.658824] [] io_schedule_timeout+0xa4/0x110
    [13706.658826] [] bit_wait_io+0x36/0x50
    [13706.658828] [] __wait_on_bit_lock+0x76/0xb0
    [13706.658831] [] ? lock_request+0x46/0x70 [fuse]
    [13706.658834] [] __lock_page+0xaa/0xb0
    [13706.658836] [] ? wake_atomic_t_function+0x40/0x40
    [13706.658838] [] set_page_dirty_lock+0x58/0x60
    [13706.658841] [] fuse_release_user_pages+0x58/0x70 [fuse]
    [13706.658844] [] ? fuse_aio_complete+0x190/0x190 [fuse]
    [13706.658847] [] fuse_aio_complete_req+0x29/0x90 [fuse]
    [13706.658849] [] request_end+0xd9/0x190 [fuse]
    [13706.658852] [] fuse_dev_do_write+0x336/0x490 [fuse]
    [13706.658854] [] fuse_dev_write+0x6e/0xa0 [fuse]
    [13706.658857] [] ? security_file_permission+0x23/0x90
    [13706.658859] [] do_iter_readv_writev+0x60/0x90
    [13706.658862] [] ? fuse_dev_splice_write+0x350/0x350 [fuse]
    [13706.658863] [] do_readv_writev+0x171/0x1f0
    [13706.658866] [] ? try_to_wake_up+0x210/0x210
    [13706.658868] [] vfs_writev+0x41/0x50
    [13706.658870] [] SyS_writev+0x56/0xf0
    [13706.658872] [] ? syscall_trace_leave+0xf1/0x160
    [13706.658874] [] system_call_fastpath+0x12/0x71

    Fix this by making should_dirty a fuse_io_priv parameter that can be
    checked in fuse_aio_complete_req().
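
The idea of carrying the decision in the per-I/O state can be sketched like this. It is a simplified userspace model, not the fuse code: the sketch_ types and functions are hypothetical, and a page is reduced to a single dirty flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: only the dirty bit matters for this sketch. */
struct sketch_page {
    bool dirty;
};

/* Hypothetical per-I/O state shared by the submission and completion paths. */
struct sketch_io_priv {
    bool should_dirty; /* decided once, when the I/O is submitted */
};

/* Release the pages of a finished request, dirtying them only if asked to. */
void sketch_release_user_pages(struct sketch_page *pages, size_t n, bool should_dirty)
{
    for (size_t i = 0; i < n; i++)
        if (should_dirty)
            pages[i].dirty = true;
}

/* Async completion: consult the flag stored at submission time. */
void sketch_aio_complete_req(const struct sketch_io_priv *io,
                             struct sketch_page *pages, size_t n)
{
    assert(io != NULL);
    sketch_release_user_pages(pages, n, io->should_dirty);
}
```

Because the flag travels with the I/O, the completion path never has to re-derive whether the pages may be locked and dirtied.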

    Reported-by: Tiger Yang
    Signed-off-by: Ashish Samant
    Signed-off-by: Miklos Szeredi

    Ashish Samant
     

01 Aug, 2017

1 commit


16 Jul, 2017

1 commit

  • Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be
    atomic with the stateid update", NFSv4 has been inserting locks in rpciod
    worker context. The result is that the file_lock's fl_nspid is the
    kworker's pid instead of the original userspace pid.

    The fl_nspid is only used to represent the namespaced virtual pid number
    when displaying locks or returning from F_GETLK. There's no reason to set
    it for every inserted lock, since we can usually just look it up from
    fl_pid. So, instead of looking up and holding struct pid for every lock,
    let's just look up the virtual pid number from fl_pid when it is needed.
    That means we can remove fl_nspid entirely.

    The translation and presentation of fl_pid should handle the following four
    cases:

    1 - F_GETLK on a remote file with a remote lock:
    In this case, the filesystem should determine the l_pid to return here.
    Filesystems should indicate that the fl_pid represents a non-local pid
    value that should not be translated by returning an fl_pid
    Signed-off-by: Jeff Layton

    Benjamin Coddington
     

09 Jun, 2017

1 commit

  • Before the patch, the flock flag could remain uninitialized for the
    lifespan of the fuse_file allocation. Unless set to true in
    fuse_file_flock(), it would remain in an indeterminate state until read in
    an if statement in fuse_release_common(). This could consequently lead to
    taking an unexpected branch in the code.
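
One common way to rule out this class of bug is to zero the structure at allocation time. The sketch below shows that approach in userspace with calloc(); the commit text does not say which approach it takes, and struct sketch_fuse_file and sketch_file_alloc() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the per-open-file state. */
struct sketch_fuse_file {
    unsigned long fh;
    bool flock; /* set to true only once a flock-style lock has been taken */
};

/*
 * calloc() zero-fills the allocation, so flock reliably starts out false
 * instead of holding whatever bytes the allocator handed back.
 */
struct sketch_fuse_file *sketch_file_alloc(void)
{
    return calloc(1, sizeof(struct sketch_fuse_file));
}
```

A later check of the flag on release then takes the intended branch even if no lock call ever touched the field.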

    The bug was discovered by a runtime instrumentation designed to detect use
    of uninitialized memory in the kernel.

    Signed-off-by: Mateusz Jurczyk
    Fixes: 37fb3a30b462 ("fuse: fix flock")
    Cc: # v3.1+
    Signed-off-by: Miklos Szeredi

    Mateusz Jurczyk
     

11 May, 2017

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable bugfixes:
    - Fix use after free in write error path
    - Use GFP_NOIO for two allocations in writeback
    - Fix a hang in OPEN related to server reboot
    - Check the result of nfs4_pnfs_ds_connect
    - Fix an rcu lock leak

    Features:
    - Removal of the unmaintained and unused OSD pNFS layout
    - Cleanup and removal of lots of unnecessary dprintk()s
    - Cleanup and removal of some memory failure paths now that GFP_NOFS
    is guaranteed to never fail.
    - Remove the v3-only data server limitation on pNFS/flexfiles

    Bugfixes:
    - RPC/RDMA connection handling bugfixes
    - Copy offload: fixes to ensure the copied data is COMMITed to disk.
    - Readdir: switch back to using the ->iterate VFS interface
    - File locking fixes from Ben Coddington
    - Various use-after-free and deadlock issues in pNFS
    - Write path bugfixes"

    * tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (89 commits)
    pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled
    NFSv4.1: Work around a Linux server bug...
    NFS append COMMIT after synchronous COPY
    NFSv4: Fix exclusive create attributes encoding
    NFSv4: Fix an rcu lock leak
    nfs: use kmap/kunmap directly
    NFS: always treat the invocation of nfs_getattr as cache hit when noac is on
    Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew
    NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
    pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits
    pNFS: Fix a typo in pnfs_generic_alloc_ds_commits
    pNFS: Fix a deadlock when coalescing writes and returning the layout
    pNFS: Don't clear the layout return info if there are segments to return
    pNFS: Ensure we commit the layout if it has been invalidated
    pNFS: Don't send COMMITs to the DSes if the server invalidated our layout
    pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path
    pNFS: Ensure we check layout validity before marking it for return
    NFS4.1 handle interrupted slot reuse from ERR_DELAY
    NFSv4: check return value of xdr_inline_decode
    nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()
    ...

    Linus Torvalds
     

21 Apr, 2017

1 commit

  • Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks.
    NFS will check for this flag to ensure an unlock is sent in a following
    patch.

    Fuse handles flock and posix locks differently for FL_CLOSE, and so
    requires a fixup to retain the existing behavior for flock.

    Signed-off-by: Benjamin Coddington
    Reviewed-by: Jeff Layton
    Acked-by: Miklos Szeredi
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     

18 Apr, 2017

2 commits

  • When the userspace process servicing fuse requests is running in
    a pid namespace then pids passed via the fuse fd are not being
    translated into that process' namespace. Translation is necessary
    for the pid to be useful to that process.

    Since no use case currently exists for changing namespaces all
    translations can be done relative to the pid namespace in use
    when fuse_conn_init() is called. For fuse this translates to
    mount time, and for cuse this is when /dev/cuse is opened. IO for
    this connection from another namespace will return errors.

    Requests from processes whose pid cannot be translated into the
    target namespace will have a value of 0 for in.h.pid.

    File locking changes based on previous work done by Eric
    Biederman.

    Signed-off-by: Seth Forshee
    Signed-off-by: Miklos Szeredi

    Seth Forshee
     
  • refcount_t type and corresponding API should be
    used instead of atomic_t when the variable is used as
    a reference counter. This helps avoid accidental
    refcounter overflows that might lead to use-after-free
    situations.

    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Signed-off-by: Miklos Szeredi

    Elena Reshetova
     

04 Mar, 2017

1 commit


25 Feb, 2017

1 commit

  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

23 Feb, 2017

3 commits


15 Nov, 2016

1 commit

  • If pos is at the beginning of a page and copied is zero then page is not
    zeroed but is marked uptodate.

    Fix by skipping everything except unlock/put of page if zero bytes were
    copied.
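
The early exit can be sketched in userspace as follows. This is a toy model of the logic described above, not the fuse write_end callback: the page is a plain buffer with two flags, and struct sketch_page, SKETCH_PAGE_SIZE and sketch_write_end() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

struct sketch_page {
    unsigned char data[SKETCH_PAGE_SIZE];
    bool uptodate;
    bool locked;
};

/* Finish a buffered write of `copied` bytes that started at file offset `pos`. */
size_t sketch_write_end(struct sketch_page *page, size_t pos, size_t copied)
{
    assert(page != NULL && page->locked);

    /* Nothing was copied: skip zeroing and the uptodate marking entirely. */
    if (copied == 0)
        goto unlock;

    if (!page->uptodate) {
        /* Zero any bytes after the written range, then mark the page valid. */
        size_t endoff = (pos + copied) % SKETCH_PAGE_SIZE;

        if (endoff)
            memset(page->data + endoff, 0, SKETCH_PAGE_SIZE - endoff);
        page->uptodate = true;
    }

unlock:
    page->locked = false;
    return copied;
}
```

Without the early exit, a zero-byte copy at the start of a page gives endoff == 0, so nothing is zeroed and the page is still marked uptodate, which is the case the commit message describes.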

    Reported-by: Al Viro
    Fixes: 6b12c1b37e55 ("fuse: Implement write_begin/write_end callbacks")
    Cc: # v3.15+
    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

11 Oct, 2016

1 commit

  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

08 Oct, 2016

1 commit


01 Oct, 2016

2 commits


22 Sep, 2016

1 commit

  • To avoid clearing of capabilities or security related extended
    attributes too early, inode_change_ok() will need to take dentry instead
    of inode. Propagate it down to fuse_do_setattr().

    Acked-by: Miklos Szeredi
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara

    Jan Kara
     

25 Aug, 2016

1 commit

  • When reading from a loop device backed by a fuse file it deadlocks on
    lock_page().

    This is because the page is already locked by the read() operation done on
    the loop device. In this case we don't want to either lock the page or
    dirty it.

    So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors.
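
The rule can be written as a small predicate. The sketch below is an illustration only: the enum and sketch_should_dirty() are hypothetical, and the extra condition on the transfer direction is an assumption based on the fact that only a read stores data into the caller's pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical iterator kinds, mirroring the two cases named in the text. */
enum sketch_iter_type {
    SKETCH_ITER_IOVEC, /* pages belong to a userspace buffer */
    SKETCH_ITER_BVEC   /* pages were supplied (and locked) by the kernel caller */
};

/*
 * Decide whether pages should be dirtied after the transfer.
 * Assumption: only reads modify the pages, so writes never dirty them.
 */
bool sketch_should_dirty(enum sketch_iter_type type, bool is_write)
{
    return !is_write && type == SKETCH_ITER_IOVEC;
}
```

For a read through a loop device the iterator is of the BVEC kind, so the page is neither locked nor dirtied by the direct I/O path.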

    Reported-by: Sheng Yang
    Fixes: aa4d86163e4e ("block: loop: switch to VFS ITER_BVEC")
    Signed-off-by: Miklos Szeredi
    Cc: # v4.1+
    Reviewed-by: Sheng Yang
    Reviewed-by: Ashish Samant
    Tested-by: Sheng Yang
    Tested-by: Ashish Samant

    Miklos Szeredi
     

30 Jul, 2016

1 commit

  • Pull fuse updates from Miklos Szeredi:
    "This fixes error propagation from writeback to fsync/close for
    writeback cache mode as well as adding a missing capability flag to
    the INIT message. The rest are cleanups.

    (The commits are recent but all the code actually sat in -next for a
    while now. The recommits are due to conflict avoidance and the
    addition of Cc: stable@...)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: use filemap_check_errors()
    mm: export filemap_check_errors() to modules
    fuse: fix wrong assignment of ->flags in fuse_send_init()
    fuse: fuse_flush must check mapping->flags for errors
    fuse: fsync() did not return IO errors
    fuse: don't mess with blocking signals
    new helper: wait_event_killable_exclusive()
    fuse: improve aio directIO write performance for size extending writes

    Linus Torvalds
     

29 Jul, 2016

4 commits

  • Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • fuse_flush() calls write_inode_now() that triggers writeback, but actual
    writeback will happen later, on fuse_sync_writes(). If an error happens,
    fuse_writepage_end() will set error bit in mapping->flags. So, we have to
    check mapping->flags after fuse_sync_writes().
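
The ordering matters because the error bit only appears once the writes have completed. A minimal userspace sketch of "wait first, then check and clear the error bits", with hypothetical sketch_ names and flag values standing in for the mapping's error bits:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical error bits recorded on the mapping by write completions. */
#define SKETCH_AS_EIO    (1u << 0)
#define SKETCH_AS_ENOSPC (1u << 1)

struct sketch_mapping {
    unsigned int flags;
};

/* Report and clear any recorded writeback error; 0 means no error. */
int sketch_check_errors(struct sketch_mapping *m)
{
    int ret = 0;

    assert(m != NULL);
    if (m->flags & SKETCH_AS_ENOSPC) {
        m->flags &= ~SKETCH_AS_ENOSPC;
        ret = -ENOSPC;
    }
    if (m->flags & SKETCH_AS_EIO) {
        m->flags &= ~SKETCH_AS_EIO;
        ret = -EIO;
    }
    return ret;
}

/* Hypothetical stand-in for waiting on in-flight writes: one of them fails. */
void sketch_sync_writes_with_failure(struct sketch_mapping *m)
{
    m->flags |= SKETCH_AS_EIO;
}

/* Flush: wait for the writes to finish, and only then look at the error bits. */
int sketch_flush(struct sketch_mapping *m,
                 void (*sync_writes)(struct sketch_mapping *))
{
    assert(m != NULL && sync_writes != NULL);
    sync_writes(m);
    return sketch_check_errors(m);
}
```

Checking before the wait would return success and leave the error for some later, unrelated caller to pick up.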

    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
    Cc: # v3.15+

    Maxim Patlasov
     
    Due to the implementation of fuse writeback, filemap_write_and_wait_range()
    does not catch errors. We have to do this directly after fuse_sync_writes().

    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Maxim Patlasov
    Signed-off-by: Miklos Szeredi
    Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
    Cc: # v3.15+

    Alexey Kuznetsov
     
  • There are now a number of accounting oddities such as mapped file pages
    being accounted for on the node while the total number of file pages are
    accounted on the zone. This can be coped with to some extent but it's
    confusing so this patch moves the relevant file-based accounting. Due to
    throttling logic in the page allocator for reliable OOM detection, it is
    still necessary to track dirty and writeback pages on a per-zone basis.

    [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
    Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
    Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Hillf Danton
    Acked-by: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

30 Jun, 2016

1 commit

    While sending blocking directIO in fuse, the write request is broken
    into sub-requests, each of default size 128k, and all the requests are sent
    in non-blocking background mode if async_dio mode is supported by libfuse.
    The process which issues the write waits for the completion of all the
    sub-requests. Sending multiple requests in parallel gives a chance to perform
    parallel writes in the user space fuse implementation if it is
    multi-threaded and hence improves the performance.

    When there is a size extending aio dio write, we switch to blocking mode so
    that we can properly update the size of the file after completion of the
    writes. However, in this situation all the sub-requests are sent in a
    serialized manner where the next request is sent only after receiving the
    reply of the current request. Hence the multi-threaded user space
    implementation is not utilized properly.

    This patch changes the size extending aio dio behavior to exactly follow
    blocking dio. For a multi-threaded fuse implementation with 10 threads and
    a buffer size of 64MB for async directIO, we are getting double
    the speed.
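
To put the numbers above in perspective, here is a small arithmetic sketch of how many sub-requests one write turns into. The helper name sketch_count_subrequests() is hypothetical; the 128k and 64MB figures are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sub-requests needed to cover `total` bytes, `max_req` bytes each. */
size_t sketch_count_subrequests(size_t total, size_t max_req)
{
    assert(max_req > 0);
    /* Round up: a trailing partial chunk still needs its own request. */
    return total / max_req + (total % max_req != 0);
}
```

A 64MB buffer split at the default 128k gives sketch_count_subrequests(64 * 1024 * 1024, 128 * 1024) == 512 sub-requests, so sending them one reply at a time leaves a multi-threaded server mostly idle.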

    Signed-off-by: Ashish Sangwan
    Signed-off-by: Miklos Szeredi

    Ashish Sangwan
     

18 May, 2016

1 commit

  • Pull vfs cleanups from Al Viro:
    "More cleanups from Christoph"

    * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    nfsd: use RWF_SYNC
    fs: add RWF_DSYNC and RWF_SYNC
    ceph: use generic_write_sync
    fs: simplify the generic_write_sync prototype
    fs: add IOCB_SYNC and IOCB_DSYNC
    direct-io: remove the offset argument to dio_complete
    direct-io: eliminate the offset argument to ->direct_IO
    xfs: eliminate the pos variable in xfs_file_dio_aio_write
    filemap: remove the pos argument to generic_file_direct_write
    filemap: remove pos variables in generic_file_read_iter

    Linus Torvalds
     

02 May, 2016

2 commits


25 Apr, 2016

1 commit

  • fuse_get_user_pages() should return error or 0. Otherwise fuse_direct_io
    read will not return 0 to indicate that read has completed.

    Fixes: 742f992708df ("fuse: return patrial success from fuse_direct_io()")
    Signed-off-by: Ashish Samant
    Signed-off-by: Seth Forshee
    Signed-off-by: Miklos Szeredi

    Ashish Samant