05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

17 Mar, 2016

2 commits


23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

08 Jan, 2016

1 commit

  • The use of wait_on_atomic_t() for waiting on I/O to complete before
    unlocking allows us to get rid of the NFS_IO_INPROGRESS flag, and thus the
    nfs_iocounter's flags member, and finally the nfs_iocounter altogether.
    The count of I/O is moved to the lock context, and the counter
    increment/decrement functions become simple enough to open-code.

    Signed-off-by: Benjamin Coddington
    [Trond: Fix up conflict with existing function nfs_wait_atomic_killable()]
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     

05 Jan, 2016

1 commit

  • * pnfs_generic:
    NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
    NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
    NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
    NFSv4.1/pNFS: Fix a race in initiate_file_draining()
    NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
    NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
    NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
    NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
    NFS: Relax requirements in nfs_flush_incompatible
    NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
    NFS: Allow multiple commit requests in flight per file
    NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
    NFSv4: List stateid information in the callback tracepoints
    NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
    NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
    pNFS: If we have to delay the layout callback, mark the layout for return
    NFSv4.1/pNFS: Add a helper to mark the layout as returned
    pNFS: Ensure nfs4_layoutget_prepare returns the correct error

    Trond Myklebust
     

01 Jan, 2016

1 commit


29 Dec, 2015

1 commit


07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites:

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

23 Oct, 2015

1 commit


08 Sep, 2015

1 commit

  • The NFSv4 delegation spec allows the server to tell a client to limit how
    much data it may cache after the file is closed. In return, the server
    guarantees enough free space to avoid ENOSPC situations, etc.
    Prior to this patch, we assumed we could always cache aggressively after
    close. Unfortunately, this causes problems with servers that set the
    limit to 0 and therefore do not offer any ENOSPC guarantees.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

18 Aug, 2015

2 commits


11 Jun, 2015

1 commit

  • Jerome reported seeing a warning pop when working with a swapfile on
    NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
    holding the rcu_read_lock and that function can sleep.

    To fix that, we need to take a reference to the xprt while holding the
    rcu_read_lock, set the socket up for swapping and then drop that
    reference. But, xprt_put is not exported and having NFS deal with the
    underlying xprt is a bit of a layering violation anyway.

    Fix this by adding a set of activate/deactivate functions that take a
    rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
    nfs_swap_deactivate call those.

    Also, add a per-rpc_clnt atomic counter to keep track of the number of
    active swapfiles associated with it. When the counter does a 0->1
    transition, we enable swapping on the xprt; when we do a 1->0 transition
    we disable swapping on it.

    This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
    flag. If non-swapper and swapper clnts are sharing a xprt, then we only
    need to flag the tasks from the swapper clnt with that flag.

    Acked-by: Mel Gorman
    Reported-by: Jerome Marchand
    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

27 Apr, 2015

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Another set of mainly bugfixes and a couple of cleanups. No new
    functionality in this round.

    Highlights include:

    Stable patches:
    - Fix a regression in /proc/self/mountstats
    - Fix the pNFS flexfiles O_DIRECT support
    - Fix high load average due to callback thread sleeping

    Bugfixes:
    - Various patches to fix the pNFS layoutcommit support
    - Do not cache pNFS deviceids unless server notifications are enabled
    - Fix a SUNRPC transport reconnection regression
    - make debugfs file creation failure non-fatal in SUNRPC
    - Another fix for circular directory warnings on NFSv4 "junctioned"
    mountpoints
    - Fix locking around NFSv4.2 fallocate() support
    - Truncating NFSv4 file opens should also sync O_DIRECT writes
    - Prevent infinite loop in rpcrdma_ep_create()

    Features:
    - Various improvements to the RDMA transport code's handling of
    memory registration
    - Various code cleanups"

    * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
    fs/nfs: fix new compiler warning about boolean in switch
    nfs: Remove unneeded casts in nfs
    NFS: Don't attempt to decode missing directory entries
    Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
    NFS: Rename idmap.c to nfs4idmap.c
    NFS: Move nfs_idmap.h into fs/nfs/
    NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
    NFS: Add a stub for GETDEVICELIST
    nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
    nfs: fix DIO good bytes calculation
    nfs: Fetch MOUNTED_ON_FILEID when updating an inode
    sunrpc: make debugfs file creation failure non-fatal
    nfs: fix high load average due to callback thread sleeping
    NFS: Reduce time spent holding the i_mutex during fallocate()
    NFS: Don't zap caches on fallocate()
    xprtrdma: Make rpcrdma_{un}map_one() into inline functions
    xprtrdma: Handle non-SEND completions via a callout
    xprtrdma: Add "open" memreg op
    xprtrdma: Add "destroy MRs" memreg op
    xprtrdma: Add "reset MRs" memreg op
    ...

    Linus Torvalds
     

16 Apr, 2015

1 commit


12 Apr, 2015

3 commits


28 Mar, 2015

2 commits


26 Mar, 2015

1 commit


04 Mar, 2015

2 commits

  • nfs_vm_page_mkwrite() should wait until the page cache invalidation
    is finished. This is the second patch in a 2 patch series to deprecate
    the NFS client's reliance on nfs_release_page() in the context of
    nfs_invalidate_mapping().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • When invalidating the page cache for a regular file, we want to first
    sync all dirty data to disk and then call invalidate_inode_pages2().
    The latter relies on nfs_launder_page() and nfs_release_page() to deal
    respectively with dirty pages, and unstable written pages.

    When commit 9590544694bec ("NFS: avoid deadlocks with loop-back mounted
    NFS filesystems.") changed the behaviour of nfs_release_page(), then it
    made it possible for invalidate_inode_pages2() to fail with an EBUSY.
    Unfortunately, that error is then propagated back to read().

    Let's therefore work around the problem for now by protecting the call
    to sync the data and invalidate_inode_pages2() so that they are atomic
    w.r.t. the addition of new writes.
    Later on, we can revisit whether or not we still need nfs_launder_page()
    and nfs_release_page().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

02 Mar, 2015

1 commit

  • The O_DIRECT code will grab the inode->i_mutex and flush out buffered
    writes, before scheduling a read or a write. However there is no
    equivalent in the buffered write code to wait for O_DIRECT to complete.

    Fixes a reported issue in xfstests generic/133, when first performing an
    O_DIRECT write followed by a buffered write.

    Signed-off-by: Trond Myklebust
    Tested-by: Chuck Lever

    Trond Myklebust
     

11 Feb, 2015

1 commit


19 Oct, 2014

1 commit

  • Pull core block layer changes from Jens Axboe:
    "This is the core block IO pull request for 3.18. Apart from the new
    and improved flush machinery for blk-mq, this is all mostly bug fixes
    and cleanups.

    - blk-mq timeout updates and fixes from Christoph.

    - Removal of REQ_END, also from Christoph. We pass it through the
    ->queue_rq() hook for blk-mq instead, freeing up one of the request
    bits. The space was overly tight on 32-bit, so Martin also killed
    REQ_KERNEL since it's no longer used.

    - blk integrity updates and fixes from Martin and Gu Zheng.

    - Update to the flush machinery for blk-mq from Ming Lei. Now we
    have a per hardware context flush request, which both cleans up the
    code and should scale better for flush intensive workloads on blk-mq.

    - Improve the error printing, from Rob Elliott.

    - Backing device improvements and cleanups from Tejun.

    - Fixup of a misplaced rq_complete() tracepoint from Hannes.

    - Make blk_get_request() return error pointers, fixing up issues
    where we NULL deref when a device goes bad or missing. From Joe
    Lawrence.

    - Prep work for drastically reducing the memory consumption of dm
    devices from Junichi Nomura. This allows creating clone bio sets
    without preallocating a lot of memory.

    - Fix a blk-mq hang on certain combinations of queue depths and
    hardware queues from me.

    - Limit memory consumption for blk-mq devices for crash dump
    scenarios and drivers that use crazy high depths (certain SCSI
    shared tag setups). We now just use a single queue and limited
    depth for that"

    * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
    block: Remove REQ_KERNEL
    blk-mq: allocate cpumask on the home node
    bio-integrity: remove the needless fail handle of bip_slab creating
    block: include func name in __get_request prints
    block: make blk_update_request print prefix match ratelimited prefix
    blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
    block: fix alignment_offset math that assumes io_min is a power-of-2
    blk-mq: Make bt_clear_tag() easier to read
    blk-mq: fix potential hang if rolling wakeup depth is too high
    block: add bioset_create_nobvec()
    block: use bio_clone_fast() in blk_rq_prep_clone()
    block: misplaced rq_complete tracepoint
    sd: Honor block layer integrity handling flags
    block: Replace strnicmp with strncasecmp
    block: Add T10 Protection Information functions
    block: Don't merge requests if integrity flags differ
    block: Integrity checksum flag
    block: Relocate bio integrity flags
    block: Add a disk flag to block integrity profile
    block: Add prefix to block integrity profile flags
    ...

    Linus Torvalds
     

14 Oct, 2014

1 commit

  • REQ_KERNEL is no longer used. Remove it and drop the redundant uio
    argument to nfs_file_direct_{read,write}.

    Signed-off-by: Martin K. Petersen
    Cc: Christoph Hellwig
    Reported-by: Dan Carpenter
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

12 Oct, 2014

1 commit

  • Pull file locking related changes from Jeff Layton:
    "This release is a little more busy for file locking changes than the
    last:

    - a set of patches from Kinglong Mee to fix the lockowner handling in
    knfsd
    - a pile of cleanups to the internal file lease API. This should get
    us a bit closer to allowing for setlease methods that can block.

    There are some dependencies between mine and Bruce's trees this cycle,
    and I based my tree on top of the requisite patches in Bruce's tree"

    * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
    locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
    locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
    locks: set fl_owner for leases to filp instead of current->files
    locks: give lm_break a return value
    locks: __break_lease cleanup in preparation of allowing direct removal of leases
    locks: remove i_have_this_lease check from __break_lease
    locks: move freeing of leases outside of i_lock
    locks: move i_lock acquisition into generic_*_lease handlers
    locks: define a lm_setup handler for leases
    locks: plumb a "priv" pointer into the setlease routines
    nfsd: don't keep a pointer to the lease in nfs4_file
    locks: clean up vfs_setlease kerneldoc comments
    locks: generic_delete_lease doesn't need a file_lock at all
    nfsd: fix potential lease memory leak in nfs4_setlease
    locks: close potential race in lease_get_mtime
    security: make security_file_set_fowner, f_setown and __f_setown void return
    locks: consolidate "nolease" routines
    locks: remove lock_may_read and lock_may_write
    lockd: rip out deferred lock handling from testlock codepath
    NFSD: Get reference of lockowner when coping file_lock
    ...

    Linus Torvalds
     

25 Sep, 2014

3 commits

  • Now that nfs_release_page() doesn't block indefinitely, other deadlock
    avoidance mechanisms aren't needed.
    - it doesn't hurt for kswapd to block occasionally. If it doesn't
    want to block it would clear __GFP_WAIT. The current_is_kswapd()
    was only added to avoid deadlocks and we have a new approach for
    that.
    - memory allocation in the SUNRPC layer can very rarely try to
    ->releasepage() a page it is trying to handle. The deadlock
    is removed as nfs_release_page() doesn't block indefinitely.

    So we don't need to set PF_FSTRANS for sunrpc network operations any
    more.

    Signed-off-by: NeilBrown
    Acked-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    NeilBrown
     
  • If nfs_release_page() is called on a sequence of pages which are all
    in the same file which is blocked on COMMIT, each page could
    contribute a 1 second delay which could become excessive. I have
    seen delays of as much as 208 seconds.

    To keep the delay to one second, mark the bdi as write-congested
    if the commit didn't finish. Once it does finish, the
    write-congested flag will be cleared by nfs_commit_release_pages().

    With this, the longest total delay in try_to_free_pages that I have
    seen is under 3 seconds. With no waiting in nfs_release_page at all
    I have seen delays of nearly 1.5 seconds.

    Signed-off-by: NeilBrown
    Acked-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    NeilBrown
     
  • Support for loop-back mounted NFS filesystems is useful when NFS is
    used to access shared storage in a high-availability cluster.

    If the node running the NFS server fails, some other node can mount the
    filesystem and start providing NFS service. If that node already had
    the filesystem NFS mounted, it will now have it loop-back mounted.

    nfsd can suffer a deadlock when allocating memory and entering direct
    reclaim.
    While direct reclaim does not write to the NFS filesystem it can send
    and wait for a COMMIT through nfs_release_page().

    This patch modifies nfs_release_page() to wait a limited time for the
    commit to complete - one second. If the commit doesn't complete
    in this time, nfs_release_page() will fail. This means it might now
    fail in some cases where it wouldn't before. These cases are only
    when 'gfp' includes '__GFP_WAIT'.

    nfs_release_page() is only called by try_to_release_page(), and that
    can only be called on an NFS page with required 'gfp' flags from
    - page_cache_pipe_buf_steal() in splice.c
    - shrink_page_list() in vmscan.c
    - invalidate_inode_pages2_range() in truncate.c

    The first two handle failure quite safely. The last is only called
    after ->launder_page() has been called, and that will have waited
    for the commit to finish already.

    So aborting if the commit takes longer than 1 second is perfectly safe.

    Signed-off-by: NeilBrown
    Acked-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    NeilBrown
     

11 Sep, 2014

2 commits

  • sparse says:

    fs/nfs/file.c:543:60: warning: incorrect type in argument 1 (different address spaces)
    fs/nfs/file.c:543:60: expected struct rpc_xprt *xprt
    fs/nfs/file.c:543:60: got struct rpc_xprt [noderef] *cl_xprt
    fs/nfs/file.c:548:53: warning: incorrect type in argument 1 (different address spaces)
    fs/nfs/file.c:548:53: expected struct rpc_xprt *xprt
    fs/nfs/file.c:548:53: got struct rpc_xprt [noderef] *cl_xprt

    cl_xprt is RCU-managed, so we need to take care to dereference and use
    it while holding the RCU read lock.

    Cc: Mel Gorman
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Like all block based filesystems, the pNFS block layout driver can't read
    or write at a byte granularity and thus has to perform read-modify-write
    cycles on writes smaller than this granularity.

    Add a flag so that the core NFS code always reads a whole page when
    starting a smaller write, so that we can do it in the place where the VFS
    expects it instead of doing it in a very deadlock-prone way in the writeback
    handler.

    Note that in theory we could do less than page size reads here for disks
    that have a smaller sector size which are served by a server with a smaller
    pnfs block size. But so far that doesn't seem like a worthwhile
    optimization.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Trond Myklebust

    Christoph Hellwig
     

10 Sep, 2014

1 commit

  • GFS2 and NFS have setlease routines that always just return -EINVAL.
    Turn that into a generic routine that can live in fs/libfs.c.

    Cc:
    Cc: Steven Whitehouse
    Cc:
    Signed-off-by: Jeff Layton
    Acked-by: Trond Myklebust
    Reviewed-by: Christoph Hellwig

    Jeff Layton
     

16 Jul, 2014

1 commit

  • The current "wait_on_bit" interface requires an 'action'
    function to be provided which does the actual waiting.
    There are over 20 such functions, many of them identical.
    Most cases can be satisfied by one of just two functions, one
    which uses io_schedule() and one which just uses schedule().

    So:
    Rename wait_on_bit and wait_on_bit_lock to
    wait_on_bit_action and wait_on_bit_lock_action
    to make it explicit that they need an action function.

    Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
    which are *not* given an action function but implicitly use
    a standard one.
    The decision to error-out if a signal is pending is now made
    based on the 'mode' argument rather than being encoded in the action
    function.

    All instances of the old wait_on_bit and wait_on_bit_lock which
    can use the new version have been changed accordingly and their
    action functions have been discarded.
    wait_on_bit{_lock} does not return any specific error code in the
    event of a signal so the caller must check for non-zero and
    interpolate their own error code as appropriate.

    The wait_on_bit() call in __fscache_wait_on_invalidate() was
    ambiguous as it specified TASK_UNINTERRUPTIBLE but used
    fscache_wait_bit_interruptible as an action function.
    David Howells confirms this should be uniformly
    "uninterruptible".

    The main remaining user of wait_on_bit{,_lock}_action is NFS
    which needs to use a freezer-aware schedule() call.

    A comment in fs/gfs2/glock.c notes that having multiple 'action'
    functions is useful as they display differently in the 'wchan'
    field of 'ps'. (and /proc/$PID/wchan).
    As the new bit_wait{,_io} functions are tagged "__sched", they
    will not show up at all, but something higher in the stack. So
    the distinction will still be visible, only with different
    function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
    gfs2/glock.c case).

    Since the first version of this patch (against 3.15) two new action
    functions appeared, one in NFS and one in CIFS. CIFS also now
    uses an action function that makes the same freezer-aware
    schedule call as NFS.

    Signed-off-by: NeilBrown
    Acked-by: David Howells (fscache, keys)
    Acked-by: Steven Whitehouse (gfs2)
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Steve French
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
    Signed-off-by: Ingo Molnar

    NeilBrown
     

13 Jun, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "This is the bunch that sat in -next + lock_parent() fix. This is the
    minimal set; there's more pending stuff.

    In particular, I really hope to get acct.c fixes merged this cycle -
    we need that to deal sanely with delayed-mntput stuff. In the next
    pile, hopefully - that series is fairly short and localized
    (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
    iov_iter work. Most of prereqs for ->splice_write with sane locking
    order are there and Kent's dio rewrite would also fit nicely on top of
    this pile"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
    lock_parent: don't step on stale ->d_parent of all-but-freed one
    kill generic_file_splice_write()
    ceph: switch to iter_file_splice_write()
    shmem: switch to iter_file_splice_write()
    nfs: switch to iter_splice_write_file()
    fs/splice.c: remove unneeded exports
    ocfs2: switch to iter_file_splice_write()
    ->splice_write() via ->write_iter()
    bio_vec-backed iov_iter
    optimize copy_page_{to,from}_iter()
    bury generic_file_aio_{read,write}
    lustre: get rid of messing with iovecs
    ceph: switch to ->write_iter()
    ceph_sync_direct_write: stop poking into iov_iter guts
    ceph_sync_read: stop poking into iov_iter guts
    new helper: copy_page_from_iter()
    fuse: switch to ->write_iter()
    btrfs: switch to ->write_iter()
    ocfs2: switch to ->write_iter()
    xfs: switch to ->write_iter()
    ...

    Linus Torvalds
     

12 Jun, 2014

1 commit


02 Jun, 2014

1 commit

  • Currently, the fl_owner isn't set for flock locks. Some filesystems use
    byte-range locks to simulate flock locks and there is a common idiom in
    those that does:

    fl->fl_owner = (fl_owner_t)filp;
    fl->fl_start = 0;
    fl->fl_end = OFFSET_MAX;

    Since flock locks are generally "owned" by the open file description,
    move this into the common flock lock setup code. The fl_start and fl_end
    fields are already set appropriately, so remove the unneeded setting of
    that in flock ops in those filesystems as well.

    Finally, the lease code also sets the fl_owner as if they were owned by
    the process and not the open file description. This is incorrect as
    leases have the same ownership semantics as flock locks. Set them the
    same way. The lease code doesn't actually use the fl_owner value for
    anything, so this is more for consistency's sake than a bugfix.

    Reported-by: Trond Myklebust
    Signed-off-by: Jeff Layton
    Acked-by: Greg Kroah-Hartman (Staging portion)
    Acked-by: J. Bruce Fields

    Jeff Layton
     

07 May, 2014

1 commit