09 Jan, 2017

1 commit

  • commit c0cf3ef5e0f47e385920450b245d22bead93e7ad upstream.

    What matters when deciding if we should make a page uptodate is
    not how much we _wanted_ to copy, but how much we actually have
    copied. As it is, on architectures that do not zero tail on
    short copy we can leave uninitialized data in page marked uptodate.

    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     

14 Oct, 2016

1 commit

  • Pull NFS client updates from Anna Schumaker:
    "Highlights include:

    Stable bugfixes:
    - sunrpc: fix writ espace race causing stalls
    - NFS: Fix inode corruption in nfs_prime_dcache()
    - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
    - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
    - NFSv4: Open state recovery must account for file permission changes
    - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

    Features:
    - Add support for tracking multiple layout types with an ordered list
    - Add support for using multiple backchannel threads on the client
    - Add support for pNFS file layout session trunking
    - Delay xprtrdma use of DMA API (for device driver removal)
    - Add support for xprtrdma remote invalidation
    - Add support for larger xprtrdma inline thresholds
    - Use a scatter/gather list for sending xprtrdma RPC calls
    - Add support for the CB_NOTIFY_LOCK callback
    - Improve hashing sunrpc auth_creds by using both uid and gid

    Bugfixes:
    - Fix xprtrdma use of DMA API
    - Validate filenames before adding to the dcache
    - Fix corruption of xdr->nwords in xdr_copy_to_scratch
    - Fix setting buffer length in xdr_set_next_buffer()
    - Don't deadlock the state manager on the SEQUENCE status flags
    - Various delegation and stateid related fixes
    - Retry operations if an interrupted slot receives EREMOTEIO
    - Make nfs boot time y2038 safe"

    * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
    NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
    fs: nfs: Make nfs boot time y2038 safe
    sunrpc: replace generic auth_cred hash with auth-specific function
    sunrpc: add RPCSEC_GSS hash_cred() function
    sunrpc: add auth_unix hash_cred() function
    sunrpc: add generic_auth hash_cred() function
    sunrpc: add hash_cred() function to rpc_authops struct
    Retry operation on EREMOTEIO on an interrupted slot
    pNFS: Fix atime updates on pNFS clients
    sunrpc: queue work on system_power_efficient_wq
    NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
    NFSv4: If recovery failed for a specific open stateid, then don't retry
    NFSv4: Fix retry issues with nfs41_test/free_stateid
    NFSv4: Open state recovery must account for file permission changes
    NFSv4: Mark the lock and open stateids as invalid after freeing them
    NFSv4: Don't test open_stateid unless it is set
    NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
    NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
    NFSv4: Fix a race when updating an open_stateid
    NFSv4: Fix a race in nfs_inode_reclaim_delegation()
    ...

    Linus Torvalds
     

06 Oct, 2016

1 commit


23 Sep, 2016

1 commit


20 Sep, 2016

1 commit


04 Sep, 2016

1 commit


25 Jul, 2016

1 commit


20 Jul, 2016

1 commit

  • A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
    not really safe to use the the generic_cred->acred->ac_flags to store
    the NO_CRKEY_TIMEOUT flag. A lookup for a unx_cred triggered while the
    KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
    KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
    with the auth_cred to be in a state where they're perpetually doing 4K
    NFS_FILE_SYNC writes.

    This can be reproduced as follows:

    1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
    They do not need to be the same export, nor do they even need to be from
    the same NFS server. Also, v3 is fine.
    $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
    $ sudo mount -o v3,sec=sys server2:/export /mnt/sys

    2. As the normal user, before accessing the kerberized mount, kinit with
    a short lifetime (but not so short that renewing the ticket would leave
    you within the 4-minute window again by the time the original ticket
    expires), e.g.
    $ kinit -l 10m -r 60m

    3. Do some I/O to the kerberized mount and verify that the writes are
    wsize, UNSTABLE:
    $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

    4. Wait until you're within 4 minutes of key expiry, then do some more
    I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
    set. Verify that the writes are 4K, FILE_SYNC:
    $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

    5. Now do some I/O to the sec=sys mount. This will cause
    RPC_CRED_NO_CRKEY_TIMEOUT to be set:
    $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

    6. Writes for that user will now be permanently 4K, FILE_SYNC for that
    user, regardless of which mount is being written to, until you reboot
    the client. Renewing the kerberos ticket (assuming it hasn't already
    expired) will have no effect. Grabbing a new kerberos ticket at this
    point will have no effect either.

    Move the flag to the auth->au_flags field (which is currently unused)
    and rename it slightly to reflect that it's no longer associated with
    the auth_cred->ac_flags. Add the rpc_auth to the arg list of
    rpcauth_cred_key_to_expire and check the au_flags there too. Finally,
    add the inode to the arg list of nfs_ctx_key_to_expire so we can
    determine the rpc_auth to pass to rpcauth_cred_key_to_expire.

    Signed-off-by: Scott Mayhew
    Signed-off-by: Trond Myklebust

    Scott Mayhew
     

06 Jul, 2016

5 commits


22 Jun, 2016

4 commits


02 May, 2016

1 commit


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

17 Mar, 2016

2 commits


23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

08 Jan, 2016

1 commit

  • The use of wait_on_atomic_t() for waiting on I/O to complete before
    unlocking allows us to git rid of the NFS_IO_INPROGRESS flag, and thus the
    nfs_iocounter's flags member, and finally the nfs_iocounter altogether.
    The count of I/O is moved to the lock context, and the counter
    increment/decrement functions become simple enough to open-code.

    Signed-off-by: Benjamin Coddington
    [Trond: Fix up conflict with existing function nfs_wait_atomic_killable()]
    Signed-off-by: Trond Myklebust

    Benjamin Coddington
     

05 Jan, 2016

1 commit

  • * pnfs_generic:
    NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments
    NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures
    NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()
    NFSv4.1/pNFS: Fix a race in initiate_file_draining()
    NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout
    NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode
    NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids
    NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()
    NFS: Relax requirements in nfs_flush_incompatible
    NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid
    NFS: Allow multiple commit requests in flight per file
    NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs
    NFSv4: List stateid information in the callback tracepoints
    NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL
    NFSv4.1/pNFS: Ensure we enforce RFC5661 Section 12.5.5.2.1
    pNFS: If we have to delay the layout callback, mark the layout for return
    NFSv4.1/pNFS: Add a helper to mark the layout as returned
    pNFS: Ensure nfs4_layoutget_prepare returns the correct error

    Trond Myklebust
     

01 Jan, 2016

1 commit


29 Dec, 2015

1 commit


07 Nov, 2015

1 commit

  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access one of two watermarks lower than "min" which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimisitic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and was depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

23 Oct, 2015

1 commit


08 Sep, 2015

1 commit

  • The NFSv4 delegation spec allows the server to tell a client to limit how
    much data it cache after the file is closed. In return, the server
    guarantees enough free space to avoid ENOSPC situations, etc.
    Prior to this patch, we assumed we could always cache aggressively after
    close. Unfortunately, this causes problems with servers that set the
    limit to 0 and therefore do not offer any ENOSPC guarantees.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

18 Aug, 2015

2 commits


11 Jun, 2015

1 commit

  • Jerome reported seeing a warning pop when working with a swapfile on
    NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
    holding the rcu_read_lock and that function can sleep.

    To fix that, we need to take a reference to the xprt while holding the
    rcu_read_lock, set the socket up for swapping and then drop that
    reference. But, xprt_put is not exported and having NFS deal with the
    underlying xprt is a bit of layering violation anyway.

    Fix this by adding a set of activate/deactivate functions that take a
    rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
    nfs_swap_deactivate call those.

    Also, add a per-rpc_clnt atomic counter to keep track of the number of
    active swapfiles associated with it. When the counter does a 0->1
    transition, we enable swapping on the xprt, when we do a 1->0 transition
    we disable swapping on it.

    This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
    flag. If non-swapper and swapper clnts are sharing a xprt, then we only
    need to flag the tasks from the swapper clnt with that flag.

    Acked-by: Mel Gorman
    Reported-by: Jerome Marchand
    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

27 Apr, 2015

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Another set of mainly bugfixes and a couple of cleanups. No new
    functionality in this round.

    Highlights include:

    Stable patches:
    - Fix a regression in /proc/self/mountstats
    - Fix the pNFS flexfiles O_DIRECT support
    - Fix high load average due to callback thread sleeping

    Bugfixes:
    - Various patches to fix the pNFS layoutcommit support
    - Do not cache pNFS deviceids unless server notifications are enabled
    - Fix a SUNRPC transport reconnection regression
    - make debugfs file creation failure non-fatal in SUNRPC
    - Another fix for circular directory warnings on NFSv4 "junctioned"
    mountpoints
    - Fix locking around NFSv4.2 fallocate() support
    - Truncating NFSv4 file opens should also sync O_DIRECT writes
    - Prevent infinite loop in rpcrdma_ep_create()

    Features:
    - Various improvements to the RDMA transport code's handling of
    memory registration
    - Various code cleanups"

    * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
    fs/nfs: fix new compiler warning about boolean in switch
    nfs: Remove unneeded casts in nfs
    NFS: Don't attempt to decode missing directory entries
    Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
    NFS: Rename idmap.c to nfs4idmap.c
    NFS: Move nfs_idmap.h into fs/nfs/
    NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
    NFS: Add a stub for GETDEVICELIST
    nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
    nfs: fix DIO good bytes calculation
    nfs: Fetch MOUNTED_ON_FILEID when updating an inode
    sunrpc: make debugfs file creation failure non-fatal
    nfs: fix high load average due to callback thread sleeping
    NFS: Reduce time spent holding the i_mutex during fallocate()
    NFS: Don't zap caches on fallocate()
    xprtrdma: Make rpcrdma_{un}map_one() into inline functions
    xprtrdma: Handle non-SEND completions via a callout
    xprtrdma: Add "open" memreg op
    xprtrdma: Add "destroy MRs" memreg op
    xprtrdma: Add "reset MRs" memreg op
    ...

    Linus Torvalds
     

16 Apr, 2015

1 commit


12 Apr, 2015

3 commits


28 Mar, 2015

2 commits


26 Mar, 2015

1 commit