27 Aug, 2019

1 commit

  • If the client attempts to read a page, but the read fails due to some
    spurious error (e.g. an ACCESS error or a timeout, ...) then we need
    to allow other processes to retry.
    Also try to report errors correctly when doing a synchronous readpage.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 Apr, 2019

2 commits


21 Feb, 2019

1 commit


01 Oct, 2018

1 commit


12 Sep, 2017

1 commit

  • Tools like tcpdump and rpcdebug can be very useful. But there are
    plenty of environments where they are difficult or impossible to
    use. For example, we've had customers report I/O failures during
    workloads so heavy that collecting network traffic or enabling
    RPC debugging is itself onerous.

    The kernel's static tracepoints are lightweight (less likely to
    introduce timing changes) and efficient (the trace data is compact).
    They also work in scenarios where capturing network traffic is not
    possible due to lack of hardware support (some InfiniBand HCAs) or
    where data or network privacy is a concern.

    Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT
    is initiated, and when it completes. Record the arguments and
    results of each operation, which are not shown by the existing
    sunrpc module's tracepoints.

    For instance, the recorded offset and count can be used to match an
    "initiate" event to a "done" event. If an NFS READ result returns
    fewer bytes than requested or zero, seeing the EOF flag can be
    probative. Seeing an NFS4ERR_BAD_STATEID result is also an indication
    of a particular class of problems. The timing information attached
    to each event record can often be useful as well.
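
    The matching idea in the paragraph above can be sketched in userspace C.
    This is only an illustration: struct demo_event and demo_find_done() are
    hypothetical names, not part of the kernel or of trace-cmd, and real
    trace records carry more fields than the two compared here, so matching
    on offset and count alone may be ambiguous when several files are in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one parsed trace record. */
struct demo_event {
    int is_done;        /* 0 = "initiate" event, 1 = "done" event */
    uint64_t offset;    /* file offset recorded with the event */
    uint32_t count;     /* byte count recorded with the event */
};

/*
 * Find the first "done" event after ev[start] whose offset and count
 * equal those of the "initiate" event at ev[start].
 * Returns the index of the match, or -1 if there is none.
 */
static long demo_find_done(const struct demo_event *ev, size_t n, size_t start)
{
    size_t i;

    assert(start < n);
    for (i = start + 1; i < n; i++) {
        if (ev[i].is_done &&
            ev[i].offset == ev[start].offset &&
            ev[i].count == ev[start].count)
            return (long)i;
    }
    return -1;
}
```

    The difference between the timestamps of the two matched records would
    then give the latency of that operation.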

    Usage example:

    [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done
    /sys/kernel/debug/tracing/events/nfs/*initiate*/filter
    /sys/kernel/debug/tracing/events/nfs/*done/filter
    Hit Ctrl^C to stop recording
    ^CKernel buffer statistics:
    Note: "entries" are the entries left in the kernel ring buffer and are not
    recorded in the trace data. They should all be zero.

    CPU: 0
    entries: 0
    overrun: 0
    commit overrun: 0
    bytes: 3680
    oldest event ts: 78.367422
    now ts: 100.124419
    dropped events: 0
    read events: 74

    ... and so on.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

20 Aug, 2017

1 commit


21 Apr, 2017

1 commit


08 Oct, 2016

1 commit

  • After using the offset of the swap entry as the key of the swap cache,
    page_index() becomes exactly the same as page_file_index(). So
    page_file_index() is removed and the callers are changed to use
    page_index() instead.

    Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com
    Signed-off-by: "Huang, Ying"
    Cc: Trond Myklebust
    Cc: Anna Schumaker
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Dave Hansen
    Cc: Johannes Weiner
    Cc: Dan Williams
    Cc: Joonsoo Kim
    Cc: Ross Zwisler
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     

25 Jun, 2016

1 commit

  • Since commit 0bcbf039f6b2, nfs_readpage_release() has been used to
    unlock the page in the read code.

    Fixes: 0bcbf039f6b2 ("nfs: handle request add failure properly")
    Cc: stable@vger.kernel.org # v4.5+
    Signed-off-by: Trond Myklebust
    Signed-off-by: Anna Schumaker

    Trond Myklebust
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion whether a
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();
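
    The two shift rewrites in the list above are safe because the shifts are
    equal, so the shift amount is zero. A minimal userspace sketch, assuming
    a shift of 12 (4 KiB pages); the DEMO_* macros and demo_* functions are
    made-up stand-ins, not the kernel definitions:

```c
#include <assert.h>

/* Assumed values for illustration only; 12 corresponds to 4 KiB pages. */
#define DEMO_PAGE_SHIFT        12
#define DEMO_PAGE_CACHE_SHIFT  DEMO_PAGE_SHIFT

/* Old style: scale a page-cache index into PAGE_SIZE units. */
static unsigned long demo_old_index_to_pages(unsigned long index)
{
    /* The shift amount is 0 here, so the value is unchanged. */
    return index << (DEMO_PAGE_CACHE_SHIFT - DEMO_PAGE_SHIFT);
}

/* New style: the same conversion with the no-op shift dropped. */
static unsigned long demo_new_index_to_pages(unsigned long index)
{
    return index;
}
```

    Both functions return the same value for every index as long as the two
    shifts are equal, which is the assumption the commit message describes
    as already baked into many call sites.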

    This patch contains automated changes generated with coccinelle using
    the script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

29 Dec, 2015

2 commits

  • When we fail to queue a read page to the IO descriptor,
    we need to clean it up; otherwise it hangs around and
    prevents the nfs module from being removed.

    When we fail to queue a write page to the IO descriptor,
    we need to clean it up and also save the failure status
    to the open context. Then at file close, we can try to write
    the pages back again and drop a page if it fails to write back
    in .launder_page, which will be done in the next patch.

    Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we
    should just bail out instead of hiding the error and
    retrying inband IO.

    Change all the call sites to pop the error all the way up.

    Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
     

22 Oct, 2015

1 commit

  • If a non-RPC-based layout driver returns a bad length of data, nfs retries
    by calling rpc_restart_call_prepare(), which causes a NULL pointer
    dereference panic.

    This patch lets nfs retry through the MDS when a non-RPC-based layout
    driver returns a bad length of data.

    [13034.883329] BUG: unable to handle kernel NULL pointer dereference at (null)
    [13034.884902] IP: [] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
    [13034.886558] PGD 0
    [13034.888126] Oops: 0000 [#1] KASAN
    [13034.889710] Modules linked in: blocklayoutdriver(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c coretemp btrfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon auth_rpcgss shpchp nfs_acl lockd vmw_vmci parport_pc xor raid6_pq grace parport sunrpc i2c_piix4 vmwgfx drm_kms_helper ttm drm mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
    [13034.898260] CPU: 0 PID: 10112 Comm: kworker/0:1 Tainted: G OE 4.3.0-rc5+ #279
    [13034.899932] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
    [13034.903342] Workqueue: events bl_read_cleanup [blocklayoutdriver]
    [13034.905059] task: ffff88006a9148c0 ti: ffff880035e90000 task.ti: ffff880035e90000
    [13034.906827] RIP: 0010:[] [] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
    [13034.910522] RSP: 0018:ffff880035e97b58 EFLAGS: 00010282
    [13034.912378] RAX: fffffbfff04a5a94 RBX: ffff880068fe4858 RCX: 0000000000000003
    [13034.914339] RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 0000000000000282
    [13034.916236] RBP: ffff880035e97b68 R08: 0000000000000001 R09: 0000000000000001
    [13034.918229] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    [13034.920007] R13: ffff880068fe4858 R14: ffff880068fe4a60 R15: 0000000000001000
    [13034.921845] FS: 0000000000000000(0000) GS:ffffffff82247000(0000) knlGS:0000000000000000
    [13034.923645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [13034.925525] CR2: 0000000000000000 CR3: 00000000063dd000 CR4: 00000000001406f0
    [13034.932808] Stack:
    [13034.934813] ffff880068fe4780 0000000000001000 ffff880035e97ba8 ffffffffa08800d2
    [13034.936675] ffffffffa088029d ffff880068fe4780 ffff880068fe4858 ffffffffa089c0a0
    [13034.938593] ffff880068fe47e0 ffff88005d59faf0 ffff880035e97be0 ffffffffa087e08f
    [13034.940454] Call Trace:
    [13034.942388] [] nfs_readpage_result+0x112/0x200 [nfs]
    [13034.944317] [] ? nfs_readpage_done+0xdd/0x160 [nfs]
    [13034.946267] [] nfs_pgio_result+0x9f/0x120 [nfs]
    [13034.948166] [] pnfs_ld_read_done+0x7c/0x1e0 [nfsv4]
    [13034.950247] [] bl_read_cleanup+0x2e/0x60 [blocklayoutdriver]
    [13034.952156] [] process_one_work+0x412/0x870
    [13034.954102] [] ? process_one_work+0x334/0x870
    [13034.955949] [] ? queue_delayed_work_on+0x40/0x40
    [13034.957985] [] worker_thread+0x81/0x6a0
    [13034.959817] [] ? process_one_work+0x870/0x870
    [13034.961785] [] kthread+0x17d/0x1a0
    [13034.963544] [] ? kthread_create_on_node+0x330/0x330
    [13034.965479] [] ? finish_task_switch+0x88/0x220
    [13034.967223] [] ? kthread_create_on_node+0x330/0x330
    [13034.968929] [] ret_from_fork+0x3f/0x70
    [13034.970534] [] ? kthread_create_on_node+0x330/0x330
    [13034.972176] Code: c7 43 50 40 84 0d a0 e8 3d fe 1c e1 48 8d 7b 58 c7 83 e4 00 00 00 00 00 00 00 e8 ca fe 1c e1 4c 8b 63 58 4c 89 e7 e8 be fe 1c e1 83 3c 24 00 74 12 48 c7 43 50 f0 a2 0e a0 b8 01 00 00 00 5b
    [13034.977148] RIP [] rpc_restart_call_prepare+0x62/0x90 [sunrpc]
    [13034.978780] RSP
    [13034.980399] CR2: 0000000000000000

    Signed-off-by: Kinglong Mee
    Signed-off-by: Trond Myklebust

    Kinglong Mee
     

21 Sep, 2015

1 commit


27 Apr, 2015

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Another set of mainly bugfixes and a couple of cleanups. No new
    functionality in this round.

    Highlights include:

    Stable patches:
    - Fix a regression in /proc/self/mountstats
    - Fix the pNFS flexfiles O_DIRECT support
    - Fix high load average due to callback thread sleeping

    Bugfixes:
    - Various patches to fix the pNFS layoutcommit support
    - Do not cache pNFS deviceids unless server notifications are enabled
    - Fix a SUNRPC transport reconnection regression
    - make debugfs file creation failure non-fatal in SUNRPC
    - Another fix for circular directory warnings on NFSv4 "junctioned"
    mountpoints
    - Fix locking around NFSv4.2 fallocate() support
    - Truncating NFSv4 file opens should also sync O_DIRECT writes
    - Prevent infinite loop in rpcrdma_ep_create()

    Features:
    - Various improvements to the RDMA transport code's handling of
    memory registration
    - Various code cleanups"

    * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits)
    fs/nfs: fix new compiler warning about boolean in switch
    nfs: Remove unneeded casts in nfs
    NFS: Don't attempt to decode missing directory entries
    Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one"
    NFS: Rename idmap.c to nfs4idmap.c
    NFS: Move nfs_idmap.h into fs/nfs/
    NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h
    NFS: Add a stub for GETDEVICELIST
    nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes
    nfs: fix DIO good bytes calculation
    nfs: Fetch MOUNTED_ON_FILEID when updating an inode
    sunrpc: make debugfs file creation failure non-fatal
    nfs: fix high load average due to callback thread sleeping
    NFS: Reduce time spent holding the i_mutex during fallocate()
    NFS: Don't zap caches on fallocate()
    xprtrdma: Make rpcrdma_{un}map_one() into inline functions
    xprtrdma: Handle non-SEND completions via a callout
    xprtrdma: Add "open" memreg op
    xprtrdma: Add "destroy MRs" memreg op
    xprtrdma: Add "reset MRs" memreg op
    ...

    Linus Torvalds
     

24 Apr, 2015

1 commit

  • This reverts commit 5a254d08b086d80cbead2ebcee6d2a4b3a15587a.

    Since commit 5a254d08b086 ("nfs: replace nfs_add_stats with
    nfs_inc_stats when add one"), nfs_readpage and nfs_do_writepage use
    nfs_inc_stats to increment NFSIOS_READPAGES and NFSIOS_WRITEPAGES
    instead of nfs_add_stats.

    However nfs_inc_stats does not do the same thing as nfs_add_stats with
    value 1 because these functions work on distinct stats:
    nfs_inc_stats increments stats from "enum nfs_stat_eventcounters" (in
    server->io_stats->events) and nfs_add_stats those from "enum
    nfs_stat_bytecounters" (in server->io_stats->bytes).
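
    Why the two helpers are not interchangeable can be shown with a small
    userspace model. This is a sketch under stated assumptions: the demo_*
    names and the single-entry enums are hypothetical and much smaller than
    the kernel's; only the split into an events array and a bytes array
    follows the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-ins for the two kernel enums. */
enum demo_eventcounter { DEMO_EVENT_VFSOPEN, DEMO_EVENT_COUNT };
enum demo_bytecounter  { DEMO_BYTES_READPAGES, DEMO_BYTES_COUNT };

struct demo_iostats {
    uint64_t events[DEMO_EVENT_COUNT];  /* touched only by demo_inc_stats() */
    uint64_t bytes[DEMO_BYTES_COUNT];   /* touched only by demo_add_stats() */
};

/* Increment an event counter by one. */
static void demo_inc_stats(struct demo_iostats *s, int idx)
{
    assert(idx >= 0 && idx < DEMO_EVENT_COUNT);
    s->events[idx]++;
}

/* Add n to a byte counter. */
static void demo_add_stats(struct demo_iostats *s, int idx, uint64_t n)
{
    assert(idx >= 0 && idx < DEMO_BYTES_COUNT);
    s->bytes[idx] += n;
}
```

    In this model, calling demo_inc_stats() with DEMO_BYTES_READPAGES
    compiles, because C enumerators are plain integers, but it bumps
    events[0] and leaves the byte counter untouched. Only demo_add_stats()
    with a value of 1 updates the counter that was meant.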

    Signed-off-by: Nicolas Iooss
    Fixes: 5a254d08b086 ("nfs: replace nfs_add_stats with nfs_inc_stats...")
    Cc: stable@vger.kernel.org # 3.19+
    Signed-off-by: Trond Myklebust

    Nicolas Iooss
     

16 Apr, 2015

2 commits


04 Feb, 2015

2 commits


25 Nov, 2014

1 commit


26 Jun, 2014

1 commit


25 Jun, 2014

2 commits

  • struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is
    passed around everywhere, because there used to be multiple _data structs
    per _header. Many of these functions then use the _data to find a pointer
    to the _header. This patch cleans this up by merging the nfs_pgio_data
    structure into nfs_pgio_header and passing nfs_pgio_header around instead.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • nfs_rw_header was used to allocate an nfs_pgio_header along with an
    nfs_pgio_data, because a _header would need at least one _data.

    Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move
    it to nfs_pgio_header and get rid of nfs_rw_header.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     

29 May, 2014

14 commits

  • nfs_read_completion relied on the fact that there was a 1:1 mapping
    of page to nfs_request, but this has now changed.

    Regions not covered by a request have already been zeroed elsewhere.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • Operations that modify state for a whole page must be synchronized across
    all requests within a page group. In the read path, this means calling
    unlock_page and SetPageUptodate. Neither of these functions should be
    called until all requests in a page group have reached the point where
    they would call them.

    This patch should have no effect yet since all page groups currently
    have one request, but will come into play when pg_test functions are
    modified to split pages into sub-page regions.
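
    The rule can be modelled in userspace C as follows. This sketch uses a
    plain completion counter for brevity, which is an assumption of the
    example and not a description of the kernel's mechanism; struct
    demo_page, struct demo_page_group and demo_request_read_done() are
    hypothetical names, and the error path is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page state touched by the read path. */
struct demo_page {
    bool locked;
    bool uptodate;
};

/* All requests that cover parts of the same page. */
struct demo_page_group {
    struct demo_page *page;
    unsigned int nr_requests;   /* requests in the group */
    unsigned int nr_done;       /* requests that finished reading */
};

/*
 * Called once by each request when its read completes successfully.
 * Whole-page state changes only when the last request checks in.
 * Returns true if this call updated and unlocked the page.
 */
static bool demo_request_read_done(struct demo_page_group *g)
{
    assert(g->nr_done < g->nr_requests);
    if (++g->nr_done < g->nr_requests)
        return false;           /* other requests still outstanding */

    g->page->uptodate = true;
    g->page->locked = false;
    return true;
}
```

    With a single request per group, as is the case at this point in the
    series, the first call already unlocks the page, which matches the
    statement that the patch has no effect yet.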

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • Add "page groups" - a circular list of nfs requests (struct nfs_page)
    that all reference the same page. This gives nfs read and write paths
    the ability to account for sub-page regions independently. This
    somewhat follows the design of struct buffer_head's sub-page
    accounting.

    Only "head" requests are ever added/removed from the inode list in
    the buffered write path. "head" and "sub" requests are treated the
    same through the read path and the rest of the write/commit path.
    Requests are given an extra reference across the life of the list.

    Page groups are never rejoined after being split. If the read/write
    request fails and the client falls back to another path (i.e. reverts
    to the MDS in the pNFS case), the already split requests are pushed
    through the recoalescing code again, which may split them further and
    then coalesce them into properly sized requests on the wire.
    Fragmentation shouldn't be a problem with the current design, because
    we flush all requests in a page group when a non-contiguous request is
    added, so the only time resplitting should occur is on a resend of a
    read or write.

    This patch lays the groundwork for sub-page splitting, but does not
    actually do any splitting. For now all page groups have one request
    as pg_test functions don't yet split pages. There are several related
    patches that are needed to support multiple requests per page group.
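
    A circular list of requests for one page, as described above, might look
    like this in a userspace sketch. The struct layout, the field names and
    the demo_* helpers are assumptions for illustration and are not taken
    from struct nfs_page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request covering a sub-page region. */
struct demo_req {
    struct demo_req *head;  /* first request of the group */
    struct demo_req *next;  /* next request in the group; wraps to head */
    unsigned int offset;    /* start of the region within the page */
    unsigned int bytes;     /* length of the region */
};

/* A new group contains only its head, which points at itself. */
static void demo_group_init(struct demo_req *head)
{
    head->head = head;
    head->next = head;
}

/* Append a "sub" request at the tail, keeping the list circular. */
static void demo_group_add(struct demo_req *head, struct demo_req *sub)
{
    struct demo_req *tail = head;

    assert(sub != NULL && sub != head);
    while (tail->next != head)
        tail = tail->next;
    sub->head = head;
    sub->next = head;
    tail->next = sub;
}

/* Walk the ring once and sum the bytes covered by all requests. */
static unsigned int demo_group_bytes(const struct demo_req *head)
{
    const struct demo_req *r = head;
    unsigned int total = 0;

    do {
        total += r->bytes;
        r = r->next;
    } while (r != head);
    return total;
}
```

    A group with one request is simply a head whose next pointer refers to
    itself, so the same traversal works before and after splitting.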

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • @inode is passed but not used.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • At this point the read and write structures look identical, so combine
    them into something shared by both.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • What we have here is two functions that look identical. Let's share
    some more code!

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • Once again, these two functions look identical in the read and write
    case. Time to combine them together!

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • Most of this code is the same for both the read and write paths, so
    combine everything and use the rw_ops when necessary.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • These functions are almost identical on both the read and write side.
    FLUSH_COND_STABLE will never be set for the read path, so leaving it in
    the generic code won't hurt anything.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • At this point, the read and write versions of this function look
    identical so both should use the same function.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • Write adds a little bit of code dealing with flush flags, but since
    "how" will always be 0 when reading we can share the code.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • The read and write paths set up this struct in exactly the same way, so
    create a single shared struct.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • Combining these functions will let me make a single nfs_rw_common_ops
    struct (see the next patch).

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker
     
  • The read and write paths do exactly the same thing for the rpc_prepare
    rpc_op. This patch combines them together into a single function.

    Signed-off-by: Anna Schumaker
    Signed-off-by: Trond Myklebust

    Anna Schumaker