23 Jun, 2015

1 commit

  • This patch changes nfs4_preprocess_stateid_op so it always returns
    a valid struct file if it has been asked for that. For that we
    now allocate a temporary struct file for special stateids, and check
    permissions if we got the file structure from the stateid. This
    ensures that all callers will get their handling of special stateids
    right, and avoids code duplication.

    There is a little wart in here because the read code needs to know
    if we allocated a file structure so that it can copy around the
    read-ahead parameters. In the long run we should probably aim to
    cache full file structures used with special stateids instead.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     

20 Jun, 2015

3 commits


05 Jun, 2015

7 commits

  • Bi-directional RPC support means code in svcrdma.ko invokes a bit of
    code in xprtrdma.ko, and vice versa. To avoid loader/linker loops,
    merge the server and client side modules together into a single
    module.

    When backchannel capabilities are added, the combined module will
    register all needed transport capabilities so that Upper Layer
    consumers automatically have everything needed to create a
    bi-directional transport connection.

    Module aliases are added for backwards compatibility with user
    space, which still may expect svcrdma.ko or xprtrdma.ko to be
    present.

    This commit reverts commit 2e8c12e1b765 ("xprtrdma: add separate
    Kconfig options for NFSoRDMA client and server support") and
    provides a single CONFIG option for enabling the new module.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The server and client maximums are architecturally independent.
    Allow changing one without affecting the other.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • At the 2015 LSF/MM, it was requested that memory allocation
    call sites that request GFP_KERNEL allocations in a loop should be
    annotated with __GFP_NOFAIL.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Fields in struct rpcrdma_msg are __be32. Don't byte-swap these
    fields when decoding RPC calls and then swap them back for the
    reply. For the most part, they can be left alone.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • In send_write_chunks(), we have:

    for (xdr_off = rqstp->rq_res.head[0].iov_len, chunk_no = 0;
    xfer_len && chunk_no < arg_ary->wc_nchunks;
    chunk_no++) {
    . . .
    }

    Note that arg_ary->wc_nchunks is in network byte-order. For the
    comparison to work correctly, both have to be in native byte-order.

    In send_reply_chunks, we have:

    write_len = min(xfer_len, htonl(ch->rs_length));

    xfer_len is in native byte-order, and ch->rs_length is in
    network byte-order. be32_to_cpu() is the correct byte swap
    for ch->rs_length.

    As an additional clean up, replace ntohl() with be32_to_cpu() in
    a few other places.

    This appears to address a problem with large rsize hangs while
    using PHYSICAL memory registration. I suspect that is the only
    registration mode that uses more than one chunk element.
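
The byte-order point above can be sketched in user-space C. This is a minimal illustration, not the svcrdma code: the helper names are hypothetical, and ntohl() stands in for the kernel's be32_to_cpu().

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they are not the svcrdma
 * functions. ntohl() stands in for the kernel's be32_to_cpu().
 */

/* Count how many chunks a loop bounded by a big-endian wire count visits. */
static uint32_t chunks_visited(uint32_t wire_nchunks)
{
	uint32_t nchunks = ntohl(wire_nchunks);	/* convert once, then compare */
	uint32_t chunk_no;

	for (chunk_no = 0; chunk_no < nchunks; chunk_no++)
		;	/* a real loop would send one write chunk here */
	return chunk_no;
}

/* Clamp a native-order transfer length to a big-endian wire length. */
static uint32_t clamp_write_len(uint32_t xfer_len, uint32_t wire_length)
{
	uint32_t length = ntohl(wire_length);	/* wire value to native order */

	return xfer_len < length ? xfer_len : length;
}
```

On a little-endian machine, a wire count of 3 read without conversion looks like 0x03000000, so a loop bounded by the raw field would run far past the real chunk count.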

    BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248
    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • When testing pnfs layout, nfsd got error NFS4ERR_SEQ_MISORDERED.
    The cause is that the NFS client returns NFS4ERR_DELAY before
    validate_seqid() and so does not update the sequence id, while nfsd
    does update it.

    According to RFC5661 20.9.3,
    " If CB_SEQUENCE returns an error, then the state of the slot
    (sequence ID, cached reply) MUST NOT change. "
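
The rule quoted above can be sketched as follows. This is a simplified model under stated assumptions: the struct and function names are hypothetical and do not correspond to nfsd's data structures, and a status of 0 is taken to mean that CB_SEQUENCE succeeded.

```c
#include <stdint.h>

/* Hypothetical, simplified callback slot; not the nfsd data structure. */
struct cb_slot_sketch {
	uint32_t seq_nr;	/* sequence id the next callback will carry */
};

/*
 * Advance the slot only when CB_SEQUENCE itself succeeded (status 0).
 * On any error the slot state is left untouched, so a retry reuses
 * the same sequence id.
 */
static void cb_slot_update(struct cb_slot_sketch *slot, int cb_sequence_status)
{
	if (cb_sequence_status == 0)
		slot->seq_nr++;
}
```

With this shape, a reply of -10008 (NFS4ERR_DELAY) leaves seq_nr unchanged on the sending side, which keeps it in step with a client that also did not advance its slot.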

    Signed-off-by: Kinglong Mee
    Signed-off-by: J. Bruce Fields

    Kinglong Mee
     
  • nfsd enters an infinite loop and prints a message every 10 seconds:

    May 31 18:33:52 test-server kernel: Error sending entire callback!
    May 31 18:34:01 test-server kernel: Error sending entire callback!

    This is caused by a cb_layoutreturn getting error -10008
    (NFS4ERR_DELAY), the client crashing, and then nfsd entering the
    infinite loop:

    bc_sendto --> call_timeout --> nfsd4_cb_done --> nfsd4_cb_layout_done
    with error -10008 --> rpc_delay(task, HZ/100) --> bc_sendto ...

    Reproduced using xfstests 074 with nfs client's kdump on,
    CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set, and client's blkmapd down:

    1. The nfs client's write operation gets the layout of the file
    and then sends getdeviceinfo,
    2. but the layout segment is not recorded by the client because blkmapd is down,
    3. so the client writes data by sending WRITE to the server,
    4. the nfs server recalls the layout of the file before the WRITE,
    5. a network error causes the client to reset the session and return NFS4ERR_DELAY,
    6. so the client's WRITE operation is left waiting for the reply.
    If the task hangs for 120s, the client will crash.
    7. After that, the next bc_sendto fails with TIMEOUT,
    and cb_status is NFS4ERR_DELAY.

    Signed-off-by: Kinglong Mee
    Signed-off-by: J. Bruce Fields

    Kinglong Mee
     

04 Jun, 2015

2 commits


01 Jun, 2015

1 commit


29 May, 2015

4 commits

  • Signed-off-by: Andreas Gruenbacher
    Signed-off-by: J. Bruce Fields

    Andreas Gruenbacher
     
  • gcc-5.0 warns about a potential uninitialized variable use in nfsd:

    fs/nfsd/nfs4state.c: In function 'nfsd4_process_open2':
    fs/nfsd/nfs4state.c:3781:3: warning: 'old_deny_bmap' may be used uninitialized in this function [-Wmaybe-uninitialized]
    reset_union_bmap_deny(old_deny_bmap, stp);
    ^
    fs/nfsd/nfs4state.c:3760:16: note: 'old_deny_bmap' was declared here
    unsigned char old_deny_bmap;
    ^

    This is a false positive: the code path that is warned about cannot
    actually be reached.

    This adds an initialization for the variable to make the warning go
    away.
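
The shape of such a false positive can be sketched like this. The function, its parameters, and the failure condition are made up for illustration and are not the nfsd code; the point is only that the variable is read on a path that implies it was written.

```c
#include <stdbool.h>

/*
 * Hypothetical example of the warning pattern. old_deny is only read
 * when failed is true, and failed can only become true inside the
 * branch that assigned old_deny, but the compiler may not prove that.
 */
static unsigned char apply_deny(bool upgrade, unsigned char current_deny,
				unsigned char new_deny)
{
	unsigned char old_deny = 0;	/* initialization silences the warning */
	bool failed = false;

	if (upgrade) {
		old_deny = current_deny;
		current_deny |= new_deny;
		failed = (new_deny & 0x80) != 0;	/* made-up failure condition */
	}
	if (failed)			/* reachable only when upgrade was true */
		current_deny = old_deny;
	return current_deny;
}
```

Initializing the variable costs nothing at run time in practice and keeps the build free of the warning without restructuring the control flow.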

    Signed-off-by: Arnd Bergmann
    Signed-off-by: J. Bruce Fields

    Arnd Bergmann
     
  • Whether or not a file system supports acls can be determined with
    IS_POSIXACL(inode) and does not require trying to fetch any acls; the code for
    computing the supported_attrs and aclsupport attributes can be simplified.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: J. Bruce Fields

    Andreas Gruenbacher
     
  • NFSv2 can set the atime and/or mtime of a file to specific timestamps but not
    to the server's current time. To implement the equivalent of utimes("file",
    NULL), it uses a heuristic.

    NFSv3 and later do support setting the atime and/or mtime to the server's
    current time directly. The NFSv2 heuristic is still enabled, and causes
    timestamps to be set wrong sometimes.

    Fix this by moving the heuristic into the NFSv2 specific code. We can leave it
    out of the create code path: the owner can always set timestamps arbitrarily,
    and the workaround would never trigger.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Andreas Gruenbacher
     

07 May, 2015

1 commit

  • The NFSv3 READDIRPLUS gets some of the returned attributes from the
    readdir, and some from an inode returned from a new lookup. The two
    objects could be different thanks to intervening renames.

    The attributes in READDIRPLUS are optional, so let's just skip them if
    we notice this case.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

05 May, 2015

9 commits

  • The 'overloads-avoided' counter itself was removed several years ago by
    commit 78c210e (Revert "knfsd: avoid overloading the CPU scheduler with
    enormous load averages").

    Signed-off-by: Scott Mayhew
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • With sessions in v4.1 or later we don't need to manually probe the backchannel
    connection, so we can declare it up instantly after setting up the RPC client.

    Note that we really should split nfsd4_run_cb_work in the long run; this is
    just the least intrusive fix for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • Checking the rpc_client pointer is not a reliable way to detect
    backchannel changes: cl_cb_client is changed only after shutting down
    the rpc client, so the condition cl_cb_client == tk_client will always be
    true.

    Check the RPC_TASK_KILLED flag instead, and rewrite the code to avoid
    the buggy cl_callbacks list and fix the lifetime rules due to double
    calls of the ->prepare callback operations method for this retry case.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • We must only increment the sequence id if the client has seen and responded
    to a request. If we failed to deliver it to the client we must resend with
    the same sequence id. So, just like the client, track errors at the transport
    level differently from those returned in the XDR.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • In an environment where the KDC is running Active Directory, the
    exported composite name field returned in the context could be large
    enough to span a page boundary. Attaching a scratch buffer to the
    decoding xdr_stream helps deal with those cases.

    The case where we saw this was actually due to behavior that's been
    fixed in newer gss-proxy versions, but we're fixing it here too.

    Signed-off-by: Scott Mayhew
    Cc: stable@vger.kernel.org
    Reviewed-by: Simo Sorce
    Signed-off-by: J. Bruce Fields

    Scott Mayhew
     
  • For the sake of forgetful clients, the server should return the layouts
    to the file system on 'last close' of a file (assuming that there are no
    delegations outstanding to that particular client) or on delegreturn
    (assuming that there are no opens on a file from that particular
    client).

    In theory the information is all there in current data structures, but
    it's not efficiently available; nfs4_file->fi_ref includes references on
    the file across all clients, but we need a per-(client, file) count.
    Walking through lots of stateid's to calculate this on each close or
    delegreturn would be painful.

    This patch introduces infrastructure to maintain per-client opens and
    delegation counters on a per-file basis.

    [hch: ported to the mainline pNFS support, merged various fixes from Jeff]
    Signed-off-by: Sachin Bhamare
    Signed-off-by: Jeff Layton
    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Sachin Bhamare
     
  • If we find a non-confirmed openowner we jump to exit the function, but do
    not set an error value. Fix this by factoring out a helper to do the
    check and properly set the error from nfsd4_validate_stateid.

    Cc: stable@vger.kernel.org
    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     
  • Commit df52699e4fcef ("NFSv4.1: Don't cache deviceids that have no
    notifications") causes the Linux NFS client to stop caching deviceids
    unless a server pretends to support deviceid notifications. While this
    behavior is stupid and the language around this area in RFC 5661 is a
    mess clarified by an erratum that I submitted, Trond insists on this
    behavior. Not caching deviceids degrades block layout performance
    massively as a GETDEVICEINFO is fairly expensive.

    So add this hack to make the Linux client happy again.

    Cc: stable@vger.kernel.org
    Signed-off-by: Christoph Hellwig
    Signed-off-by: J. Bruce Fields

    Christoph Hellwig
     

04 May, 2015

8 commits

  • Linus Torvalds
     
  • Pull ext4 fixes from Ted Ts'o:
    "Some miscellaneous bug fixes and some final on-disk and ABI changes
    for ext4 encryption which provide better security and performance"

    * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix growing of tiny filesystems
    ext4: move check under lock scope to close a race.
    ext4: fix data corruption caused by unwritten and delayed extents
    ext4 crypto: remove duplicated encryption mode definitions
    ext4 crypto: do not select from EXT4_FS_ENCRYPTION
    ext4 crypto: add padding to filenames before encrypting
    ext4 crypto: simplify and speed up filename encryption

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "One intel fix, one rockchip fix, and a bunch of radeon fixes for some
    regressions from audio rework and vm stability"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
    drm/i915/chv: Implement WaDisableShadowRegForCpd
    drm/radeon: fix userptr return value checking (v2)
    drm/radeon: check new address before removing old one
    drm/radeon: reset BOs address after clearing it.
    drm/radeon: fix lockup when BOs aren't part of the VM on release
    drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
    drm/radeon: adjust pll when audio is not enabled
    drm/radeon: only enable audio streams if the monitor supports it
    drm/radeon: only mark audio as connected if the monitor supports it (v3)
    drm/radeon/audio: don't enable packets until the end
    drm/radeon: drop dce6_dp_enable
    drm/radeon: fix ordering of AVI packet setup
    drm/radeon: Use drm_calloc_ab for CS relocs
    drm/rockchip: fix error check when getting irq
    MAINTAINERS: add entry for Rockchip drm drivers

    Linus Torvalds
     
  • Just a single intel fix
    * tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
    drm/i915/chv: Implement WaDisableShadowRegForCpd

    Dave Airlie
     
  • one fix and maintainers update
    * 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip:
    drm/rockchip: fix error check when getting irq
    MAINTAINERS: add entry for Rockchip drm drivers

    Dave Airlie
     
  • Pull SCSI fixes from James Bottomley:
    "This is three logical fixes (as 5 patches).

    The 3ware class of drivers were causing an oops with multiqueue by
    tearing down the command mappings after completing the command (where
    the variables in the command used to tear down the mapping were
    no-longer valid). There's also a fix for the qnap iscsi target which
    was choking on us sending it commands that were too long and a fix for
    the reworked aha1542 allocating GFP_KERNEL under a lock"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    3w-9xxx: fix command completion race
    3w-xxxx: fix command completion race
    3w-sas: fix command completion race
    aha1542: Allocate memory before taking a lock
    SCSI: add 1024 max sectors black list flag

    Linus Torvalds
     
  • Pull slave dmaengine fixes from Vinod Koul:
    "Here are the fixes in dmaengine subsystem for rc2:

    - privatecnt fix for slave dma request API by Christopher

    - warn fix for PM ifdef in usb-dmac by Geert

    - fix hardware dependency for xgene by Jean"

    * 'next' of git://git.infradead.org/users/vkoul/slave-dma:
    dmaengine: increment privatecnt when using dma_get_any_slave_channel
    dmaengine: xgene: Set hardware dependency
    dmaengine: usb-dmac: Protect PM-only functions to kill warning

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    - build fix for SMP=n in book3s_xics.c
    - fix for Daniel's pci_controller_ops on powernv.
    - revert the TM syscall abort patch for now.
    - CPU affinity fix from Nathan.
    - two EEH fixes from Gavin.
    - fix for CR corruption from Sam.
    - selftest build fix.

    * tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
    powerpc/powernv: Restore non-volatile CRs after nap
    powerpc/eeh: Delay probing EEH device during hotplug
    powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
    powerpc/pseries: Correct cpu affinity for dlpar added cpus
    selftests/powerpc: Fix the pmu install rule
    Revert "powerpc/tm: Abort syscalls in active transactions"
    powerpc/powernv: Fix early pci_controller_ops loading.
    powerpc/kvm: Fix SMP=n build error in book3s_xics.c

    Linus Torvalds
     

03 May, 2015

3 commits

  • The estimate of necessary transaction credits in ext4_flex_group_add()
    is too pessimistic. It reserves credit for sb, resize inode, and resize
    inode dindirect block for each group added in a flex group although they
    are always the same block and thus it is enough to account them only
    once. Also the number of modified GDT blocks is overestimated since we
    fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.

    Make the estimation more precise. That reduces the number of requested
    credits enough that we can grow a 20 MB filesystem (which has a 1 MB
    journal, 79 reserved GDT blocks, and flex group size 16 by default).
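
The arithmetic behind the tighter estimate can be sketched as below. This only restates the prose: the helper names are hypothetical, the constant 3 stands for the three shared blocks named above (sb, resize inode, resize inode dindirect block), and the formulas are not copied from ext4_flex_group_add().

```c
/*
 * Hypothetical helpers that restate the commit message; they are not
 * taken from the ext4 source.
 */

/* GDT blocks touched when desc_per_block descriptors fit in one block. */
static unsigned int gdt_blocks_touched(unsigned int groups,
				       unsigned int desc_per_block)
{
	return (groups + desc_per_block - 1) / desc_per_block;	/* round up */
}

/* Pessimistic view: the three shared blocks are charged for every group. */
static unsigned int shared_credits_per_group(unsigned int groups)
{
	return 3 * groups;
}

/* Tighter view: the three shared blocks are the same ones, so charge once. */
static unsigned int shared_credits_once(void)
{
	return 3;
}
```

For a flex group of 16 groups and an assumed 32 descriptors per block, the shared blocks drop from 48 credits to 3 and the group descriptors fit in a single GDT block.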

    Signed-off-by: Jan Kara
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Eric Sandeen

    Jan Kara
     
  • fallocate() checks that the file is extent-based and returns
    EOPNOTSUPP if it is not. Other tasks can convert from and to
    indirect and extent so it's safe to check only after grabbing
    the inode mutex.

    Signed-off-by: Davide Italiano
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Davide Italiano
     
  • Currently it is possible to lose a whole file system block's worth of
    data when we hit a specific interaction between unwritten and delayed
    extents in the extent status tree.

    The problem is that when we insert a delayed extent into the extent
    status tree, the only way to get rid of it is to write out the delayed
    buffer. However, there is a limitation in the extent status tree
    implementation: when inserting an unwritten extent, should there be even
    a single delayed block, the whole unwritten extent is marked as delayed.

    At this point, there is no way to get rid of the delayed extents,
    because there are no delayed buffers to write out. So when we write
    into said unwritten extent we will convert it to written, but it still
    remains delayed.

    When we try to write into that block later, ext4_da_map_blocks() will set
    the buffer new and delayed and map it to an invalid block, which causes
    the rest of the block to be zeroed, losing already written data.

    For now we can fix this by simply not allowing the delayed status to be
    set on a written extent in the extent status tree. Also add WARN_ON() to
    make sure that we notice if this happens in the future.

    This problem can be easily reproduced by running the following xfs_io
    commands:

    xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
    -c "falloc 0 131072" \
    -c "pwrite -S 0xbb 65536 2048" \
    -c "fsync" /mnt/test/fff

    echo 3 > /proc/sys/vm/drop_caches
    xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

    This can theoretically also be reproduced at random by running fsx,
    but that is not very reliable. On machines with a bigger page size
    (like ppc) it can be seen more often (especially with xfstest generic/127).

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Cc: stable@vger.kernel.org

    Lukas Czerner
     

02 May, 2015

1 commit