29 Jun, 2013

2 commits

  • The GETDEVICEINFO gdia_maxcount represents all of the data being returned
    within the GETDEVICEINFO4resok structure and includes the XDR overhead.

    The CREATE_SESSION ca_maxresponsesize is the maximum reply and includes the RPC
    headers (including security flavor credentials and verifiers).

    Split out the struct pnfs_device field maxcount which is the gdia_maxcount
    from the pglen field which is the reply (the total) buffer length.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
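
A minimal sketch of the split described above, assuming the reply overhead is supplied by the caller. The struct, the function name, and the `reply_overhead` parameter are hypothetical and are not taken from the kernel source; only the `pglen` and `maxcount` field names come from the commit message.

```c
#include <stdint.h>

/* Simplified stand-in for the two fields named in the commit message. */
struct device_buf {
    uint32_t pglen;    /* total reply buffer length */
    uint32_t maxcount; /* value to send as gdia_maxcount */
};

/*
 * max_resp_sz plays the role of ca_maxresponsesize (whole reply, RPC
 * headers included). reply_overhead is an assumed byte count for the
 * part of the reply that is not GETDEVICEINFO result data.
 */
static struct device_buf size_device_buf(uint32_t max_resp_sz,
                                         uint32_t reply_overhead)
{
    struct device_buf b;

    b.pglen = max_resp_sz;
    /* Never let the subtraction wrap around. */
    b.maxcount = max_resp_sz > reply_overhead
                     ? max_resp_sz - reply_overhead
                     : 0;
    return b;
}
```

The point of keeping two fields is that the buffer is sized for the whole reply while the server is told only how much result data it may return.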
     
  • Fallback should happen only when the request_key() call fails, because
    this indicates that there was a problem running the nfsidmap program.
    We shouldn't call the legacy code if the error was elsewhere.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
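
The control flow described above can be sketched as follows. All names here are hypothetical stand-ins (the three steps are passed in as function pointers so the example stays self-contained); this is not the kernel's idmapper code.

```c
/* Hypothetical result record; not a kernel type. */
struct lookup_result {
    int status;      /* 0 on success, negative errno-style value on failure */
    int used_legacy; /* 1 if the legacy path was taken */
};

typedef int (*step_fn)(void);

/* Sample steps for illustration: one that succeeds, one that fails. */
static int step_ok(void)   { return 0; }
static int step_fail(void) { return -1; }

static struct lookup_result idmap_lookup(step_fn request_key_step,
                                         step_fn later_step,
                                         step_fn legacy_step)
{
    struct lookup_result res = { 0, 0 };

    if (request_key_step() < 0) {
        /* The upcall itself failed: this is the only case that falls back. */
        res.used_legacy = 1;
        res.status = legacy_step();
        return res;
    }

    /* The key was obtained; an error from here on is reported as-is. */
    res.status = later_step();
    return res;
}
```

With these sample steps, `idmap_lookup(step_ok, step_fail, step_ok)` reports the later error without touching the legacy path.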
     

20 Jun, 2013

1 commit

  • We need to ensure that we clear NFS4_SLOT_TBL_DRAINING on the back
    channel when we're done recovering the session.

    Regression introduced by commit 774d5f14e (NFSv4.1 Fix a pNFS session
    draining deadlock)

    Signed-off-by: Andy Adamson
    [Trond: Changed order to start back-channel first. Minor code cleanup]
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>=3.10]

    Andy Adamson
     

19 Jun, 2013

5 commits

  • Give them names that are a bit more consistent with the general
    pNFS naming scheme.

    - lo_seg_contained -> pnfs_lseg_range_contained
    - lo_seg_intersecting -> pnfs_lseg_range_intersecting
    - cmp_layout -> pnfs_lseg_range_cmp
    - is_matching_lseg -> pnfs_lseg_range_match

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Also strip off the unnecessary 'inline' declarations.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The other protocols don't use it, so make it local to NFSv4, and
    remove the EXPORT.
    Also ensure that we only compile in cache_lib.o if we're using
    the legacy DNS resolver.

    Signed-off-by: Trond Myklebust
    Cc: Bryan Schumaker

    Trond Myklebust
     
  • We had a report of a reproducible WARNING:

    [ 1360.039358] ------------[ cut here ]------------
    [ 1360.043978] WARNING: at fs/dcache.c:1355 d_set_d_op+0x8d/0xc0()
    [ 1360.049880] Hardware name: HP Z200 Workstation
    [ 1360.054308] Modules linked in: nfsv4 nfs dns_resolver fscache nfsd
    auth_rpcgss nfs_acl lockd sunrpc sg acpi_cpufreq mperf coretemp kvm_intel kvm
    snd_hda_codec_realtek snd_hda_intel snd_hda_codec hp_wmi crc32c_intel
    snd_hwdep e1000e snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd
    sparse_keymap rfkill soundcore serio_raw ptp iTCO_wdt pps_core pcspkr
    iTCO_vendor_support mei microcode lpc_ich mfd_core wmi xfs libcrc32c sr_mod
    sd_mod cdrom crc_t10dif radeon i2c_algo_bit drm_kms_helper ttm ahci libahci
    drm i2c_core libata dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
    auth_rpcgss]
    [ 1360.107406] Pid: 8814, comm: mount.nfs4 Tainted: G I -------------- 3.9.0-0.55.el7.x86_64 #1
    [ 1360.116771] Call Trace:
    [ 1360.119219] warn_slowpath_common+0x70/0xa0
    [ 1360.125208] warn_slowpath_null+0x1a/0x20
    [ 1360.131025] d_set_d_op+0x8d/0xc0
    [ 1360.136159] __rpc_lookup_create_exclusive+0x4f/0x80 [sunrpc]
    [ 1360.143710] rpc_mkpipe_dentry+0x86/0x170 [sunrpc]
    [ 1360.150311] nfs_idmap_new+0x96/0x130 [nfsv4]
    [ 1360.156475] nfs4_init_client+0xad/0x2d0 [nfsv4]
    [ 1360.162902] ? idr_get_empty_slot+0x16f/0x3c0
    [ 1360.169062] ? idr_mark_full+0x52/0x60
    [ 1360.174615] ? idr_alloc+0x79/0xe0
    [ 1360.179826] ? __rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc]
    [ 1360.187635] ? rpc_init_wait_queue+0x13/0x20 [sunrpc]
    [ 1360.194493] nfs_get_client+0x27a/0x350 [nfs]
    [ 1360.200666] nfs4_set_client.isra.8+0x78/0x100 [nfsv4]
    [ 1360.207624] nfs4_create_server+0xf3/0x3a0 [nfsv4]
    [ 1360.214222] nfs4_remote_mount+0x2e/0x60 [nfsv4]
    [ 1360.220644] mount_fs+0x39/0x1b0
    [ 1360.225691] ? __alloc_percpu+0x10/0x20
    [ 1360.231348] vfs_kern_mount+0x5f/0xf0
    [ 1360.236822] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [ 1360.243246] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [ 1360.249410] ? get_nfs_version+0x27/0x80 [nfs]
    [ 1360.255659] nfs_fs_mount+0x5c5/0xd10 [nfs]
    [ 1360.261650] ? nfs_clone_super+0x140/0x140 [nfs]
    [ 1360.268074] ? param_set_portnr+0x60/0x60 [nfs]
    [ 1360.274406] mount_fs+0x39/0x1b0
    [ 1360.279443] ? __alloc_percpu+0x10/0x20
    [ 1360.285088] vfs_kern_mount+0x5f/0xf0
    [ 1360.290556] do_mount+0x1fd/0xa00
    [ 1360.295677] ? __get_free_pages+0xe/0x50
    [ 1360.301405] ? copy_mount_options+0x36/0x170
    [ 1360.307479] sys_mount+0x83/0xc0
    [ 1360.312515] system_call_fastpath+0x16/0x1b
    [ 1360.318503] ---[ end trace 8fa1f4cbc36094a7 ]---

    The problem is that we're ending up in __rpc_lookup_create_exclusive
    with a negative dentry that already has d_op set. A little debugging
    has shown that when we hit this, the d_ops are already set to
    simple_dentry_operations.

    I believe that what's happening is that during a mount, idmapd is racing
    in and doing a lookup of /var/lib/nfs/rpc_pipefs/nfs/clnt???/idmap.
    Before that dentry reference is released, the kernel races in to create
    that file and finds the new negative dentry, which already has the
    d_op set.

    This patch just avoids setting the d_op if it's already set.
    simple_dentry_operations and rpc_dentry_operations are functionally
    equivalent so it shouldn't matter which one it's set to.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
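
The fix described above amounts to a guarded assignment. A minimal sketch, using simplified stand-in types rather than the kernel's real `struct dentry` and `struct dentry_operations`; the helper name is hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures carry far more state. */
struct dentry_operations { int placeholder; };
struct dentry { const struct dentry_operations *d_op; };

/* Two distinct operation tables, standing in for the two named above. */
static const struct dentry_operations simple_ops = { 0 };
static const struct dentry_operations rpc_ops = { 0 };

/* Install ops only when the dentry does not already have a table. */
static void set_ops_if_unset(struct dentry *d,
                             const struct dentry_operations *ops)
{
    if (d->d_op == NULL)
        d->d_op = ops;
}
```

A dentry that already points at `simple_ops` is left untouched when `set_ops_if_unset(&d, &rpc_ops)` is called, which is what avoids the warning path in this sketch.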
     
  • Make sure that NFSv4 SETCLIENTID does not parse the NETID as a
    format string.

    Signed-off-by: Djalal Harouni
    Signed-off-by: Trond Myklebust

    Djalal Harouni
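
This class of bug is easiest to see in user-space C. The sketch below uses the standard `snprintf()` and a hypothetical helper name; it illustrates the pattern only and is not the kernel's SETCLIENTID code.

```c
#include <stdio.h>

/*
 * Copy an externally supplied netid string into buf.
 *
 * Unsafe variant (do not use): snprintf(buf, len, netid);
 * There the netid itself is interpreted as a format string, so any
 * '%' sequence in it would be parsed as a conversion.
 */
static int format_netid(char *buf, size_t len, const char *netid)
{
    /* Safe: the format is a constant, the netid is only data. */
    return snprintf(buf, len, "%s", netid);
}
```

With the constant `"%s"` format, a netid such as `tcp%n%s` is copied through byte for byte instead of being interpreted.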
     

07 Jun, 2013

17 commits


31 May, 2013

1 commit

  • Darrick J. Wong reports:
    > I have a kvm-based testing setup that netboots VMs over NFS, the
    > client end of which seems to have broken somehow in 3.10-rc1. The
    > server's exports file looks like this:
    >
    > /storage/mtr/x64 192.168.122.0/24(ro,sync,no_root_squash,no_subtree_check)
    >
    > On the client end (inside the VM), the initrd runs the following
    > command to try to mount the rootfs over NFS:
    >
    > # mount -o nolock -o ro -o retrans=10 192.168.122.1:/storage/mtr/x64/ /root
    >
    > (Note: This is the busybox mount command.)
    >
    > The mount fails with -EINVAL.

    Commit 4580a92d44 "NFS: Use server-recommended security flavor by
    default (NFSv3)" introduced a behavior regression for NFS mounts
    done via a legacy binary mount(2) call.

    Ensure that a default security flavor is specified for legacy binary
    mount requests, since they do not invoke nfs_select_flavor() in the
    kernel.

    Busybox uses klibc's nfsmount command, which performs NFS mounts
    using the legacy binary mount data format. /sbin/mount.nfs is not
    affected by this regression.

    Reported-by: Darrick J. Wong
    Signed-off-by: Chuck Lever
    Tested-by: Darrick J. Wong
    Acked-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Chuck Lever
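
A minimal sketch of defaulting the flavor when the mount data did not specify one. The struct and function are hypothetical, and choosing AUTH_SYS as the default is an assumption of this example; the commit message only says that a default is supplied.

```c
/* AUTH_SYS (also known as AUTH_UNIX) is RPC authentication flavor 1. */
#define AUTH_SYS_FLAVOR 1u

/* Hypothetical, simplified view of parsed mount arguments. */
struct mount_args {
    unsigned int flavors[4];
    unsigned int flavor_count; /* 0 means the caller specified none */
};

/* Fill in a default only when no flavor was given. */
static void apply_default_flavor(struct mount_args *args)
{
    if (args->flavor_count == 0) {
        args->flavors[0] = AUTH_SYS_FLAVOR;
        args->flavor_count = 1;
    }
}
```

An explicitly requested flavor is never overwritten; only the empty case changes.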
     

30 May, 2013

1 commit


24 May, 2013

1 commit


23 May, 2013

1 commit

  • The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to
    be careful about ordering the calls to rpc_test_and_set_running(task) and
    rpc_clear_queued(task). If we get the order wrong, then we may end up
    testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped
    and changed the state of the rpc_task.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     

21 May, 2013

1 commit

  • On a CB_RECALL the callback service thread flushes the inode using
    filemap_flush prior to scheduling the state manager thread to return the
    delegation. When pNFS is used and I/O has not yet gone to the data server
    servicing the inode, a LAYOUTGET can precede the I/O. Unlike the async
    filemap_flush call, the LAYOUTGET must proceed to completion.

    If the state manager starts to recover data while the inode flush is sending
    the LAYOUTGET, a deadlock occurs: the callback service thread holds the
    single callback session slot until the flushing is done, which blocks the
    state manager thread, and the state manager thread has set the session
    draining bit, which puts the inode flush LAYOUTGET RPC to sleep on the
    forechannel slot table waitq.

    Separate the draining of the back channel from the draining of the fore channel
    by moving the NFS4_SESSION_DRAINING bit from session scope into the fore
    and back slot tables. Drain the back channel first allowing the LAYOUTGET
    call to proceed (and fail) so the callback service thread frees the callback
    slot. Then proceed with draining the forechannel.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
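
The idea of per-table draining state can be sketched as below. The types and helpers are simplified, hypothetical stand-ins, with a plain `bool` in place of the kernel's bit flag and no locking or wait queues.

```c
#include <stdbool.h>

/* Each slot table carries its own draining flag. */
struct slot_table {
    bool draining;
    int in_use; /* number of slots currently held */
};

struct session {
    struct slot_table fc; /* fore channel */
    struct slot_table bc; /* back channel */
};

/* Mark one table as draining; returns true once it has no slots in use. */
static bool slot_table_begin_drain(struct slot_table *t)
{
    t->draining = true;
    return t->in_use == 0;
}

/* New requests may take a slot only while the table is not draining. */
static bool slot_table_may_allocate(const struct slot_table *t)
{
    return !t->draining;
}

/* Recovery finished: clear the flag on both tables. */
static void session_end_drain(struct session *s)
{
    s->bc.draining = false;
    s->fc.draining = false;
}
```

Because the flag lives in each table, starting the drain on `bc` leaves `fc` free to hand out slots, so a fore-channel request issued on behalf of the callback thread is not put to sleep at that stage.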
     

16 May, 2013

3 commits

  • This seems to have been overlooked when we did the namespace
    conversion. If a container is running a legacy version of rpc.gssd
    then it will be disrupted if the global 'pipe_version' is set by a
    container running the new version of rpc.gssd.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Recent changes to the NFS security flavour negotiation mean that
    we have a stronger dependency on rpc.gssd. If the latter is not
    running, because the user failed to start it, then we time out
    and mark the container as not having an instance. We then
    use that information to time out faster the next time.

    If, on the other hand, the rpc.gssd successfully binds to an rpc_pipe,
    then we mark the container as having an rpc.gssd instance.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • If wait_event_interruptible_timeout() is successful, it returns
    the number of jiffies remaining until the timeout. In that
    case, we should be retrying the upcall.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
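
The three outcomes of `wait_event_interruptible_timeout()` can be told apart by the sign of its return value: positive when the condition became true (the value is the remaining time), zero on timeout, negative when interrupted. A minimal sketch of that classification; the enum and function names are hypothetical and the kernel macro itself is not called here.

```c
/* Hypothetical labels for what the caller should do next. */
enum upcall_action {
    UPCALL_RETRY,       /* condition became true: retry the upcall */
    UPCALL_TIMED_OUT,   /* timeout elapsed with the condition still false */
    UPCALL_INTERRUPTED  /* a signal interrupted the wait */
};

/* ret is the value returned by the wait. */
static enum upcall_action classify_wait_result(long ret)
{
    if (ret > 0)
        return UPCALL_RETRY;
    if (ret == 0)
        return UPCALL_TIMED_OUT;
    return UPCALL_INTERRUPTED;
}
```

Treating only zero as the timeout case is what keeps a successful wait from being mistaken for a failure.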
     

12 May, 2013

6 commits

  • Linus Torvalds
     
  • Pull tracing/kprobes update from Steven Rostedt:
    "The majority of these changes are from Masami Hiramatsu bringing
    kprobes up to par with the latest changes to ftrace (multi buffering
    and the new function probes).

    He also discovered and fixed some bugs in doing so. When pulling in
    his patches, I also found a few minor bugs as well and fixed them.

    This also includes a compile fix for some archs that select the ring
    buffer but not tracing.

    I based this off of the last patch you took from me that fixed the
    merge conflict error, as that was the commit that had all the changes
    I needed for this set of changes."

    * tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing/kprobes: Support soft-mode disabling
    tracing/kprobes: Support ftrace_event_file base multibuffer
    tracing/kprobes: Pass trace_probe directly from dispatcher
    tracing/kprobes: Increment probe hit-count even if it is used by perf
    tracing/kprobes: Use bool for retprobe checker
    ftrace: Fix function probe when more than one probe is added
    ftrace: Fix the output of enabled_functions debug file
    ftrace: Fix locking in register_ftrace_function_probe()
    tracing: Add helper function trace_create_new_event() to remove duplicate code
    tracing: Modify soft-mode only if there's no other referrer
    tracing: Indicate enabled soft-mode in enable file
    tracing/kprobes: Fix to increment return event probe hit-count
    ftrace: Cleanup regex_lock and ftrace_lock around hash updating
    ftrace, kprobes: Fix a deadlock on ftrace_regex_lock
    ftrace: Have ftrace_regex_write() return either read or error
    tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
    tracing: Don't succeed if event_enable_func did not register anything
    ring-buffer: Select IRQ_WORK

    Linus Torvalds
     
  • …nux/kernel/git/konrad/xen

    Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
    - More fixes in the vCPU PVHVM hotplug path.
    - Add more documentation.
    - Fix various ARM related issues in the Xen generic drivers.
    - Updates in the xen-pciback driver per Bjorn's updates.
    - Mask the x2APIC feature for PV guests.

    * tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen/pci: Used cached MSI-X capability offset
    xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
    xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST
    xen: mask x2APIC feature in PV
    xen: SWIOTLB is only used on x86
    xen/spinlock: Fix check from greater than to be also be greater or equal to.
    xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info
    xen/vcpu: Document the xen_vcpu_info and xen_vcpu
    xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.

    Linus Torvalds
     
  • Pull second SCSI update from James "Jaj B" Bottomley:
    "This is the final round of SCSI patches for the merge window. It
    consists mostly of driver updates (bnx2fc, ibmfc, fnic, lpfc,
    be2iscsi, pm80xx, qla4x and ipr).

    There's also the power management updates that complete the patches in
    Jens' tree, an iscsi refcounting problem fix from the last pull, some
    dif handling in scsi_debug fixes, a few nice code cleanups and an
    error handling busy bug fix."

    * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (92 commits)
    [SCSI] qla2xxx: Update firmware link in Kconfig file.
    [SCSI] iscsi class, qla4xxx: fix sess/conn refcounting when find fns are used
    [SCSI] sas: unify the pointlessly separated enums sas_dev_type and sas_device_type
    [SCSI] pm80xx: thermal, sas controller config and error handling update
    [SCSI] pm80xx: NCQ error handling changes
    [SCSI] pm80xx: WWN Modification for PM8081/88/89 controllers
    [SCSI] pm80xx: Changed module name and debug messages update
    [SCSI] pm80xx: Firmware flash memory free fix, with addition of new memory region for it
    [SCSI] pm80xx: SPC new firmware changes for device id 0x8081 alone
    [SCSI] pm80xx: Added SPCv/ve specific hardware functionalities and relevant changes in common files
    [SCSI] pm80xx: MSI-X implementation for using 64 interrupts
    [SCSI] pm80xx: Updated common functions common for SPC and SPCv/ve
    [SCSI] pm80xx: Multiple inbound/outbound queue configuration
    [SCSI] pm80xx: Added SPCv/ve specific ids, variables and modify for SPC
    [SCSI] lpfc: fix up Kconfig dependencies
    [SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmnd
    [SCSI] sd: change to auto suspend mode
    [SCSI] sd: use REQ_PM in sd's runtime suspend operation
    [SCSI] qla4xxx: Fix iocb_cnt calculation in qla4xxx_send_mbox_iocb()
    [SCSI] ufs: Correct the expected data transfersize
    ...

    Linus Torvalds
     
  • Pull idle update from Len Brown:
    "Add support for new Haswell-ULT CPU idle power states"

    * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    intel_idle: initial C8, C9, C10 support
    tools/power turbostat: display C8, C9, C10 residency

    Linus Torvalds
     
  • Pull audit changes from Eric Paris:
    "Al used to send pull requests every couple of years but he told me to
    just start pushing them to you directly.

    Our touching outside of core audit code is pretty straightforward. A
    couple of interface changes which hit net/. A simple argument bug
    calling audit functions in namei.c and the removal of some assembly
    branch prediction code on ppc"

    * git://git.infradead.org/users/eparis/audit: (31 commits)
    audit: fix message spacing printing auid
    Revert "audit: move kaudit thread start from auditd registration to kaudit init"
    audit: vfs: fix audit_inode call in O_CREAT case of do_last
    audit: Make testing for a valid loginuid explicit.
    audit: fix event coverage of AUDIT_ANOM_LINK
    audit: use spin_lock in audit_receive_msg to process tty logging
    audit: do not needlessly take a lock in tty_audit_exit
    audit: do not needlessly take a spinlock in copy_signal
    audit: add an option to control logging of passwords with pam_tty_audit
    audit: use spin_lock_irqsave/restore in audit tty code
    helper for some session id stuff
    audit: use a consistent audit helper to log lsm information
    audit: push loginuid and sessionid processing down
    audit: stop pushing loginid, uid, sessionid as arguments
    audit: remove the old depricated kernel interface
    audit: make validity checking generic
    audit: allow checking the type of audit message in the user filter
    audit: fix build break when AUDIT_DEBUG == 2
    audit: remove duplicate export of audit_enabled
    Audit: do not print error when LSMs disabled
    ...

    Linus Torvalds
     

11 May, 2013

1 commit

  • Pull nfsd fixes from Bruce Fields:
    "Small fixes for two bugs and two warnings"

    * 'for-3.10' of git://linux-nfs.org/~bfields/linux:
    nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
    SUNRPC: fix decoding of optional gss-proxy xdr fields
    SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
    nfsd4: don't allow owner override on 4.1 CLAIM_FH opens

    Linus Torvalds