31 Jul, 2012

13 commits

  • NFS v2 and v4 don't use the ACL client, so I create two new nfs_rpc_ops
    functions to initialize it only when we are using v3.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • I'm already looking up the nfs subversion in nfs_fs_mount(), so I have
    easy access to rpc_ops that used to be difficult to reach. This allows
    me to set up a different mount path for NFS v2/3 and NFS v4.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • I can now share this code with the v2 and v3 code by using the NFS
    subversion structure.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • This patch adds in the code to track multiple versions of the NFS
    protocol. I created default structures for v2, v3 and v4 so that each
    version can continue to work while I convert them into kernel modules.
    I also removed the const qualifier from the rpc_version array so that I
    can change it at runtime.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
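
The idea of tracking several protocol versions in a table that can change at runtime can be sketched in user-space C as follows. This is only an illustration: `nfs_subversion_sketch`, `register_version()` and `find_version()` are hypothetical names, not the kernel's actual structures or functions.

```c
#include <stddef.h>

#define SKETCH_MAX_VERSION 4

/* Hypothetical per-version descriptor; the real structure carries more. */
struct nfs_subversion_sketch {
    unsigned int version;
    const char *name;
};

/* Deliberately not const: entries are added and removed at runtime. */
static struct nfs_subversion_sketch *versions[SKETCH_MAX_VERSION + 1];

/* Register a version; fails if the slot is invalid or already taken. */
static int register_version(struct nfs_subversion_sketch *v)
{
    if (v == NULL || v->version > SKETCH_MAX_VERSION ||
        versions[v->version] != NULL)
        return -1;
    versions[v->version] = v;
    return 0;
}

/* Remove a previously registered version. */
static void unregister_version(const struct nfs_subversion_sketch *v)
{
    if (v != NULL && v->version <= SKETCH_MAX_VERSION &&
        versions[v->version] == v)
        versions[v->version] = NULL;
}

/* Look up a version; returns NULL if none is registered. */
static struct nfs_subversion_sketch *find_version(unsigned int version)
{
    if (version > SKETCH_MAX_VERSION)
        return NULL;
    return versions[version];
}
```

A mount path could then call `find_version()` once and reach the version-specific operations through the returned descriptor.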
     
  • Fix a number of bugs in the NFS idmapper code:

    (1) Only registered key types can be passed to the core keys code, so
    register the legacy idmapper key type.

    This is a requirement because the unregister function cleans up keys
    belonging to that key type so that there aren't dangling pointers to the
    module left behind - including the key->type pointer.

    (2) Rename the legacy key type. You can't have two key types with the same
    name, and (1) would otherwise require that.

    (3) complete_request_key() must be called in the error path of
    nfs_idmap_legacy_upcall().

    (4) There is one idmap struct for each nfs_client struct. This means that
    idmap->idmap_key_cons is shared without the use of a lock. This is a
    problem because key_instantiate_and_link() - as called indirectly by
    idmap_pipe_downcall() - releases anyone waiting for the key to be
    instantiated.

    What happens is that idmap_pipe_downcall(), running in the rpc.idmapd
    thread, releases the NFS filesystem code, in whatever thread it is
    running in, to continue. That thread may then make another idmapper call,
    overwriting idmap_key_cons before idmap_pipe_downcall() gets the chance
    to call complete_request_key().

    I *think* that reading idmap_key_cons only once, before
    key_instantiate_and_link() is called, and then caching the result in a
    variable is sufficient.

    Bug (4) is the cause of:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [< (null)>] (null)
    PGD 0
    Oops: 0010 [#1] SMP
    CPU 1
    Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
    Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
    RIP: 0010:[] [< (null)>] (null)
    RSP: 0018:ffff8801a3645d40 EFLAGS: 00010246
    RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
    RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
    RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
    R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
    R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
    FS: 00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
    Stack:
    ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
    ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
    ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
    Call Trace:
    [] ? __key_instantiate_and_link+0x58/0x100
    [] key_instantiate_and_link+0x63/0xa0
    [] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
    [] rpc_pipe_write+0x67/0x90 [sunrpc]
    [] vfs_write+0xb3/0x180
    [] sys_write+0x4a/0x90
    [] system_call_fastpath+0x16/0x1b
    Code: Bad RIP value.
    RIP [< (null)>] (null)
    RSP
    CR2: 0000000000000000

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>= 3.4]

    David Howells
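
The fix proposed for bug (4), reading the shared pointer once and caching it, can be sketched in user-space C as follows. The types and helper names here (`idmap_sketch`, `key_cons`, `instantiate_and_release_waiters()`, `next_upcall`) are hypothetical stand-ins, not the kernel code; the overwrite by another thread is simulated in a single thread.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct key_cons { int completed; };
struct idmap_sketch { struct key_cons *idmap_key_cons; };

/* Simulation hook: what a concurrent upcall would store in the slot. */
static struct key_cons *next_upcall;

/* Stand-in for key_instantiate_and_link(): waiters are released here,
 * so the shared slot may be overwritten before this returns. */
static void instantiate_and_release_waiters(struct idmap_sketch *idmap)
{
    idmap->idmap_key_cons = next_upcall;
}

static void downcall(struct idmap_sketch *idmap)
{
    /* Read the shared pointer exactly once, before waiters can run. */
    struct key_cons *cons = idmap->idmap_key_cons;

    instantiate_and_release_waiters(idmap);

    /* Complete the request we started with, not whatever is in the
     * slot now. */
    if (cons != NULL)
        cons->completed = 1;
}
```

Re-reading `idmap->idmap_key_cons` after the instantiate step would instead complete the second request, or dereference NULL if the slot had been cleared.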
     
  • We've had some reports of a deadlock where rpciod ends up with a stack
    trace like this:

    PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
    #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
    #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
    #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
    #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
    #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
    #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
    #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
    #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
    #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
    #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

    rpciod is trying to allocate memory for a new socket to talk to the
    server. The VM ends up calling ->releasepage to get more memory, and it
    tries to do a blocking commit. That commit can't succeed however without
    a connected socket, so we deadlock.

    Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
    socket allocation, and having nfs_release_page check for that flag when
    deciding whether to do a commit call. Also, set PF_FSTRANS
    unconditionally in rpc_async_schedule since that function can also do
    allocations sometimes.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Jeff Layton
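
A rough user-space sketch of the flag-based approach described above: the worker marks itself before allocating, and the release-page path declines to commit while the mark is set. `task_flags`, `SKETCH_PF_FSTRANS` and both functions are illustrative stand-ins; the flag value is not the kernel's.

```c
#include <stdbool.h>

#define SKETCH_PF_FSTRANS 0x1u  /* illustrative value only */

/* Hypothetical stand-in for the current task's flags word. */
static unsigned int task_flags;

/* Release-page side: a blocking commit is allowed only when the
 * task is not marked. */
static bool may_commit_on_release(void)
{
    return (task_flags & SKETCH_PF_FSTRANS) == 0;
}

/* Socket-setup side: mark the task, allocate, then restore the flag.
 * Returns what the release-page path would decide during allocation. */
static bool setup_socket(void)
{
    unsigned int saved = task_flags & SKETCH_PF_FSTRANS;
    bool commit_allowed;

    task_flags |= SKETCH_PF_FSTRANS;

    /* An allocation here may trigger reclaim, which asks: */
    commit_allowed = may_commit_on_release();

    /* Put the flag back the way it was on entry. */
    task_flags = (task_flags & ~SKETCH_PF_FSTRANS) | saved;
    return commit_allowed;
}
```

Saving and restoring the previous flag state keeps the sketch safe if the caller was already marked.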
     
  • rpc_make_runnable is not generally called with the queue lock held, unless
    it's waking up a task that has been sitting on a waitqueue. This is safe
    when the task has not entered the FSM yet, but the comments don't really
    spell this out.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • The current block layout driver read/write code assumes page-aligned
    I/O in many places. Add a checker to validate the assumption. Otherwise
    data can be corrupted, for example when an application does
    open(O_WRONLY) and then issues a page-unaligned write.

    Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
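
An alignment check of the kind described above might look like the following sketch. The function name and the 4096-byte page size are assumptions for illustration; the actual checker in the block layout driver may test different conditions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* True only when both the starting offset and the length fall on
 * page boundaries. */
static bool is_page_aligned_io(uint64_t offset, size_t len)
{
    return (offset % SKETCH_PAGE_SIZE) == 0 &&
           (len % SKETCH_PAGE_SIZE) == 0;
}
```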
     
  • fl_type is not a bitmap.

    Reported-by: Al Viro
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
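
Because the lock type constants are distinct values rather than independent bits, they should be compared for equality. On common Linux ABIs F_RDLCK is 0, so a mask test against it can never be true. A small sketch using the POSIX constants from `<fcntl.h>`; `lock_type_name()` is a hypothetical helper:

```c
#include <fcntl.h>

/* Classify a lock type by equality, never by masking. */
static const char *lock_type_name(short fl_type)
{
    switch (fl_type) {
    case F_RDLCK:
        return "read";
    case F_WRLCK:
        return "write";
    case F_UNLCK:
        return "unlock";
    default:
        return "unknown";
    }
}
```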
     
  • Commit 57208fa7e51 "NFS: Create an write_pageio_init() function"
    did not modify the calls in direct.c, preventing direct io from
    using pnfs. This reintroduces that capability.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Commit 1abb50886af "NFS: Create an read_pageio_init() function"
    did not modify the call in direct.c, preventing direct io from
    using pnfs. This reintroduces that capability.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Add a missing set of braces that commit 4e0038b6b24
    ("SUNRPC: Move clnt->cl_server into struct rpc_xprt")
    forgot.

    Signed-off-by: Joe Perches
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>= 3.4]

    Joe Perches
     
  • Fix numerous repeated warnings by making the stub function
    void instead of non-void:

    fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl':
    fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void

    Signed-off-by: Randy Dunlap
    Cc: Trond Myklebust
    Signed-off-by: Trond Myklebust

    Randy Dunlap
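
The stub pattern involved can be sketched as follows. `SKETCH_CONFIG_SYSCTL` and the function names are hypothetical; the point is only that a stub with nothing to report should be declared void, so that an empty body does not trigger the warning.

```c
#ifdef SKETCH_CONFIG_SYSCTL
int sketch_register_sysctl(void);
void sketch_unregister_sysctl(void);
#else
/* Feature compiled out: registration trivially succeeds. */
static inline int sketch_register_sysctl(void)
{
    return 0;
}

/* Nothing to undo, and nothing to return, so the stub is void. */
static inline void sketch_unregister_sysctl(void)
{
}
#endif
```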
     

18 Jul, 2012

14 commits


17 Jul, 2012

13 commits

  • Add documenting comments and appropriate debugging messages.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • For NFSv4 minor version 0, currently the cl_id_uniquifier allows the
    Linux client to generate a unique nfs_client_id4 string whenever a
    server replies with NFS4ERR_CLID_INUSE.

    This implementation seems to be based on a flawed reading of RFC
    3530. NFS4ERR_CLID_INUSE actually means that the client has presented
    this nfs_client_id4 string with a different principal at some time in
    the past, and that lease is still in use on the server.

    For a Linux client this might be rather difficult to achieve: the
    authentication flavor is named right in the nfs_client_id4.id
    string. If we change flavors, we change strings automatically.

    So, practically speaking, NFS4ERR_CLID_INUSE means there is some other
    client using our string. There is not much that can be done to
    recover automatically. Let's make it a permanent error.

    Remove the recovery logic in nfs4_proc_setclientid(), and remove the
    cl_id_uniquifier field from the nfs_client data structure. And,
    remove the authentication flavor from the nfs_client_id4 string.

    Keeping the authentication flavor in the nfs_client_id4.id string
    means that we could have a separate lease for each authentication
    flavor used by mounts on the client. But we want just one lease for
    all the mounts on this client.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • NFSv4 state recovery is not always successful. Failure is signalled
    by setting the nfs_client.cl_cons_state to a negative (errno) value,
    then waking waiters.

    Currently this can happen only during mount processing. I'm about to
    add an explicit case where state recovery failure during normal
    operation should force all NFS requests waiting on that state recovery
    to exit.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The gss_mech_list_pseudoflavors() function provides a list of
    currently registered GSS pseudoflavors. This list does not include
    any non-GSS flavors that have been registered with the RPC client.
    nfs4_find_root_sec() currently adds these extra flavors by hand.

    Instead, nfs4_find_root_sec() should be looking at the set of flavors
    that have been explicitly registered via rpcauth_register(). And,
    other areas of code will soon need the same kind of list that
    contains all flavors the kernel currently knows about (see below).

    Rather than cloning the open-coded logic in nfs4_find_root_sec() to
    those new places, introduce a generic RPC function that generates a
    full list of registered auth flavors and pseudoflavors.

    A new rpc_authops method is added that lists a flavor's
    pseudoflavors, if it has any. I encountered an interesting module
    loader loop when I tried to get the RPC client to invoke
    gss_mech_list_pseudoflavors() by name.

    This patch is a pre-requisite for server trunking discovery, and a
    pre-requisite for fixing up the in-kernel mount client to do better
    automatic security flavor selection.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
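
The shape of such a generic listing function can be sketched in user-space C. Everything here is a hypothetical stand-in (`authops_sketch`, `list_all_flavors()`, `sample_pseudoflavors()`), and the sketch assumes that a flavor with a pseudoflavor method contributes only its pseudoflavors; the kernel implementation may differ.

```c
#include <stddef.h>

typedef unsigned int flavor_t;

/* Hypothetical per-flavor operations table. */
struct authops_sketch {
    flavor_t flavor;
    /* Optional: writes at most max entries to out, returns the count. */
    size_t (*list_pseudoflavors)(flavor_t *out, size_t max);
};

/* Sample method for a GSS-like flavor with two pseudoflavors. */
static size_t sample_pseudoflavors(flavor_t *out, size_t max)
{
    static const flavor_t pseudo[] = { 390003, 390004 };
    size_t i;

    for (i = 0; i < 2 && i < max; i++)
        out[i] = pseudo[i];
    return i;
}

/* Walk the registered flavors and collect one flat list. */
static size_t list_all_flavors(const struct authops_sketch *const *registered,
                               size_t nregistered,
                               flavor_t *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < nregistered; i++) {
        const struct authops_sketch *ops = registered[i];

        if (ops == NULL)
            continue;  /* empty registration slot */
        if (ops->list_pseudoflavors == NULL) {
            if (n < max)
                out[n++] = ops->flavor;
            continue;
        }
        n += ops->list_pseudoflavors(out + n, max - n);
    }
    return n;
}
```

Routing the pseudoflavor lookup through a method pointer means the generic code never has to reference the GSS module's symbols by name.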
     
  • Squelch compiler warnings:

    fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
    fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
    fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]

    Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get
    acl data", Dec 7, 2011.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
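
The hazard behind -Wsign-compare is that, in a mixed comparison, the signed operand is converted to unsigned, so a negative value compares as a very large one. A minimal sketch of one way to make such a comparison explicit; `fits()` is a hypothetical helper, not code from nfs4proc.c:

```c
#include <stdbool.h>
#include <stddef.h>

/* True when a signed length is non-negative and within capacity. */
static bool fits(int len, size_t capacity)
{
    if (len < 0)
        return false;  /* reject before any conversion happens */
    return (size_t)len <= capacity;
}
```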
     
  • As a finishing touch, add appropriate documenting comments and some
    debugging printk's.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up: Instead of open-coded flag manipulation, use test_bit() and
    clear_bit() just like all other accessors of the state->flags field.
    This also eliminates several unnecessary implicit integer type
    conversions.

    To make it absolutely clear what is going on, a number of comments
    are introduced.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
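
For readers unfamiliar with the bit accessors, the following user-space sketch shows what index-based helpers do compared with open-coded masks. These `sketch_*` functions are simplified, non-atomic analogues written for illustration; the kernel's test_bit(), set_bit() and clear_bit() have their own definitions.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit number nr in the bitmap at addr (non-atomic). */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
    addr[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Clear bit number nr in the bitmap at addr (non-atomic). */
static void sketch_clear_bit(unsigned int nr, unsigned long *addr)
{
    addr[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}

/* Report whether bit number nr is set. */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
    return ((addr[nr / SKETCH_BITS_PER_LONG] >>
             (nr % SKETCH_BITS_PER_LONG)) & 1UL) != 0;
}
```

Callers pass a bit number rather than a mask, which avoids the implicit integer conversions that open-coded `flags & mask` tests can introduce.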
     
  • The "state->flags & flags" test in nfs41_check_expired_stateid()
    allows the state manager to squelch a TEST_STATEID operation when
    it is known for sure that a state ID is no longer valid. If the
    lease was purged, for example, the client already knows that state
    ID is now defunct.

    But open recovery is still needed for that inode.

    To force a call to nfs4_open_expired(), change the default return
    value for nfs41_check_expired_stateid() to force open recovery, and
    the default return value for nfs41_check_locks() to force lock
    recovery, if the requested flags are clear. Fix suggested by Bryan
    Schumaker.

    Also, the presence of a delegation state ID must not prevent normal
    open recovery. The delegation state ID must be cleared if it was
    revoked, but once cleared I don't think its presence or absence has
    any bearing on whether open recovery is still needed. So the logic
    is adjusted to ignore the TEST_STATEID result for the delegation
    state ID.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The result of a TEST_STATEID operation can indicate a few different
    things:

    o If NFS_OK is returned, then the client can continue using the
    state ID under test, and skip recovery.

    o RFC 5661 says that if the state ID was revoked, then the client
    must perform an explicit FREE_STATEID before trying to re-open.

    o If the server doesn't recognize the state ID at all, then no
    FREE_STATEID is needed, and the client can immediately continue
    with open recovery.

    Let's err on the side of caution: if the server clearly tells us the
    state ID is unknown, we skip the FREE_STATEID. For any other error,
    we issue a FREE_STATEID. Sometimes that FREE_STATEID will be
    unnecessary, but leaving unused state IDs on the server needlessly
    ties up resources.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
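
The decision described above can be sketched as a pair of predicates. This assumes that NFS4ERR_BAD_STATEID is the status a server uses to say the state ID is unknown; the constant names are prefixed with SKETCH_ to mark them as local to this example, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Status values as assigned in the NFSv4 RFCs. */
#define SKETCH_NFS4_OK              0
#define SKETCH_NFS4ERR_BAD_STATEID  10025

/* The state ID can keep being used and recovery is skipped. */
static bool stateid_still_valid(int status)
{
    return status == SKETCH_NFS4_OK;
}

/* Free the state ID on the server unless it is still valid or the
 * server has clearly said it does not know it. */
static bool needs_free_stateid(int status)
{
    return status != SKETCH_NFS4_OK &&
           status != SKETCH_NFS4ERR_BAD_STATEID;
}
```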
     
  • The TEST_STATEID and FREE_STATEID operations can return
    -NFS4ERR_BAD_STATEID, -NFS4ERR_OLD_STATEID, or -NFS4ERR_DEADSESSION.

    nfs41_{test,free}_stateid() should not pass these errors to
    nfs4_handle_exception() during state recovery, since that will
    recursively kick off state recovery again, resulting in a deadlock.

    In particular, when the TEST_STATEID operation returns NFS4_OK,
    res.status can contain one of these errors. _nfs41_test_stateid()
    replaces NFS4_OK with the value in res.status, which is then returned
    to callers.

    But res.status is not passed through nfs4_stat_to_errno(), and thus is
    a positive NFS4ERR value. Currently callers are only interested in
    !NFS4_OK, and nfs4_handle_exception() ignores positive values.

    Thus the res.status values are currently ignored by
    nfs4_handle_exception() and won't cause the deadlock above. Thanks to
    this missing negative, it is only when these operations fail (which
    is very rare) that a deadlock can occur.

    Bryan agrees the original intent was to return res.status as a
    negative NFS4ERR value to callers of nfs41_test_stateid().

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
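
The intended sign convention can be sketched as follows: a transport-level failure is passed through unchanged, while the positive per-state-ID status is negated before it reaches callers. `test_stateid_result()` is a hypothetical helper used only to illustrate the convention.

```c
/* rpc_status: 0 or a negative error from the operation itself.
 * res_status: 0 (NFS4_OK) or a positive NFS4ERR_* value. */
static int test_stateid_result(int rpc_status, int res_status)
{
    if (rpc_status != 0)
        return rpc_status;  /* already negative */
    return -res_status;     /* 0 stays 0, errors become negative */
}
```

Callers that must avoid re-entering state recovery would then check for the specific negative values before handing anything to a generic exception handler.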
     
  • mark_matching_lsegs_invalid() resets the mds_threshold counters and can
    dereference the layout hdr on an initially empty plh_segs list. It returns 0
    both for an initially empty list and for a non-empty list that was cleared
    by calls to mark_lseg_invalid().

    Don't send a LAYOUTRETURN if the list was initially empty.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • When the file layout driver is fencing a DS, _pnfs_return_layout can be
    called multiple times per inode due to in-flight I/O referencing lsegs on its
    plh_segs list.

    Remember that LAYOUTRETURN has been called, and do not call it again.
    Allow LAYOUTRETURNs after a subsequent LAYOUTGET.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
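
The "remember and re-arm" behaviour described above can be sketched with a single flag. The structure and function names are hypothetical and the sketch ignores locking; it only shows the state transitions.

```c
#include <stdbool.h>

/* Hypothetical per-inode layout header state. */
struct layout_hdr_sketch {
    bool return_sent;
};

/* True the first time only, until a later LAYOUTGET re-arms it. */
static bool should_send_layoutreturn(struct layout_hdr_sketch *lo)
{
    if (lo->return_sent)
        return false;
    lo->return_sent = true;
    return true;
}

/* A subsequent LAYOUTGET makes a future LAYOUTRETURN legitimate again. */
static void on_layoutget(struct layout_hdr_sketch *lo)
{
    lo->return_sent = false;
}
```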
     
  • Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson