13 Jan, 2012

1 commit


17 Dec, 2010

3 commits

  • Now that all client-side XDR decoder routines use xdr_streams, there
    should be no need to support the legacy calling sequence [rpc_rqst *,
    __be32 *, RPC res *] anywhere. We can construct an xdr_stream in the
    generic RPC code, instead of in each decoder function.

    This is a refactoring change. It should not cause different behavior.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Now that all client-side XDR encoder routines use xdr_streams, there
    should be no need to support the legacy calling sequence [rpc_rqst *,
    __be32 *, RPC arg *] anywhere. We can construct an xdr_stream in the
    generic RPC code, instead of in each encoder function.

    Also, all the client-side encoder functions return 0 now, making a
    return value superfluous. Take this opportunity to convert them to
    return void instead.

    This is a refactoring change. It should not cause different behavior.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
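
    The calling-convention change described above can be sketched in
    user-space C. This is only an illustration under stated assumptions:
    struct toy_xdr_stream, toy_encode_fn, toy_encode_u32() and
    toy_rpc_encode() are hypothetical names, not the kernel's SUNRPC API.
    The generic caller builds the stream once, and each encoder receives
    it ready-made and returns void.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

/* Toy stand-in for the kernel's struct xdr_stream (hypothetical layout). */
struct toy_xdr_stream {
    uint32_t *p;     /* next free 32-bit word */
    uint32_t *end;   /* one past the last usable word */
};

/* New-style encoder: receives a ready-made stream and returns nothing. */
typedef void (*toy_encode_fn)(struct toy_xdr_stream *xdr, const void *args);

/* Example encoder for a single unsigned 32-bit argument. */
void toy_encode_u32(struct toy_xdr_stream *xdr, const void *args)
{
    assert(xdr->p < xdr->end);   /* running out of room is a local coding error */
    *xdr->p++ = htonl(*(const uint32_t *)args);
}

/*
 * Generic caller: constructs the stream once, so individual encoders no
 * longer have to. Returns the number of 32-bit words written.
 */
size_t toy_rpc_encode(toy_encode_fn encode, uint32_t *buf, size_t nwords,
                      const void *args)
{
    struct toy_xdr_stream xdr = { .p = buf, .end = buf + nwords };

    encode(&xdr, args);
    return (size_t)(xdr.p - buf);
}
```

    A caller would pass toy_encode_u32 together with a word buffer and a
    pointer to the argument; the encoder itself never sees the raw buffer.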
     
  • Clean up.

    The trend in the other XDR encoder functions is to BUG() when encoding
    problems occur, since a problem here is always due to a local coding
    error. Then, instead of a status, zero is unconditionally returned.

    Update the NSM XDR encoders to behave this way.

    To finish the update, use the new-style be32_to_cpup() and
    cpu_to_be32() macros, and compute the buffer sizes using raw integers
    instead of sizeof(). This matches the conventions used in other XDR
    functions.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
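
    A rough user-space sketch of the encoder style described above,
    assuming a hypothetical helper toy_encode_string(): assert() stands
    in for the kernel's BUG_ON(), htonl() stands in for cpu_to_be32(),
    and sizes are written as raw integers. It is not the code from
    fs/lockd/mon.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl() */

#define TOY_SM_MAXSTRLEN 1024u   /* assumed upper bound on NSM string length */

/*
 * Encode an XDR string: a 4-byte big-endian length, then the bytes,
 * zero-padded to a multiple of 4. Returns a pointer just past the data.
 * The caller must supply a buffer large enough for the padded string.
 */
uint8_t *toy_encode_string(uint8_t *p, const char *s)
{
    uint32_t len = (uint32_t)strlen(s);
    uint32_t padded = (len + 3u) & ~3u;
    uint32_t be_len = htonl(len);

    assert(len <= TOY_SM_MAXSTRLEN);   /* an over-long name is a local bug */

    memcpy(p, &be_len, 4);
    memset(p + 4, 0, padded);          /* zero the padding bytes */
    memcpy(p + 4, s, len);
    return p + 4 + padded;
}
```

    Because the only failure mode is treated as a programming error, the
    function has no status to report, which is what makes a constant (or
    void) return value reasonable.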
     

02 Oct, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

09 Feb, 2010

1 commit

  • When lockd gets a notify downcall from statd, it'll search its hosts
    cache and then clear the sm_monitored bit on the host it finds. The idea
    is apparently to make lockd redo a SM_MON on the next lock request.

    This is unnecessary and causes the kernel's NSM cache to go out of sync
    with statd. statd doesn't stop monitoring a host when it gets a
    SM_NOTIFY and there's no guarantee that another lock will occur after
    the reclaim and before the unmount. In that event, no SM_UNMON will
    occur.

    Signed-off-by: Jeff Layton
    Reviewed-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Jeff Layton
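
    The cache-synchronization problem described above can be modeled with
    a small sketch. Everything here is hypothetical (the struct, the
    statd_monitoring field that simulates rpc.statd's own bookkeeping,
    and the clear_flag switch between old and new behavior); it only
    illustrates why clearing sm_monitored can leave a stale entry in
    statd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_nsm_handle {
    bool sm_monitored;       /* kernel's view: statd is monitoring this peer */
    bool statd_monitoring;   /* what the simulated statd actually tracks */
};

/* Issue SM_MON only if the kernel believes the peer is unmonitored. */
void toy_monitor(struct toy_nsm_handle *nsm)
{
    assert(nsm != NULL);
    if (nsm->sm_monitored)
        return;
    nsm->statd_monitoring = true;    /* simulated SM_MON upcall */
    nsm->sm_monitored = true;
}

/* Issue SM_UNMON only if the kernel believes the peer is monitored. */
void toy_unmonitor(struct toy_nsm_handle *nsm)
{
    assert(nsm != NULL);
    if (!nsm->sm_monitored)
        return;                      /* kernel thinks there is nothing to undo */
    nsm->statd_monitoring = false;   /* simulated SM_UNMON upcall */
    nsm->sm_monitored = false;
}

/* Reboot notification: clear_flag selects the old (true) or new (false) behavior. */
void toy_host_rebooted(struct toy_nsm_handle *nsm, bool clear_flag)
{
    assert(nsm != NULL);
    if (clear_flag)
        nsm->sm_monitored = false;
}
```

    With clear_flag set, a monitor / reboot / unmonitor sequence with no
    intervening lock request never sends SM_UNMON, so the simulated statd
    keeps monitoring the peer.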
     

21 Aug, 2009

1 commit


10 Aug, 2009

1 commit


18 Jun, 2009

2 commits

  • Cut NSM upcall RPC traffic in half -- don't do a NULL call first.
    The cases where a ping would be helpful are rare.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • When rpc.statd starts up in user space at boot time, it attempts to
    write the latest NSM local state number into
    /proc/sys/fs/nfs/nsm_local_state.

    If lockd.ko isn't loaded yet (as is the case in most configurations),
    that file doesn't exist, thus the kernel's NSM state remains set to
    its initial value of zero during lockd operation.

    This is a problem because rpc.statd and lockd use the NSM state number
    to prevent repeated lock recovery on rebooted hosts. If lockd sends
    a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state
    number is received, there is no way for lockd or rpc.statd to
    distinguish that stale SM_NOTIFY from an actual reboot. Thus lock
    recovery could be performed after the rebooted host has already
    started reclaiming locks, and those locks will be lost.

    We could change /etc/init.d/nfslock so it always modprobes lockd.ko
    before starting rpc.statd. However, if lockd.ko is ever unloaded
    and reloaded, we are back at square one, since the NSM state is not
    preserved across an unload/reload cycle. This may happen frequently
    on clients that use automounter. A period of NFS inactivity causes
    lockd.ko to be unloaded, and the kernel loses its NSM state setting.

    Instead, let's use the fact that rpc.statd plants the local system's
    NSM state in every SM_MON (and SM_UNMON) reply. lockd performs a
    synchronous SM_MON upcall to the local rpc.statd _before_ sending its
    first NLM request to a new remote. This would permit rpc.statd to
    provide the current NSM state to lockd, even after lockd.ko had been
    unloaded and reloaded.

    Note that NLMPROC_LOCK arguments are constructed before the
    nsm_monitor() call, so we have to rearrange argument construction very
    slightly to make this all work out.

    And, the kernel appears to treat NSM state as a u32 (see struct
    nlm_args and nsm_res). Make nsm_local_state a u32 as well, to ensure
    we don't get bogus comparison results.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
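
    A minimal sketch of the idea above, assuming hypothetical names
    (toy_nsm_res, toy_statd_mon(), toy_nsm_monitor()) and a simulated
    upcall in place of a real RPC: the cached local state is refreshed
    from every successful SM_MON reply and is kept as an unsigned 32-bit
    value.

```c
#include <assert.h>
#include <stdint.h>

/* Reply to an SM_MON upcall: a result code plus statd's current state number. */
struct toy_nsm_res {
    uint32_t status;
    uint32_t state;
};

/* Cached NSM state; stays zero until the first successful SM_MON. */
uint32_t toy_nsm_local_state;

/* Simulated upcall; a real one would be an RPC to the local rpc.statd. */
struct toy_nsm_res toy_statd_mon(uint32_t statd_state)
{
    struct toy_nsm_res res = { .status = 0, .state = statd_state };
    return res;
}

/* Monitor a peer and refresh the cached state from the reply. Returns 0 on success. */
int toy_nsm_monitor(uint32_t statd_state)
{
    struct toy_nsm_res res = toy_statd_mon(statd_state);

    if (res.status != 0)
        return -1;
    toy_nsm_local_state = res.state;   /* same unsigned 32-bit type on both sides */
    return 0;
}
```

    Since the state arrives with the reply, it does not matter whether
    the module was loaded when rpc.statd started.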
     

02 Apr, 2009

1 commit


07 Jan, 2009

28 commits

  • Clean up: one last thing... relocate nsm_create() to eliminate the forward
    declaration and group it near the only function that actually uses it.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up.

    Treat the nsm_use_hostnames global variable like nsm_local_state.
    Note that the default value of nsm_use_hostnames is still zero.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: nsm_addr_in() is no longer used, and nsm_addr() is used only in
    fs/lockd/mon.c, so move it there.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: The include/linux/lockd/sm_inter.h header is nearly empty
    now. Remove it.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • NLM provides file locking services for NFS files. Part of this service
    includes a second protocol, known as NSM, which is a reboot
    notification service. NLM uses this service to determine when to
    reclaim locks or enter a grace period after a client or server reboots.

    The NLM service (implemented by lockd in the Linux kernel) contacts
    the local NSM service (implemented by rpc.statd in Linux user space)
    via NSM protocol upcalls to register a callback when a particular
    remote peer reboots.

    To match the callback to the correct remote peer, the NLM service
    constructs a cookie that it passes in the request. The NSM service
    passes that cookie back to the NLM service when it is notified that
    the given remote peer has indeed rebooted.

    Currently on Linux, the cookie is the raw 32-bit IPv4 address of the
    remote peer. To support IPv6 addresses, which are larger, we could
    use all 16 bytes of the cookie to represent a full IPv6 address,
    although we still can't represent an IPv6 address with a scope ID in
    just 16 bytes.

    Instead, to avoid the need for future changes to support additional
    address types, we'll use a manufactured value for the cookie, and use
    that to find the corresponding nsm_handle struct in the kernel during
    the NLMPROC_SM_NOTIFY callback.

    This should provide complete support in the kernel's NSM
    implementation for IPv6 hosts, while remaining backwards compatible
    with older rpc.statd implementations.

    Note we also deal with another case where nsm_use_hostnames can change
    while there are outstanding notifications, possibly resulting in the
    loss of reboot notifications. After this patch, the priv cookie is
    always used to look up rebooted hosts in the kernel.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
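
    The cookie-based lookup described above might look roughly like the
    following. This is a sketch under assumptions: the handle table, the
    counter used to manufacture the cookie, and the toy_ function names
    are all hypothetical, and the kernel may derive its cookie
    differently. The point is that the 16 opaque bytes need not encode an
    address at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PRIV_LEN 16   /* the "priv" cookie is 16 opaque bytes */

struct toy_nsm_handle {
    uint8_t priv[TOY_PRIV_LEN];
    const char *name;
};

/* Manufacture a cookie from a unique value; ASSUMPTION: any unique value works. */
void toy_init_private(struct toy_nsm_handle *nsm, uint64_t unique)
{
    memset(nsm->priv, 0, sizeof(nsm->priv));
    memcpy(nsm->priv, &unique, sizeof(unique));
}

/* Find the handle whose cookie matches the one handed back in the callback. */
struct toy_nsm_handle *toy_lookup_priv(struct toy_nsm_handle *tbl, size_t n,
                                       const uint8_t *priv)
{
    assert(priv != NULL);
    for (size_t i = 0; i < n; i++)
        if (memcmp(tbl[i].priv, priv, TOY_PRIV_LEN) == 0)
            return &tbl[i];
    return NULL;
}
```

    Because the lookup compares only the cookie, adding a new address
    family would not require changing the cookie format.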
     
  • Clean up: refactor nsm_get_handle() so it is organized the same way that
    nsm_reboot_lookup() is.

    There is an additional micro-optimization here. This change moves the
    "hostname & nsm_use_hostnames" test out of the list_for_each_entry()
    clause in nsm_get_handle(), since it is loop-invariant.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
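
    The loop-invariant test mentioned above can be hoisted as in this
    sketch. The table layout and the toy_ function names are hypothetical
    and addresses are simplified to integers; the real code walks a
    kernel list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_handle {
    const char *name;   /* peer hostname */
    unsigned addr;      /* peer address, simplified to an integer */
};

struct toy_handle *toy_lookup_name(struct toy_handle *tbl, size_t n,
                                   const char *hostname)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tbl[i].name, hostname) == 0)
            return &tbl[i];
    return NULL;
}

struct toy_handle *toy_lookup_addr(struct toy_handle *tbl, size_t n,
                                   unsigned addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return &tbl[i];
    return NULL;
}

/* The hostname/use_hostnames test runs once, not once per table entry. */
struct toy_handle *toy_get_handle(struct toy_handle *tbl, size_t n,
                                  const char *hostname, unsigned addr,
                                  bool use_hostnames)
{
    assert(tbl != NULL || n == 0);
    if (hostname != NULL && use_hostnames)
        return toy_lookup_name(tbl, n, hostname);
    return toy_lookup_addr(tbl, n, addr);
}
```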
     
  • Clean up. Refactor the creation of nsm_handles into a helper. Fields
    are initialized in increasing address order to make efficient use of
    CPU caches.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: nsm_find() now has only one caller, and that caller
    unconditionally sets the @create argument. Thus the @create
    argument is no longer needed.

    Since nsm_find() now has a more specific purpose, pick a more
    appropriate name for it.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce a new API to fs/lockd/mon.c that allows nlm_host_rebooted()
    to look up nsm_handles via the contents of an nlm_reboot struct.

    The new function is equivalent to calling nsm_find() with @create set
    to zero, but it takes a struct nlm_reboot instead of separate
    arguments.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Pass the new "priv" cookie to NSMPROC_MON's XDR encoder, instead of
    creating the "priv" argument in the encoder at call time.

    This patch should not cause a behavioral change: the contents of the
    cookie remain the same for the time being.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce a new data type, used by both the in-kernel NLM and NSM
    implementations, that is used to manage the opaque "priv" argument
    for the NSMPROC_MON and NLMPROC_SM_NOTIFY calls.

    Construct the "priv" cookie when the nsm_handle is created.

    The nsm_init_private() function may look a little strange, but it is
    roughly equivalent to how the XDR encoder formed the "priv" argument.
    It's going to go away soon.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The nsm_release() function should never be called with a NULL handle
    pointer. If it is, that's a bug.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The nsm_find() function should never be called with a NULL IP address
    pointer. If it is, that's a bug.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce some dprintk() calls in fs/lockd/mon.c that are enabled by
    the NLMDBG_MONITOR flag. These report when we find, create, and
    release nsm_handles.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The nsm_find() function sets up fresh nsm_handle entries. This is
    where we will store the "priv" cookie used to look up nsm_handles
    during reboot recovery. The cookie will be constructed when nsm_find()
    creates a new nsm_handle.

    As much as possible, I would like to keep everything that handles a
    "priv" cookie in fs/lockd/mon.c so that all the smarts are in one
    source file. That organization should make it pretty simple to see how
    all this works.

    To me, it makes more sense than the current arrangement to keep
    nsm_find() with nsm_monitor() and nsm_unmonitor().

    So, start reorganizing by moving nsm_find() into fs/lockd/mon.c. The
    nsm_release() function comes along too, since it shares the nsm_lock
    global variable.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Introduce xdr_stream-based XDR encoder and decoder functions, which are
    more careful about preventing RPC buffer overflows.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
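
    Why a stream-based encoder is harder to overflow can be shown with a
    small user-space analogue. The names below (toy_stream,
    toy_reserve_space(), toy_encode_u32()) are hypothetical and only
    mimic the general idea of asking the stream for space before
    writing; they are not the kernel's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_stream {
    uint8_t *p;     /* next free byte */
    uint8_t *end;   /* one past the end of the buffer */
};

/* Hand out nbytes of buffer space, or NULL if that would overrun the buffer. */
uint8_t *toy_reserve_space(struct toy_stream *xdr, size_t nbytes)
{
    uint8_t *p = xdr->p;

    assert(xdr->p <= xdr->end);
    if (nbytes > (size_t)(xdr->end - xdr->p))
        return NULL;
    xdr->p += nbytes;
    return p;
}

/* Encode a 32-bit value in network byte order; returns 0, or -1 if out of space. */
int toy_encode_u32(struct toy_stream *xdr, uint32_t v)
{
    uint8_t *p = toy_reserve_space(xdr, 4);

    if (p == NULL)
        return -1;
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 0;
}
```

    Every write goes through the bounds check in toy_reserve_space(), so
    an encoder cannot run past the end of the buffer by miscounting.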
     
  • Clean up: Move the RPC program and procedure numbers for NSM into the
    one source file that needs them: fs/lockd/mon.c.

    And, as with NLM, NFS, and rpcbind calls, use NSMPROC_FOO instead of
    SM_FOO for NSM procedure numbers.

    Finally, make a couple of comments more precise: what is referred to
    here as SM_NOTIFY is really the NLM (lockd) NLMPROC_SM_NOTIFY downcall,
    not NSMPROC_NOTIFY.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: NSM's XDR data structures are used only in fs/lockd/mon.c,
    so move them there.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Make sure any error returned by rpc.statd during an SM_UNMON call is
    reported rather than ignored completely. There isn't much to do with
    such an error, but we should log it in any case.

    Similar to a recent change to nsm_monitor().

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up.

    Make the nlm_host argument "const," and move the public declaration to
    lockd.h. Add a documenting comment.

    Bruce observed that nsm_unmonitor()'s only caller doesn't care about
    its return code, so make nsm_unmonitor() return void.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The nsm_handle's reference count is bumped in nlm_lookup_host(). It
    should be decremented in nlm_destroy_host() to make it easier to see
    the balance of these two operations.

    Move the nsm_release() call to fs/lockd/host.c.

    The h_nsmhandle pointer is set in nlm_lookup_host(), and never cleared.
    The nlm_destroy_host() function is never called for the same nlm_host
    twice, so h_nsmhandle won't ever be NULL when nsm_unmonitor() is
    called.

    All references to the nlm_host are gone before it is freed. We can
    skip making h_nsmhandle NULL just before the nlm_host is deallocated.

    It's also likely we can remove the h_nsmhandle NULL check in
    nlmsvc_is_client() as well, but we can do that later when
    rearchitecting the nlm_host cache.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up.

    Make the nlm_host argument "const," and move the public declaration to
    lockd.h with other NSM public function (e.g. nsm_release) and global
    variable declarations.

    Add a documenting comment.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The nsm_monitor() function reports an error and does not set sm_monitored
    if the SM_MON upcall reply has a non-zero result code, but nsm_monitor()
    does not return an error to its caller in this case.

    Since sm_monitored is not set, the upcall is retried when the next NLM
    request invokes nsm_monitor(). However, that may not come for a while.
    In the meantime, at least one NLM request will potentially proceed
    without the peer being monitored properly.

    Have nsm_monitor() return an error if the result code is non-zero.
    This will cause all NLM requests to fail immediately if the upcall
    completed successfully but rpc.statd returned an error.

    This may be inconvenient in some cases (for example if rpc.statd
    cannot complete a proper DNS reverse lookup of the hostname), but will
    make the reboot monitoring service more robust by forcing such issues
    to be corrected by an admin.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
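
    The behavior change above comes down to one extra check, sketched
    here with a hypothetical toy_nsm_monitor() that takes the upcall
    outcome and the reply's result code as plain parameters instead of
    performing an RPC. The use of -EIO for the error is an assumption
    made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_nsm_handle {
    bool sm_monitored;
};

/*
 * upcall_status: transport-level outcome of the SM_MON upcall (0 = delivered).
 * res_status:    result code inside rpc.statd's reply (0 = success).
 */
int toy_nsm_monitor(struct toy_nsm_handle *nsm, int upcall_status,
                    int res_status)
{
    assert(nsm != NULL);
    if (nsm->sm_monitored)
        return 0;
    if (upcall_status < 0)
        return upcall_status;
    if (res_status != 0)
        return -EIO;   /* before the change, this case still returned 0 */
    nsm->sm_monitored = true;
    return 0;
}
```

    A caller that checks the return value now refuses to proceed with the
    NLM request until rpc.statd accepts the monitoring request.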
     
  • Clean up: Remove the BUG_ON() invocation in nsm_monitor(). It's not
    likely that nsm_monitor() is ever called with a NULL host pointer, and
    the code will die anyway if host is NULL.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Use the sm_name field for reporting the hostname in nsm_monitor()
    and nsm_unmonitor(), just as the other functions in fs/lockd/mon.c do.

    The h_name field is just a copy of the sm_name pointer.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • The "mon_name" argument of the NSMPROC_MON and NSMPROC_UNMON upcalls
    is a string that contains the hostname or IP address of the remote peer
    to be notified when this host has rebooted. The sm-notify command uses
    this identifier to contact the peer when we reboot, so it must be
    either a well-qualified DNS hostname or a presentation format IP
    address string.

    When the "nsm_use_hostnames" sysctl is set to zero, the kernel's NSM
    provides a presentation format IP address in the "mon_name" argument.
    Otherwise, the "caller_name" argument from NLM requests is used,
    which is usually just the DNS hostname of the peer.

    To support IPv6 addresses for the mon_name argument, we use the
    nsm_handle's address eye-catcher, which already contains an appropriate
    presentation format address string. Using the eye-catcher string
    obviates the need to use a large buffer on the stack to form the
    presentation address string for the upcall.

    This patch also addresses a subtle bug.

    An NSMPROC_MON request and the subsequent NSMPROC_UNMON request for the
    same peer are required to use the same value for the "mon_name"
    argument. Otherwise, rpc.statd's NSMPROC_UNMON processing cannot
    locate the database entry for that peer and remove it.

    If the setting of nsm_use_hostnames is changed between the time the
    kernel sends an NSMPROC_MON request and the time it sends the
    NSMPROC_UNMON request for the same peer, the "mon_name" argument for
    these two requests may not be the same. This is because the value of
    "mon_name" is currently chosen at the moment the call is made based on
    the setting of nsm_use_hostnames.

    To ensure both requests pass identical contents in the "mon_name"
    argument, we now select which string to use for the argument in the
    nsm_monitor() function. A pointer to this string is saved in the
    nsm_handle so it can be used for a subsequent NSMPROC_UNMON upcall.

    NB: There are other potential problems, such as how nlm_host_rebooted()
    might behave if nsm_use_hostnames were changed while hosts are still
    being monitored. This patch does not attempt to address those
    problems.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
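
    The fix above, choosing the string once and remembering it, can be
    sketched as follows. The struct and its field names are modeled
    loosely on the description and should be read as hypothetical, as
    should toy_monitor() and toy_unmonitor().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_nsm_handle {
    const char *sm_name;       /* caller_name taken from NLM requests */
    const char *sm_addrbuf;    /* presentation-format address string */
    const char *sm_mon_name;   /* name used for MON, reused for UNMON */
};

/* Choose mon_name once, when monitoring starts, and remember the choice. */
const char *toy_monitor(struct toy_nsm_handle *nsm, bool use_hostnames)
{
    nsm->sm_mon_name = use_hostnames ? nsm->sm_name : nsm->sm_addrbuf;
    return nsm->sm_mon_name;
}

/* UNMON ignores the current setting and reuses the saved name. */
const char *toy_unmonitor(const struct toy_nsm_handle *nsm)
{
    assert(nsm->sm_mon_name != NULL);   /* must have been monitored first */
    return nsm->sm_mon_name;
}
```

    Since toy_unmonitor() takes no use_hostnames argument, flipping the
    setting between the two calls cannot change the name that is sent.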
     
  • Clean up: make the printk(KERN_DEBUG) in nsm_mon_unmon() a dprintk,
    and add another dprintk to note if creating an RPC client for the
    upcall failed.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
     
  • Clean up: Use a C99 structure initializer instead of open-coding the
    initialization of nsm_args.

    Signed-off-by: Chuck Lever
    Signed-off-by: J. Bruce Fields

    Chuck Lever
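
    For readers unfamiliar with the idiom, a C99 designated initializer
    looks like the sketch below. struct toy_nsm_args and its fields are
    hypothetical and simplified; they are not the kernel's nsm_args
    definition. Fields that are not named in the initializer are
    zero-initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified argument structure for an upcall. */
struct toy_nsm_args {
    uint32_t prog;
    uint32_t vers;
    uint32_t proc;
    const char *mon_name;
};

struct toy_nsm_args toy_make_args(const char *mon_name)
{
    /*
     * Open-coded style this replaces:
     *     struct toy_nsm_args args;
     *     memset(&args, 0, sizeof(args));
     *     args.prog = 100021;
     *     ...
     */
    struct toy_nsm_args args = {
        .prog     = 100021,     /* illustrative value */
        .vers     = 3,          /* illustrative value */
        .mon_name = mon_name,
        /* .proc is not named, so it is implicitly zero */
    };

    assert(mon_name != NULL);
    return args;
}
```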