15 Jan, 2012

1 commit

  • * 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
    nfsd4: nfsd4_create_clid_dir return value is unused
    NFSD: Change name of extended attribute containing junction
    svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
    svcrpc: fix double-free on shutdown of nfsd after changing pool mode
    nfsd4: be forgiving in the absence of the recovery directory
    nfsd4: fix spurious 4.1 post-reboot failures
    NFSD: forget_delegations should use list_for_each_entry_safe
    NFSD: Only reinitilize the recall_lru list under the recall lock
    nfsd4: initialize special stateid's at compile time
    NFSd: use network-namespace-aware cache registering routines
    SUNRPC: create svc_xprt in proper network namespace
    svcrpc: update outdated BKL comment
    nfsd41: allow non-reclaim open-by-fh's in 4.1
    svcrpc: avoid memory-corruption on pool shutdown
    svcrpc: destroy server sockets all at once
    svcrpc: make svc_delete_xprt static
    nfsd: Fix oops when parsing a 0 length export
    nfsd4: Use kmemdup rather than duplicating its implementation
    nfsd4: add a separate (lockowner, inode) lookup
    nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
    ...

    Linus Torvalds
     

06 Jan, 2012

2 commits

  • This was unexpected behavior (at least for me)--why would you want
    configuration settings automatically lost on nfsd restart?

    In practice this won't affect distributions, which likely set everything
    on every startup. But I'd expect the behavior to be less confusing to
    someone manually restarting nfsd for testing.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The pool_to and to_pool fields of the global svc_pool_map are freed on
    shutdown, but are initialized in nfsd startup only in the
    SVC_POOL_PERCPU and SVC_POOL_PERNODE cases.

    They *are* initialized to zero on kernel startup. So as long as you use
    only SVC_POOL_GLOBAL (the default), this will never be a problem.

    You're also OK if you only ever use SVC_POOL_PERCPU or SVC_POOL_PERNODE.

    However, the following sequence of events leads to a double-free:

    1. set SVC_POOL_PERCPU or SVC_POOL_PERNODE
    2. start nfsd: both fields are initialized.
    3. shutdown nfsd: both fields are freed.
    4. set SVC_POOL_GLOBAL
    5. start nfsd: the fields are left untouched.
    6. shutdown nfsd: now we try to free them again.

    Step 4 is actually unnecessary, since (for some bizarre reason), nfsd
    automatically resets the pool mode to SVC_POOL_GLOBAL on shutdown.
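
    The sequence above can be reproduced in a small userspace model. This is
    a sketch only: `struct pool_map`, `pool_map_get()` and `pool_map_put()`
    are hypothetical stand-ins for the kernel structures, and clearing the
    pointers after freeing is one common way to make a second shutdown
    harmless, not necessarily the exact change made by this commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for the global svc_pool_map. */
enum pool_mode { POOL_GLOBAL, POOL_PERCPU, POOL_PERNODE };

struct pool_map {
    enum pool_mode mode;
    unsigned int *to_pool;  /* maps cpu/node id -> pool id */
    unsigned int *pool_to;  /* maps pool id -> cpu/node id */
};

/* Startup: only the per-CPU and per-node modes allocate the arrays. */
static int pool_map_get(struct pool_map *m, unsigned int nr)
{
    if (m->mode == POOL_GLOBAL)
        return 0;           /* the fields are left untouched */

    m->to_pool = calloc(nr, sizeof(*m->to_pool));
    m->pool_to = calloc(nr, sizeof(*m->pool_to));
    if (!m->to_pool || !m->pool_to) {
        free(m->to_pool);
        free(m->pool_to);
        m->to_pool = m->pool_to = NULL;
        return -1;
    }
    return 0;
}

/*
 * Shutdown: free unconditionally, then clear the pointers so that a
 * later shutdown in POOL_GLOBAL mode calls free(NULL), which is a no-op,
 * instead of freeing stale pointers a second time.
 */
static void pool_map_put(struct pool_map *m)
{
    free(m->to_pool);
    free(m->pool_to);
    m->to_pool = NULL;
    m->pool_to = NULL;
}
```

    Without the two NULL assignments, running steps 1 to 6 against this
    model would pass the same stale pointers to free() twice.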

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

12 Dec, 2011

1 commit


07 Dec, 2011

3 commits

  • Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Socket callbacks use svc_xprt_enqueue() to add an xprt to a
    pool->sp_sockets list. In normal operation a server thread will later
    come along and take the xprt off that list. On shutdown, after all the
    threads have exited, we instead manually walk the sv_tempsocks and
    sv_permsocks lists to find all the xprt's and delete them.

    So the sp_sockets lists don't really matter any more. As a result,
    we've mostly just ignored them and hoped they would go away.

    Which has gotten us into trouble; witness for example ebc63e531cc6
    "svcrpc: fix list-corrupting race on nfsd shutdown", the result of Ben
    Greear noticing that a still-running svc_xprt_enqueue() could re-add an
    xprt to an sp_sockets list just before it was deleted. The fix was to
    remove it from the list at the end of svc_delete_xprt(). But that only
    made corruption less likely--I can see nothing that prevents a
    svc_xprt_enqueue() from adding another xprt to the list at the same
    moment that we're removing this xprt from the list. In fact, despite
    the earlier xpo_detach(), I don't even see what guarantees that
    svc_xprt_enqueue() couldn't still be running on this xprt.

    So, instead, note that svc_xprt_enqueue() essentially does:
        lock sp_lock
        if XPT_BUSY unset
            add to sp_sockets
        unlock sp_lock

    So, if we do:

        Set XPT_BUSY on every xprt.
        Empty every sp_sockets list, under the sp_lock locks.

    Then we're left knowing that the sp_sockets lists are all empty and will
    stay that way, since any svc_xprt_enqueue() will check XPT_BUSY under
    the sp_lock and see it set.

    And *then* we can continue deleting the xprt's.
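
    The ordering argument can be sketched in userspace C. Everything here is
    a simplified model under stated assumptions: `struct xprt`,
    `struct pool`, `xprt_enqueue()` and `pool_quiesce()` are hypothetical
    names, a pthread mutex stands in for sp_lock, and a plain bool guarded
    by that mutex stands in for the XPT_BUSY flag. The model also marks an
    xprt busy when it is queued so that it cannot be queued twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct xprt {
    bool busy;            /* models the XPT_BUSY flag */
    struct xprt *next;    /* link on the pool's ready list */
};

struct pool {
    pthread_mutex_t lock; /* models sp_lock */
    struct xprt *ready;   /* models the sp_sockets list */
};

/* Models the enqueue path: add to the list only if the flag is clear. */
static bool xprt_enqueue(struct pool *p, struct xprt *x)
{
    bool queued = false;

    pthread_mutex_lock(&p->lock);
    if (!x->busy) {
        x->busy = true;   /* modelling choice: prevents a double enqueue */
        x->next = p->ready;
        p->ready = x;
        queued = true;
    }
    pthread_mutex_unlock(&p->lock);
    return queued;
}

/* Shutdown: mark every xprt busy, then empty the list under the lock. */
static void pool_quiesce(struct pool *p, struct xprt *all, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&p->lock);
        all[i].busy = true;
        pthread_mutex_unlock(&p->lock);
    }

    pthread_mutex_lock(&p->lock);
    p->ready = NULL;
    pthread_mutex_unlock(&p->lock);
}
```

    After pool_quiesce() returns, any later xprt_enqueue() sees the flag set
    while holding the lock and leaves the list empty, which is the property
    the deletion step relies on.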

    (Thanks to Jeff Layton for being correctly suspicious of this code....)

    Cc: Ben Greear
    Cc: Jeff Layton
    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • There's no reason I can see that we need to call sv_shutdown between
    closing the two lists of sockets.

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

03 Nov, 2011

1 commit


01 Nov, 2011

1 commit

  • Standardize the style for compiler based printf format verification.
    Standardized the location of __printf too.

    Done via script and a little typing.

    $ grep -rPl --include=*.[ch] -w "__attribute__" * | \
    grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
    xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'
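
    For illustration, here is roughly what the converted style looks like
    in a standalone C file. The macro definition below is written out by
    hand for this sketch (assuming a GCC-compatible compiler), and
    `log_fmt()` is a hypothetical function, not one touched by the commit.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Shorthand for the long attribute form that the script rewrites. */
#ifndef __printf
#define __printf(a, b) __attribute__((format(printf, a, b)))
#endif

/*
 * Hypothetical logger: argument 3 is the format string and the
 * variadic arguments start at position 4, so the compiler can check
 * each call site against the format.
 */
static __printf(3, 4) int log_fmt(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

    With the annotation in place, a call such as
    log_fmt(buf, sizeof(buf), "%d", "text") draws a format warning at
    compile time when format checking is enabled.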

    [akpm@linux-foundation.org: revert arch bits]
    Signed-off-by: Joe Perches
    Cc: "Kirill A. Shutemov"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

25 Oct, 2011

4 commits


20 Aug, 2011

1 commit

  • Use NUMA aware allocations to reduce latencies and increase throughput.

    sunrpc kthreads can use kthread_create_on_node() if pool_mode is
    "percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
    also take into account NUMA node affinity for memory allocations.

    Signed-off-by: Eric Dumazet
    CC: "J. Bruce Fields"
    CC: Neil Brown
    CC: David Miller
    Reviewed-by: Greg Banks
    [bfields@redhat.com: fix up caller nfs41_callback_up]
    Signed-off-by: J. Bruce Fields

    Eric Dumazet
     

15 Jul, 2011

2 commits


28 May, 2011

1 commit

  • As libtirpc does in user space, have our registration API try using an
    AF_LOCAL transport first when registering and unregistering.

    This means we don't chew up privileged ports, and our registration is
    bound to an "owner" (the effective uid of the process on the sending
    end of the transport). Only that "owner" may unregister the service.

    The kernel could probe rpcbind via an rpcbind query to determine
    whether rpcbind has an AF_LOCAL service. For simplicity, we use the
    same technique that libtirpc uses: simply fail over to network
    loopback if creating an AF_LOCAL transport to the well-known rpcbind
    service socket fails.

    This means we open-code the pathname of the rpcbind socket in the
    kernel. For now we have to do that anyway because the kernel's
    RPC over AF_LOCAL implementation does not support autobind. That may
    be undesirable in the long term.
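
    The fail-over technique can be sketched in userspace C. This is an
    illustration only: `pick_transport()` and `enum transport` are
    hypothetical names, the socket pathname is passed in by the caller
    rather than hard-coded, and the loopback leg is left to the caller.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

enum transport { TRANSPORT_LOCAL, TRANSPORT_LOOPBACK };

/*
 * Try the AF_LOCAL socket at `path` first. If creating or connecting
 * the socket fails for any reason, report TRANSPORT_LOOPBACK so the
 * caller can fall back to network loopback.
 * On success *fd_out holds the connected socket; otherwise it is -1.
 */
static enum transport pick_transport(const char *path, int *fd_out)
{
    struct sockaddr_un addr;
    int fd;

    *fd_out = -1;
    if (strlen(path) >= sizeof(addr.sun_path))
        return TRANSPORT_LOOPBACK;

    fd = socket(AF_LOCAL, SOCK_STREAM, 0);
    if (fd < 0)
        return TRANSPORT_LOOPBACK;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_LOCAL;
    strcpy(addr.sun_path, path);    /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return TRANSPORT_LOOPBACK;
    }

    *fd_out = fd;
    return TRANSPORT_LOCAL;
}
```

    No separate probe is needed: a failed connect() is the signal to fall
    back, which mirrors the "simply fail over" approach described above.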

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

15 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
    nfsd4: fix callback restarting
    nfsd: break lease on unlink, link, and rename
    nfsd4: break lease on nfsd setattr
    nfsd: don't support msnfs export option
    nfsd4: initialize cb_per_client
    nfsd4: allow restarting callbacks
    nfsd4: simplify nfsd4_cb_prepare
    nfsd4: give out delegations more quickly in 4.1 case
    nfsd4: add helper function to run callbacks
    nfsd4: make sure sequence flags are set after destroy_session
    nfsd4: re-probe callback on connection loss
    nfsd4: set sequence flag when backchannel is down
    nfsd4: keep finer-grained callback status
    rpc: allow xprt_class->setup to return a preexisting xprt
    rpc: keep backchannel xprt as long as server connection
    rpc: move sk_bc_xprt to svc_xprt
    nfsd4: allow backchannel recovery
    nfsd4: support BIND_CONN_TO_SESSION
    nfsd4: modify session list under cl_lock
    Documentation: fl_mylease no longer exists
    ...

    Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The
    vfs-scale work touched some msnfs cases, and this merge removes support
    for that entirely, so the conflict was trivial to resolve.

    Linus Torvalds
     

07 Jan, 2011

3 commits


05 Jan, 2011

1 commit

  • Currently we use -EAGAIN returns to determine when to drop a deferred
    request. On its own, that is error-prone, as it makes us treat -EAGAIN
    returns from other functions specially to prevent inadvertent dropping.

    So, use a flag on the request instead.

    Returning an error on request deferral is still required, to prevent
    further processing, but we no longer need worry that an error return on
    its own could result in a drop.
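
    A minimal userspace sketch of the idea follows. The names
    (`struct request`, `dropme`, `lookup()`, `defer()`, `process()`,
    `should_drop()`) are hypothetical and chosen for this example; they are
    not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct request {
    bool dropme;   /* hypothetical flag: set only when the request was deferred */
};

/* A helper that may fail with -EAGAIN for reasons unrelated to deferral. */
static int lookup(bool transient_failure)
{
    return transient_failure ? -EAGAIN : 0;
}

/* Deferral marks the request and still returns an error to stop processing. */
static int defer(struct request *rq)
{
    rq->dropme = true;
    return -EAGAIN;
}

static int process(struct request *rq, bool must_defer, bool transient_failure)
{
    int err = lookup(transient_failure);

    if (err)
        return err;
    if (must_defer)
        return defer(rq);
    return 0;
}

/* The drop decision reads the flag and never interprets the error code. */
static bool should_drop(const struct request *rq)
{
    return rq->dropme;
}
```

    Both failure paths return -EAGAIN here, but only the deferred request
    ends up with the flag set, so an unrelated -EAGAIN can no longer cause
    a drop.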

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     

22 Sep, 2010

1 commit

  • If we drop a request in the sunrpc layer, either due to kmalloc failure,
    or due to a cache miss when we could not queue the request for later
    replay, then close the connection to encourage the client to retry sooner.

    Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX
    (aka NFS4ERR_DELAY) is returned to guide the client concerning
    replay.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

1 commit

  • The macro any_online_node() is prone to producing sparse warnings due to
    the local symbol 'node'. Since all the in-tree users are really
    requesting the first online node (the mask argument is either
    NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
    remove the any_online_node macro since there are no users.
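
    As a rough userspace illustration of what "first online node" means,
    the sketch below treats the online map as a 64-bit mask and returns its
    lowest set bit. `first_online()` and `MAX_NODES` are hypothetical names
    for this example; the kernel macros operate on nodemask types instead.

```c
#include <assert.h>

#define MAX_NODES 64   /* assumed bound for this sketch */

/* Returns the lowest-numbered set bit of `online`, or MAX_NODES if none. */
static unsigned int first_online(unsigned long long online)
{
    unsigned int node;

    for (node = 0; node < MAX_NODES; node++)
        if (online & (1ULL << node))
            return node;
    return MAX_NODES;
}
```

    Writing this as a function keeps the loop variable scoped to the
    function body, so it cannot collide with a caller's own 'node' variable
    the way a macro-local symbol can.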

    Signed-off-by: H Hartley Sweeten
    Acked-by: David Rientjes
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Acked-by: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Dave Hansen
    Cc: Milton Miller
    Cc: Nathan Fontenot
    Cc: Geoff Levand
    Cc: Grant Likely
    Cc: J. Bruce Fields
    Cc: Neil Brown
    Cc: Trond Myklebust
    Cc: David S. Miller
    Cc: Benny Halevy
    Cc: Chuck Lever
    Cc: Ricardo Labiaga
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     

10 Feb, 2010

1 commit


30 Nov, 2009

1 commit


18 Jun, 2009

5 commits


17 Jun, 2009

1 commit

  • num_online_nodes() is called in a number of places but most often by the
    page allocator when deciding whether the zonelist needs to be filtered
    based on cpusets or the zonelist cache. This is actually a heavy function
    and touches a number of cache lines.

    This patch stores the number of online nodes at boot time and updates the
    value when nodes get onlined and offlined. The value is then used in a
    number of important paths in place of num_online_nodes().
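
    The trade-off can be sketched in userspace C. All names here
    (`node_online`, `nr_online_cached`, `count_online_nodes()`,
    `set_node_online()`, `set_node_offline()`) are hypothetical, and the
    sketch adjusts the cached value incrementally, which is only one
    possible way to keep it in sync.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64            /* assumed bound for this sketch */

static bool node_online[MAX_NODES];
static int nr_online_cached;    /* hypothetical cached count */

/* Slow path: walks every entry of the map on each call. */
static int count_online_nodes(void)
{
    int n = 0, i;

    for (i = 0; i < MAX_NODES; i++)
        n += node_online[i];
    return n;
}

/* State changes are rare, so the bookkeeping cost is paid here. */
static void set_node_online(int node)
{
    if (!node_online[node]) {
        node_online[node] = true;
        nr_online_cached++;
    }
}

static void set_node_offline(int node)
{
    if (node_online[node]) {
        node_online[node] = false;
        nr_online_cached--;
    }
}
```

    Hot paths then read `nr_online_cached`, a single variable, instead of
    calling count_online_nodes() and touching the whole map.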

    [rientjes@google.com: do not override definition of node_set_online() with macro]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Cc: Nick Piggin
    Cc: Dave Hansen
    Cc: Lee Schermerhorn
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Apr, 2009

1 commit

  • * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
    nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
    nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
    nfsd41: Documentation/filesystems/nfs41-server.txt
    nfsd41: CREATE_EXCLUSIVE4_1
    nfsd41: SUPPATTR_EXCLCREAT attribute
    nfsd41: support for 3-word long attribute bitmask
    nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
    nfsd41: pass writable attrs mask to nfsd4_decode_fattr
    nfsd41: provide support for minor version 1 at rpc level
    nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
    nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
    nfsd41: access_valid
    nfsd41: clientid handling
    nfsd41: check encode size for sessions maxresponse cached
    nfsd41: stateid handling
    nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
    nfsd41: destroy_session operation
    nfsd41: non-page DRC for solo sequence responses
    nfsd41: Add a create session replay cache
    nfsd41: create_session operation
    ...

    Linus Torvalds
     

06 Apr, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits)
    cpumask: remove cpumask allocation from idle_balance, fix
    numa, cpumask: move numa_node_id default implementation to topology.h, fix
    cpumask: remove cpumask allocation from idle_balance
    x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus
    x86: cpumask: update 32-bit APM not to mug current->cpus_allowed
    x86: microcode: cleanup
    x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c
    cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash
    numa, cpumask: move numa_node_id default implementation to topology.h
    cpumask: convert node_to_cpumask_map[] to cpumask_var_t
    cpumask: remove x86 cpumask_t uses.
    cpumask: use cpumask_var_t in uv_flush_tlb_others.
    cpumask: remove cpumask_t assignment from vector_allocation_domain()
    cpumask: make Xen use the new operators.
    cpumask: clean up summit's send_IPI functions
    cpumask: use new cpumask functions throughout x86
    x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask
    cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t
    cpumask: convert node_to_cpumask_map[] to cpumask_var_t
    x86: unify 32 and 64-bit node_to_cpumask_map
    ...

    Linus Torvalds
     

04 Apr, 2009

1 commit

  • On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
    returned. It is up to the NFSv4.1 client to resend only the operations that
    have not been processed.

    Initialize rq_usedeferral to 1 in svc_process(). It will be turned off in
    nfsd4_proc_compound() only when NFSv4.1 Sessions are used.

    Note: this isn't an adequate solution on its own. It's acceptable as a way
    to get some minimal 4.1 up and working, but we're going to have to find a
    way to avoid returning DELAY in all common cases before 4.1 can really be
    considered ready.
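
    The control flow described above can be modelled in a few lines of
    userspace C. This is a sketch under assumptions: `struct request`,
    `request_init()`, `compound_begin()`, `on_cache_miss()` and
    `enum outcome` are hypothetical names, and only the `usedeferral`
    field mirrors a name from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

enum outcome { OUTCOME_DEFERRED, OUTCOME_DELAY };

struct request {
    bool usedeferral;   /* models rq_usedeferral */
};

/* Models the default being set when request processing starts. */
static void request_init(struct request *rq)
{
    rq->usedeferral = true;
}

/* Models the compound handler turning deferral off when sessions are used. */
static void compound_begin(struct request *rq, bool has_session)
{
    if (has_session)
        rq->usedeferral = false;
}

/* On a cache miss: defer the whole request, or answer with a DELAY error. */
static enum outcome on_cache_miss(const struct request *rq)
{
    return rq->usedeferral ? OUTCOME_DEFERRED : OUTCOME_DELAY;
}
```

    In the DELAY case the client is expected to resend only the operations
    that were not processed, as the message notes.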

    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    [nfsd41: reverse rq_nodeferral negative logic]
    Signed-off-by: Benny Halevy
    [sunrpc: initialize rq_usedeferral]
    Signed-off-by: Andy Adamson
    Signed-off-by: Benny Halevy
    Signed-off-by: J. Bruce Fields

    Andy Adamson
     

02 Apr, 2009

1 commit


31 Mar, 2009

1 commit


30 Mar, 2009

1 commit


29 Mar, 2009

1 commit

  • Move error reporting for RPC registration to rpcb_register's caller.

    This way the caller can choose to recover silently from certain
    errors, but report errors it does not recognize. Error reporting
    for kernel RPC service registration is now handled in one place.
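
    The pattern can be sketched in userspace C. The names `do_register()`
    and `register_service()` are hypothetical, and treating
    -EPROTONOSUPPORT as the silently recoverable error is an assumption
    made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical low-level helper: returns 0 or a negative errno and
 * prints nothing. The result is passed in to simulate outcomes.
 */
static int do_register(int simulated_result)
{
    return simulated_result;
}

/*
 * Caller-side policy: recover silently from an error it recognizes
 * (assumed here to be -EPROTONOSUPPORT) and report everything else.
 * Returns the number of messages reported.
 */
static int register_service(int simulated_result)
{
    int err = do_register(simulated_result);

    switch (err) {
    case 0:
    case -EPROTONOSUPPORT:
        return 0;
    default:
        fprintf(stderr, "registration failed: error %d\n", -err);
        return 1;
    }
}
```

    Because the helper stays silent, each caller decides which errors are
    worth a log line, and the reporting lives in one place.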

    This patch is part of a series that addresses
    http://bugzilla.kernel.org/show_bug.cgi?id=12256

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever