26 Jan, 2011

1 commit

  • Milan Broz reports:

    > on today Linus' tree I get OOps if using nfs.
    >
    > server (2.6.36) exports dir:
    > /dir 172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500)
    >
    > on client it is mounted in fstab
    > server:/dir /mnt/tst nfs rw,soft 0 0
    >
    > and these commands OOpses it (simplified from a configure script):
    >
    > cd /dir
    > touch x
    > install x y
    >
    > [ 105.327701] ------------[ cut here ]------------
    > [ 105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338!
    > [ 105.328075] invalid opcode: 0000 [#1] PREEMPT SMP
    > [ 105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent
    > [ 105.328349] Modules linked in: usbcore dm_mod
    > [ 105.328553]
    > [ 105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform
    > [ 105.328853] EIP: 0060:[] EFLAGS: 00010282 CPU: 0
    > [ 105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98
    > [ 105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004
    > [ 105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4
    > [ 105.329431] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    > [ 105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000)
    > [ 105.336600] Stack:
    > [ 105.336693] ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158
    > [ 105.336982] ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000
    > [ 105.337182] ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000
    > [ 105.337405] Call Trace:
    > [ 105.337695] [] rpcauth_wrap_req+0x75/0x7f
    > [ 105.337806] [] ? xdr_encode_opaque+0x12/0x15
    > [ 105.337898] [] ? nfs3_xdr_enc_setacl3args+0x0/0x98
    > [ 105.337988] [] call_transmit+0x17e/0x1e8
    > [ 105.338072] [] __rpc_execute+0x6d/0x1a6
    > [ 105.338155] [] rpc_execute+0x34/0x37
    > [ 105.338235] [] rpc_run_task+0xb5/0xbd
    > [ 105.338316] [] rpc_call_sync+0x3d/0x58
    > [ 105.338402] [] nfs3_proc_setacls+0x18e/0x24f
    > [ 105.338493] [] ? __kmalloc+0x148/0x1c4
    > [ 105.338579] [] ? posix_acl_alloc+0x12/0x22
    > [ 105.338665] [] nfs3_proc_setacl+0xa0/0xca
    > [ 105.338748] [] nfs3_setxattr+0x62/0x88
    > [ 105.338834] [] ? sub_preempt_count+0x7c/0x89
    > [ 105.338926] [] ? nfs3_setxattr+0x0/0x88
    > [ 105.339026] [] __vfs_setxattr_noperm+0x26/0x95
    > [ 105.339114] [] vfs_setxattr+0x5b/0x76
    > [ 105.339211] [] setxattr+0x9d/0xc3
    > [ 105.339298] [] ? handle_pte_fault+0x258/0x5cb
    > [ 105.339428] [] ? __free_pages+0x1a/0x23
    > [ 105.339517] [] ? up_read+0x16/0x2c
    > [ 105.339599] [] ? fget+0x0/0xa3
    > [ 105.339677] [] ? fget+0x0/0xa3
    > [ 105.339760] [] ? get_parent_ip+0xb/0x31
    > [ 105.339843] [] ? sub_preempt_count+0x7c/0x89
    > [ 105.339931] [] sys_fsetxattr+0x51/0x79
    > [ 105.340014] [] sysenter_do_call+0x12/0x32
    > [ 105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d
    > [ 105.350321] EIP: [] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4
    > [ 105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]---

    nfs3_xdr_enc_setacl3args() is not properly setting up the target
    buffer before nfsacl_encode() attempts to encode the ACL.

    Introduced by commit d9c407b1 "NFS: Introduce new-style XDR encoding
    functions for NFSv3."

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

11 Jan, 2011

2 commits

  • Conflicts:
    fs/nfs/nfs2xdr.c
    fs/nfs/nfs3xdr.c
    fs/nfs/nfs4xdr.c

    Trond Myklebust
     
  • vm_map_ram() is not available on NOMMU platforms, and causes trouble
    on incoherrent architectures such as ARM when we access the page data
    through both the direct and the virtual mapping.

    The alternative is to use the direct mapping to access page data
    for the case when we are not crossing a page boundary, but to copy
    the data into a linear scratch buffer when we are accessing data
    that spans page boundaries.

    Signed-off-by: Trond Myklebust
    Tested-by: Marc Kleine-Budde
    Cc: stable@kernel.org [2.6.37]

    Trond Myklebust
     

17 Dec, 2010

11 commits

  • Now that all client-side XDR decoder routines use xdr_streams, there
    should be no need to support the legacy calling sequence [rpc_rqst *,
    __be32 *, RPC res *] anywhere. We can construct an xdr_stream in the
    generic RPC code, instead of in each decoder function.

    This is a refactoring change. It should not cause different behavior.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Now that all client-side XDR encoder routines use xdr_streams, there
    should be no need to support the legacy calling sequence [rpc_rqst *,
    __be32 *, RPC arg *] anywhere. We can construct an xdr_stream in the
    generic RPC code, instead of in each encoder function.

    Also, all the client-side encoder functions return 0 now, making a
    return value superfluous. Take this opportunity to convert them to
    return void instead.

    This is a refactoring change. It should not cause different behavior.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    The pointer returned by ->decode_dirent() is no longer used as a
    pointer. The only call site (xdr_decode() in fs/nfs/dir.c) simply
    extracts the errno value encoded in the pointer. Replace the
    returned pointer with a standard integer errno return value.

    Also, pass the "server" argument as part of the nfs_entry instead of
    as a separate parameter. It's faster to derive "server" in
    nfs_readdir_xdr_to_array() since we already have the directory's inode
    handy. "server" ought to be invariant for a set of entries in the
    same directory, right?

    The legacy versions of decode_dirent() don't use "server" anyway, so
    it's wasted work for them to derive and pass "server" for each entry.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    Move the timestamp decoder to match the placement and naming
    conventions of the other helpers. Fold xdr_decode_fattr() into
    decode_fattr3(), which is now it's only user. Fold
    xdr_decode_wcc_attr() into decode_wcc_attr(), which is now it's only
    user.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up. Remove unused legacy result decoder functions, and any
    now unused decoder helper functions.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The naming scheme of the new decoder functions, which follows the
    NFSv4 XDR decoder functions, is slightly different than the scheme
    used for the old functions. Rename the functions as a separate
    step to keep the patches clean.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • We'd like to prevent local buffer overflows caused by malicious or
    broken servers. New xdr_stream style decoders can do that.

    For efficiency, we also eventually want to be able to pass xdr_streams
    from call_decode() to all XDR decoding functions, rather than building
    an xdr_stream in every XDR decoding function in the kernel.

    Static helper functions are left without the "inline" directive. This
    allows the compiler to choose automatically how to optimize these for
    size or speed.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up. Move the timestamp and the sattr encoder to match the
    placement convention of the other helpers, update their coding style,
    and refresh their documenting comments.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up. Remove unused legacy argument encoder functions, and any
    now unused encoder helper functions.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The naming scheme of the new encoder functions, which follows the
    NFSv4 XDR encoder functions, is slightly different than the scheme
    used for the old functions. Rename the functions as a separate
    step to keep the patches clean.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • We're interested in taking advantage of the safety benefits of
    xdr_streams. These data structures allow more careful checking for
    buffer overflow while encoding. More careful type checking is also
    introduced in the new functions.

    For efficiency, we also eventually want to be able to pass xdr_streams
    from call_encode() to all XDR encoding functions, rather than building
    an xdr_stream in every XDR encoding function in the kernel. To do
    this means all encoders must be ready to handle a passed-in
    xdr_stream.

    The new encoders follow the modern paradigm for XDR encoders: BUG on
    error, and always return a zero status code.

    Static helper functions are left without the "inline" directive. This
    allows the compiler to choose automatically how to optimize these for
    size or speed.

    Signed-off-by: Chuck Lever
    Tested-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

23 Nov, 2010

2 commits


16 Nov, 2010

1 commit


25 Oct, 2010

1 commit

  • Instead of blindly zapping the caches, attempt to revalidate them if
    the server has indicated that it uses high resolution timestamps.

    NFSv4 should be able to always revalidate the cache since the
    protocol requires the update of the change attribute on modification of
    the data. In reality, there are servers (the Linux NFS server
    for example) that do not obey this requirement and use ctime as the
    basis for change attribute. Long term, the server needs to be fixed.
    At this time, and to be on the safe side, continue zapping caches if
    the server indicates that it does not have a high resolution timestamp.

    Signed-off-by: Ricardo Labiaga
    Signed-off-by: Trond Myklebust

    Ricardo Labiaga
     

24 Oct, 2010

3 commits

  • By requsting more attributes during a readdir, we can mimic the readdir plus
    operation that was in NFSv3.

    To test, I ran the command `ls -lU --color=none` on directories with various
    numbers of files. Without readdir plus, I see this:

    n files | 100 | 1,000 | 10,000 | 100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
    user | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
    sys | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
    access | 3 | 1 | 1 | 4 | 31
    getattr | 2 | 1 | 1 | 1 | 1
    lookup | 104 | 1,003 | 10,003 | 100,003 | 1,000,003
    readdir | 2 | 16 | 158 | 1,575 | 15,749
    total | 111 | 1,021 | 10,163 | 101,583 | 1,015,784

    With readdir plus enabled, I see this:

    n files | 100 | 1,000 | 10,000 | 100,000 | 1,000,000
    --------+-----------+-----------+-----------+-----------+----------
    real | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
    user | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
    sys | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
    access | 3 | 1 | 1 | 1 | 7
    getattr | 2 | 1 | 1 | 1 | 1
    lookup | 4 | 3 | 3 | 3 | 3
    readdir | 6 | 62 | 630 | 6,300 | 62,993
    total | 15 | 67 | 635 | 6,305 | 63,004

    Readdir plus disabled has about a 16x increase in the number of rpc calls and
    is 4 - 5 times slower on large directories.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Remove the page size checking code for a readdir decode. This is now done
    by decode_dirent with xdr_streams.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
    kernel oops that has been occuring when reading a vmapped page.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     

22 Sep, 2010

1 commit


18 Sep, 2010

2 commits


04 Aug, 2010

1 commit


15 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

24 Sep, 2009

1 commit

  • * remove asm/atomic.h inclusion from linux/utsname.h --
    not needed after kref conversion
    * remove linux/utsname.h inclusion from files which do not need it

    NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
    due to some personality stuff it _is_ needed -- cowardly leave ELF-related
    headers and files alone.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

21 Apr, 2009

1 commit

  • Commit ae46141ff08f1965b17c531b571953c39ce8b9e2 (NFSv3: Fix posix ACL code)
    introduces a bug in the calculation of the XDR header iovec. In the case
    where we are inlining the acls, we need to adjust the length of the iovec
    req->rq_svec, in addition to adjusting the total buffer length.

    Tested-by: Leonardo Chiquitto
    Tested-by: Suresh Jayaraman
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

12 Mar, 2009

2 commits


11 Mar, 2009

1 commit

  • Fix a memory leak due to allocation in the XDR layer. In cases where the
    RPC call needs to be retransmitted, we end up allocating new pages without
    clearing the old ones. Fix this by moving the allocation into
    nfs3_proc_setacls().

    Also fix an issue discovered by Kevin Rudd, whereby the amount of memory
    reserved for the acls in the xdr_buf->head was miscalculated, and causing
    corruption.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

20 Apr, 2008

1 commit


20 Mar, 2008

1 commit

  • Currently, the NFS readdir decoders have a workaround for buggy servers
    that send an empty readdir response with the EOF bit unset. If the
    server sends a malformed response in some cases, this workaround kicks
    in and just returns an empty response rather than returning a proper
    error to the caller.

    This patch does 3 things:

    1) have malformed responses with no entries return error (-EIO)

    2) preserve existing workaround for servers that send empty
    responses with the EOF marker unset.

    3) Add some comments to clarify the logic in nfs3_xdr_readdirres().

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     

30 Jan, 2008

1 commit


10 Oct, 2007

2 commits


20 Jul, 2007

1 commit


11 Jul, 2007

1 commit

  • We should almost always be deferencing the rpc_auth struct by means of the
    credential's cr_auth field instead of the rpc_clnt->cl_auth anyway. Fix up
    that historical mistake, and remove the macro that propagated it.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

01 May, 2007

1 commit

  • The RPC buffer size estimation logic in net/sunrpc/clnt.c always
    significantly overestimates the requirements for the buffer size.
    A little instrumentation demonstrated that in fact rpc_malloc was never
    allocating the buffer from the mempool, but almost always called kmalloc.

    To compute the size of the RPC buffer more precisely, split p_bufsiz into
    two fields; one for the argument size, and one for the result size.

    Then, compute the sum of the exact call and reply header sizes, and split
    the RPC buffer precisely between the two. That should keep almost all RPC
    buffers within the 2KiB buffer mempool limit.

    And, we can finally be rid of RPC_SLACK_SPACE!

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever