01 Feb, 2011

2 commits

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFS: NFSv4 readdir loses entries
    NFS: Micro-optimize nfs4_decode_dirent()
    NFS: Fix an NFS client lockdep issue
    NFS construct consistent co_ownerid for v4.1
    NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount
    NFS improve pnfs_put_deviceid_cache debug print
    NFS fix cb_sequence error processing
    NFS do not find client in NFSv4 pg_authenticate
    NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"
    NFS: Prevent memory allocation failure in nfsacl_encode()
    NFS: nfsacl_{encode,decode} should return signed integer
    NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"
    NFS: Fix "kernel BUG at fs/aio.c:554!"
    NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds().
    NFS: fix handling of malloc failure during nfs_flush_multi()

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: xfs_bmap_add_extent_delay_real should init br_startblock
    xfs: fix dquot shaker deadlock
    xfs: handle CIL transaction commit failures correctly
    xfs: limit extsize to size of AGs and/or MAXEXTLEN
    xfs: prevent extsize alignment from exceeding maximum extent size
    xfs: limit extent length for allocation to AG size
    xfs: speculative delayed allocation uses rounddown_power_of_2 badly
    xfs: fix efi item leak on forced shutdown
    xfs: fix log ticket leak on forced shutdown.

    Linus Torvalds
     

31 Jan, 2011

2 commits

  • In ntfs_mft_record_alloc(), when mapping the new extent mft record with
    map_extent_mft_record(), we overwrite @m with the return value. On
    error, we then try to use the old @m, but it is no longer there: @m
    now contains an error code, so we crash when dereferencing the error
    code as if it were a pointer.

    The simple fix is to use a temporary variable to store the return value
    thus preserving the original @m for later use. This is a backport from
    the commercial Tuxera-NTFS driver and is well tested...

    Thanks go to Julia Lawall for pointing this out (whilst I had fixed it
    in the commercial driver I had failed to fix it in the Linux kernel).
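
    The failure mode and the fix can be sketched in userspace C. ERR_PTR(),
    IS_ERR() and PTR_ERR() below are simplified stand-ins for the kernel
    helpers of the same names, and struct record, map_record() and
    alloc_extent() are hypothetical; the only idea taken from the commit is
    storing the mapper's return value in a temporary so that the original @m
    is still valid on the error path.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdint.h>

    /* Simplified stand-ins for the kernel's err.h helpers: small negative
     * errno values are encoded in the top page of the address space. */
    #define MAX_ERRNO 4095
    static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
    static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
    static inline int IS_ERR(const void *ptr)
    {
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
    }

    struct record { int id; };            /* hypothetical mft record */
    static struct record extent_rec = { 42 };

    /* Hypothetical mapper: returns a record, or an ERR_PTR() on failure. */
    static struct record *map_record(int fail)
    {
        return fail ? ERR_PTR(-ENOMEM) : &extent_rec;
    }

    /* Returns the mapped record's id or a negative errno. On failure,
     * *undo_id receives the id of the original @m, which is still usable. */
    static int alloc_extent(struct record *m, int fail, int *undo_id)
    {
        struct record *m_tmp = map_record(fail);   /* do not clobber @m */

        if (IS_ERR(m_tmp)) {
            *undo_id = m->id;                      /* safe: @m untouched */
            return (int)PTR_ERR(m_tmp);
        }
        *undo_id = -1;
        m = m_tmp;                                 /* overwrite only on success */
        return m->id;
    }
    ```
    
    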

    Signed-off-by: Anton Altaparmakov
    Signed-off-by: Linus Torvalds

    Anton Altaparmakov
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: More crypto cleanup (try #2)
    CIFS: Add strictcache mount option
    CIFS: Implement cifs_strict_writev (try #4)
    [CIFS] Replace cifs md5 hashing functions with kernel crypto APIs

    Linus Torvalds
     

29 Jan, 2011

3 commits

  • On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
    NFSv4 mounts of OpenSolaris with something like:

    > ./test6: readdir
    > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
    > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
    > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
    > ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
    > basic tests failed
    > Tests failed, leaving /mnt/klimt mounted
    > [cel@matisse cthon04]$

    I narrowed the problem down to nfs4_decode_dirent() reporting that the
    decode buffer had overflowed while decoding the entries for those
    missing files.

    verify_attr_len() assumes both its pointer arguments reside on the
    same page. When these arguments point to locations on two different
    pages, verify_attr_len() can report false errors. This can now happen,
    because a large NFSv4 readdir result can span pages.

    We have reasonably good checking in nfs4_decode_dirent() anyway, so
    it should be safe to simply remove the extra checking.

    At a guess, this was introduced by commit 6650239a, "NFS: Don't use
    vm_map_ram() in readdir".

    Cc: stable@kernel.org [2.6.37]
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Make the decoding of NFSv4 directory entries slightly more efficient
    by:

    1. Avoiding unnecessary byte swapping when checking XDR booleans,
    and

    2. Not bumping "p" when its value will be immediately replaced by
    xdr_inline_decode()

    This commit makes nfs4_decode_dirent() consistent with similar logic
    in the other two decode_dirent() functions.
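
    Point 1 works because XDR encodes a boolean as a 32-bit big-endian word
    and zero is all-zero bits in any byte order, so a raw wire word can be
    tested against zero without swapping it first. A minimal userspace
    sketch; the two helper names are hypothetical, and the kernel itself
    compares against a pre-swapped constant (xdr_zero) rather than a
    literal 0.

    ```c
    #include <arpa/inet.h>   /* htonl(), ntohl() */
    #include <assert.h>
    #include <stdint.h>

    /* Straightforward version: byte-swap the word, then test it. */
    static int xdr_bool_swapped(const uint32_t *p)
    {
        return ntohl(*p) != 0;
    }

    /* Cheaper version: FALSE is all-zero bits on the wire regardless of
     * host byte order, so no ntohl() is needed. */
    static int xdr_bool_raw(const uint32_t *p)
    {
        return *p != 0;
    }
    ```
    
    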

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • There is no reason to free the delegation cred in the RCU callback,
    and doing so results in a lockdep complaint that rpc_credcache_lock
    is being taken from both softirq and non-softirq contexts.

    Reported-by: Chuck Lever
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org

    Trond Myklebust
     

28 Jan, 2011

10 commits

  • When filling in the middle of a previous delayed allocation in
    xfs_bmap_add_extent_delay_real, set br_startblock of the new delay
    extent to the right to nullstartblock instead of 0 before inserting
    the extent into the ifork (xfs_iext_insert), rather than setting
    br_startblock afterward.

    Adding the extent into the ifork with br_startblock=0 can lead to
    the extent being copied into the btree by xfs_bmap_extent_to_btree
    if we happen to convert from extents format to btree format before
    updating br_startblock with the correct value. The unexpected
    addition of this delay extent to the btree can cause subsequent
    XFS_WANT_CORRUPTED_GOTO filesystem shutdown in several
    xfs_bmap_add_extent_delay_real cases where we are converting a delay
    extent to real and unexpectedly find an extent already inserted.
    For example:

    911     case BMAP_LEFT_FILLING:
    912         /*
    913          * Filling in the first part of a previous delayed allocation.
    914          * The left neighbor is not contiguous.
    915          */
    916         trace_xfs_bmap_pre_update(ip, idx, state, _THIS_IP_);
    917         xfs_bmbt_set_startoff(ep, new_endoff);
    918         temp = PREV.br_blockcount - new->br_blockcount;
    919         xfs_bmbt_set_blockcount(ep, temp);
    920         xfs_iext_insert(ip, idx, 1, new, state);
    921         ip->i_df.if_lastex = idx;
    922         ip->i_d.di_nextents++;
    923         if (cur == NULL)
    924             rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
    925         else {
    926             rval = XFS_ILOG_CORE;
    927             if ((error = xfs_bmbt_lookup_eq(cur, new->br_startoff,
    928                     new->br_startblock, new->br_blockcount,
    929                     &i)))
    930                 goto done;
    931             XFS_WANT_CORRUPTED_GOTO(i == 0, done);

    With the bogus extent in the btree, we shut down the filesystem at
    line 931. The conversion from extents to btree format happens when the
    number of extents in the inode rises above ip->i_df.if_ext_max.
    xfs_bmap_extent_to_btree copies extents from the ifork into the
    btree, ignoring all delalloc extents, which are denoted by
    br_startblock having some value of nullstartblock.
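
    The ordering problem can be modelled with a toy extent array. This is a
    sketch under simplifying assumptions: struct ext, NULL_STARTBLOCK and the
    helpers are hypothetical, and a single sentinel stands in for XFS's
    masked nullstartblock() encoding. It shows why the remainder must carry
    the delalloc marker before insertion: a conversion that runs in between
    copies anything with a "real" start block, including block 0.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified extent record (not XFS's on-disk layout). */
    struct ext { uint64_t startoff, startblock, blockcount; };

    /* Stand-in for nullstartblock(): a value no real extent can have. */
    #define NULL_STARTBLOCK UINT64_MAX
    static int is_delalloc(const struct ext *e)
    {
        return e->startblock == NULL_STARTBLOCK;
    }

    /* Model of xfs_bmap_extent_to_btree(): copy only real extents. */
    static size_t extents_to_btree(const struct ext *ifork, size_t n,
                                   struct ext *btree)
    {
        size_t copied = 0;
        for (size_t i = 0; i < n; i++)
            if (!is_delalloc(&ifork[i]))
                btree[copied++] = ifork[i];
        return copied;
    }

    /* The fix: build the right-hand delay extent already marked delalloc,
     * instead of inserting it with startblock 0 and patching it later. */
    static struct ext make_right_delay(uint64_t startoff, uint64_t len)
    {
        struct ext right = { startoff, NULL_STARTBLOCK, len };
        return right;
    }
    ```
    
    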

    SGI-PV: 1013221

    Signed-off-by: Ben Myers
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    bpm@sgi.com
     
  • Commit 368e136 ("xfs: remove duplicate code from dquot reclaim") fails
    to unlock the dquot freelist when the number of loop restarts is
    exceeded in xfs_qm_dqreclaim_one(). This causes hangs in memory
    reclaim.

    Rework the loop control logic into an unwind stack that all the
    different cases jump into. This means there is only one set of code
    that processes the loop exit criteria, and simplifies the unlocking
    of all the items from different points in the loop. It also fixes a
    double increment of the restart counter from the qi_dqlist_lock
    case.
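
    The "unwind stack" shape can be sketched as a loop whose every exit
    jumps to a label that releases exactly what is held at that point. This
    is a hypothetical model, not xfs_qm_dqreclaim_one(): struct lk counts
    held locks, `busy` says how many attempts fail before the item can be
    reclaimed, and the restart counter is incremented in exactly one place.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical lock model that only tracks whether it is held. */
    struct lk { bool held; };
    static void take(struct lk *l)    { assert(!l->held); l->held = true; }
    static void release(struct lk *l) { assert(l->held);  l->held = false; }

    #define MAX_RESTARTS 4

    /* Returns 1 if an item was reclaimed, 0 if the restart limit was hit.
     * Both outcomes leave the freelist lock released. */
    static int reclaim_one(struct lk *freelist, struct lk *dqlist,
                           int busy, int *restarts)
    {
        int ret = 0;

        *restarts = 0;
        take(freelist);
        for (;;) {
            take(dqlist);
            if (*restarts >= busy) {        /* item no longer busy */
                ret = 1;
                goto out_unlock_dqlist;
            }
            release(dqlist);
            if (++*restarts >= MAX_RESTARTS)
                goto out_unlock_freelist;   /* must still drop the freelist */
        }

    out_unlock_dqlist:
        release(dqlist);
    out_unlock_freelist:
        release(freelist);
        return ret;
    }
    ```
    
    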

    Reported-by: Malcolm Scott
    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • Failure to commit a transaction into the CIL is not handled
    correctly. This can currently happen only when racing with a
    shutdown, and it requires an explicit shutdown check, so it is rare
    and can be avoided. Remove the shutdown check and make the CIL commit
    a void function to indicate it will always succeed, thereby removing
    the incorrectly handled failure case.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • The extent size hint can be set to larger than an AG. This means
    that the alignment process can push the range to be allocated
    outside the bounds of the AG, resulting in assert failures or
    corrupted bmbt records. Similarly, if the extsize is larger than the
    maximum extent size supported, the alignment process will produce
    extents that are too large to fit into the bmbt records, resulting
    in a different type of assert/corruption failure.

    Fix this by limiting extsize at the time it is set: first to be
    less than MAXEXTLEN, then to be at most half the size of the
    AGs in the filesystem for non-realtime inodes. Realtime inodes do
    not allocate out of AGs, so they don't have to be restricted by the
    size of AGs.
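
    A sketch of the set-time check described above. MAXEXTLEN uses XFS's
    21-bit limit; check_extsize() is a hypothetical helper (sizes in
    filesystem blocks) that follows the commit text literally, rejecting a
    hint that is not less than MAXEXTLEN or, for non-realtime inodes, larger
    than half an AG.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Largest length a bmbt extent record can describe (21 bits). */
    #define MAXEXTLEN ((uint32_t)0x001fffff)

    static int check_extsize(uint32_t extsize, uint32_t agblocks, bool realtime)
    {
        if (extsize >= MAXEXTLEN)
            return -EINVAL;
        /* Realtime inodes do not allocate from AGs, so skip this limit. */
        if (!realtime && extsize > agblocks / 2)
            return -EINVAL;
        return 0;
    }
    ```
    
    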

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • When doing delayed allocation, if the allocation size is for a
    maximally sized extent, extent size alignment can push it over this
    limit. This results in an assert failure in xfs_bmbt_set_allf(), as
    the extent length is too large to fit in the extent record.

    Fix this by ensuring that we allow for the space that extent size
    alignment requires (up to 2 * (extsize - 1) blocks, as we have to
    handle both head and tail alignment) when limiting the maximum size
    of the extent.
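
    The head/tail margin can be checked numerically. In this sketch,
    max_prealign_len() and aligned_len() are hypothetical helpers: the first
    reserves 2 * (extsize - 1) blocks below MAXEXTLEN, the second rounds a
    range outward to extsize boundaries the way alignment would.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define MAXEXTLEN ((uint32_t)0x001fffff)

    /* Largest request that still fits in MAXEXTLEN after alignment may add
     * up to extsize - 1 blocks at the head and at the tail. */
    static uint32_t max_prealign_len(uint32_t extsize)
    {
        if (extsize <= 1)
            return MAXEXTLEN;
        return MAXEXTLEN - 2 * (extsize - 1);
    }

    /* Round [off, off + len) outward to extsize boundaries. */
    static uint64_t aligned_len(uint64_t off, uint64_t len, uint64_t extsize)
    {
        uint64_t start = off - off % extsize;
        uint64_t end = off + len;

        if (end % extsize)
            end += extsize - end % extsize;
        return end - start;
    }
    ```
    
    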

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • Delayed allocation extents can be larger than AGs, so when trying to
    convert a large range, xfs_bmap_alloc_nullfb() may scan every AG
    looking for one that can satisfy an allocation larger than an AG,
    which can never succeed. We should instead stop at the first AG that
    offers the maximum possible allocation size. The full scan causes
    excessive CPU usage when there are lots of AGs.

    The same problem occurs when doing preallocation of a range larger
    than an AG.

    Fix the problem by limiting real allocation lengths to the maximum
    that an AG can support. This means if we have empty AGs, we'll stop
    the search at the first of them. If there are no empty AGs, we'll
    still scan them all, but that is a different problem....
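
    The effect of clamping can be shown with a toy scan. pick_ag() is a
    hypothetical model, not xfs_bmap_alloc_nullfb(): longest_free[i] is the
    longest free extent in AG i and ag_max is the most any AG can offer.
    Without the clamp, an over-AG-sized request visits every AG and fails;
    with it, the scan stops at the first AG with the maximum free extent.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Returns the chosen AG index or -1; *scanned counts AGs visited. */
    static int pick_ag(const uint32_t *longest_free, size_t nags,
                       uint64_t want, uint32_t ag_max, size_t *scanned)
    {
        uint64_t len = want < ag_max ? want : ag_max;   /* the clamp */

        *scanned = 0;
        for (size_t i = 0; i < nags; i++) {
            *scanned = i + 1;
            if (longest_free[i] >= len)
                return (int)i;
        }
        return -1;
    }
    ```
    
    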

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • rounddown_power_of_2() returns an undefined result when passed a
    value of zero. The speculative delayed allocation code does this
    when the inode is zero length, so occasionally the preallocation
    is much, much larger than necessary (e.g. 8GB for a 270 _byte_
    file). Ensure we never pass a zero value to this function, so that
    the preallocation is always the desired size.
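
    Why zero is special: rounding down to a power of two is a bit scan for
    the highest set bit, and zero has none. In this sketch, rounddown_pow2()
    is a userspace stand-in for the kernel's rounddown_pow_of_two() built on
    the GCC/Clang builtin __builtin_clzl(), which is itself undefined for 0;
    prealloc_blocks() is a hypothetical caller that guards the zero case
    with a minimum size.

    ```c
    #include <assert.h>
    #include <limits.h>

    /* Largest power of two <= n. Precondition: n != 0. */
    static unsigned long rounddown_pow2(unsigned long n)
    {
        return 1UL << (sizeof(n) * CHAR_BIT - 1 - __builtin_clzl(n));
    }

    /* Never hand 0 to rounddown_pow2(); use a fallback size instead. */
    static unsigned long prealloc_blocks(unsigned long size_blocks,
                                         unsigned long min_blocks)
    {
        if (size_blocks == 0)
            return min_blocks;
        return rounddown_pow2(size_blocks);
    }
    ```
    
    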

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • After test 139, kmemleak shows:

    unreferenced object 0xffff880078b405d8 (size 400):
    comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
    hex dump (first 32 bytes):
    60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff `..y....`..y....
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    [] kmemleak_alloc+0x2d/0x60
    [] kmem_cache_alloc+0x13f/0x2b0
    [] kmem_zone_alloc+0x77/0xf0
    [] kmem_zone_zalloc+0x1e/0x50
    [] xfs_efi_init+0x4b/0xb0
    [] xfs_trans_get_efi+0x58/0x90
    [] xfs_bmap_finish+0x8b/0x1d0
    [] xfs_itruncate_finish+0x2c4/0x5d0
    [] xfs_setattr+0x8df/0xa70
    [] xfs_vn_setattr+0x1b/0x20
    [] notify_change+0x170/0x2e0
    [] do_truncate+0x66/0xa0
    [] sys_ftruncate+0xdb/0xe0
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff

    The cause of the leak is that the "remove" parameter of IOP_UNPIN()
    is never set when a CIL push is aborted. This means that the EFI
    item is never freed if it was in the push being cancelled. The
    problem is specific to delayed logging, but has uncovered a couple
    of problems with the handling of IOP_UNPIN(remove).

    Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
    in the CIL commit failure path or the iclog write failure path,
    because for delayed logging we have no transaction context. Hence we
    must only call xfs_trans_del_item() if the log item being unpinned
    has an active log item descriptor.

    Secondly, xfs_trans_uncommit() does not handle log item descriptor
    freeing during the traversal of log items on a transaction. It can
    reference a freed log item descriptor when unpinning an EFI item.
    Hence it needs to use a safe list traversal method to allow items to
    be removed from the transaction during IOP_UNPIN().
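
    The "safe" traversal idea, reading the next pointer before the current
    element can be freed, is what the kernel's list_for_each_entry_safe()
    provides. A minimal userspace sketch on a hypothetical singly linked
    list, where is_efi stands in for "unpinning frees this descriptor":

    ```c
    #include <assert.h>
    #include <stdlib.h>

    struct item {
        struct item *next;
        int is_efi;              /* hypothetical: unpin frees this item */
    };

    /* Unpin every item; returns how many were freed along the way. */
    static int unpin_all(struct item **head)
    {
        struct item **link = head;
        struct item *it, *next;
        int freed = 0;

        for (it = *head; it; it = next) {
            next = it->next;     /* read before `it` may be freed */
            if (it->is_efi) {
                *link = next;    /* unlink, then free */
                free(it);
                freed++;
            } else {
                link = &it->next;
            }
        }
        return freed;
    }
    ```
    
    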

    Signed-off-by: Dave Chinner
    Reviewed-by: Alex Elder

    Dave Chinner
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: avoid picking MDS that is not active
    ceph: avoid immediate cap check after import
    ceph: fix flushing of caps vs cap import
    ceph: fix erroneous cap flush to non-auth mds
    ceph: fix cap_wanted_delay_{min,max} mount option initialization
    ceph: fix xattr rbtree search
    ceph: fix getattr on directory when using norbytes

    Linus Torvalds
     
  • Replaced the md4 hashing function local to the cifs module with kernel
    crypto APIs. As a result, the md4 hashing function and its supporting
    functions in file md4.c are no longer needed.

    Cleaned up function declarations, removed forward function declarations,
    and stopped including a header file that is being deleted.

    Verified that sec=ntlm/i, sec=ntlmv2/i, and sec=ntlmssp/i work correctly.

    Signed-off-by: Shirish Pargaonkar
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French

    Shirish Pargaonkar
     

27 Jan, 2011

1 commit

  • The kmemleak detector shows this after test 139:

    unreferenced object 0xffff880079b88bb0 (size 264):
    comm "xfs_io", pid 4904, jiffies 4294909382 (age 276.824s)
    hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
    ff ff ff ff ff ff ff ff 48 7b c9 82 ff ff ff ff ........H{......
    backtrace:
    [] kmemleak_alloc+0x2d/0x60
    [] kmem_cache_alloc+0x13f/0x2b0
    [] kmem_zone_alloc+0x77/0xf0
    [] kmem_zone_zalloc+0x1e/0x50
    [] xlog_ticket_alloc+0x34/0x170
    [] xlog_cil_push+0xa4/0x3f0
    [] xlog_cil_force_lsn+0x15a/0x160
    [] _xfs_log_force_lsn+0x75/0x2d0
    [] _xfs_trans_commit+0x2bd/0x2f0
    [] xfs_iomap_write_allocate+0x1ad/0x350
    [] xfs_map_blocks+0x21f/0x370
    [] xfs_vm_writepage+0x1c7/0x550
    [] __writepage+0x1a/0x50
    [] write_cache_pages+0x1c2/0x4c0
    [] generic_writepages+0x27/0x30
    [] xfs_vm_writepages+0x5d/0x80

    By inspection, the leak occurs when xlog_write() returns an error
    and we jump to the abort path without dropping the reference on the
    active ticket.
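
    The rule behind the fix is that every exit taken after acquiring a
    reference must drop it. A sketch with a hypothetical refcounted ticket;
    push() and write_fails are stand-ins for xlog_cil_push() and an error
    from xlog_write().

    ```c
    #include <assert.h>
    #include <errno.h>

    struct ticket { int refcount; };          /* hypothetical */

    static struct ticket *ticket_get(struct ticket *t) { t->refcount++; return t; }
    static void ticket_put(struct ticket *t) { t->refcount--; }

    static int push(struct ticket *t, int write_fails)
    {
        struct ticket *tic = ticket_get(t);
        int error = 0;

        if (write_fails) {
            error = -EIO;
            goto out_abort;
        }
        /* ... normal completion ... */
        ticket_put(tic);
        return 0;

    out_abort:
        ticket_put(tic);     /* dropping the reference here plugs the leak */
        return error;
    }
    ```
    
    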

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Alex Elder

    Dave Chinner
     

26 Jan, 2011

18 commits

  • As stated in section 2.4 of RFC 5661, subsequent instances of the client need
    to present the same co_ownerid. Concatenate the client's dotted IP address,
    host name, and rpc_auth pseudoflavor to form the co_ownerid.
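
    A sketch of building such a stable identifier. The "%s/%s/%u" layout and
    the helper name make_co_ownerid() are assumptions for illustration, not
    necessarily the kernel's exact format; the point is that the same
    inputs always yield the same string, and overflow is detected.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Returns the string length, or -1 if it did not fit in buf. */
    static int make_co_ownerid(char *buf, size_t len, const char *ipaddr,
                               const char *hostname, unsigned int flavor)
    {
        int n = snprintf(buf, len, "%s/%s/%u", ipaddr, hostname, flavor);

        return (n < 0 || (size_t)n >= len) ? -1 : n;
    }
    ```
    
    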

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • The -rt patches change the console_semaphore to console_mutex. As a
    result, quite a large chunk of the patches changes all
    acquire/release_console_sem() calls to acquire/release_console_mutex().

    This commit makes things use more neutral function names, which don't
    imply anything about the underlying lock.

    The only real change is the return value of console_trylock(), which is
    inverted from that of try_acquire_console_sem().

    This patch also paves the way for switching console_sem from a semaphore
    to a mutex.
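
    A userspace analogy of the inverted return convention, with a pthread
    mutex playing the console lock. pthread_mutex_trylock() returns 0 on
    success, like the old try_acquire_console_sem() (which is what
    "inverted" in the commit text implies); the new-style wrapper returns 1
    on success, matching the akpm note above. All function names here are
    hypothetical.

    ```c
    #include <assert.h>
    #include <pthread.h>

    static pthread_mutex_t console_lock_m = PTHREAD_MUTEX_INITIALIZER;

    /* Old style: 0 on success, nonzero if the lock is busy. */
    static int try_acquire_console(void)
    {
        return pthread_mutex_trylock(&console_lock_m);
    }

    /* New style: 1 on success, 0 if busy. */
    static int console_trylock_sketch(void)
    {
        return pthread_mutex_trylock(&console_lock_m) == 0;
    }

    static void console_unlock_sketch(void)
    {
        pthread_mutex_unlock(&console_lock_m);
    }
    ```
    
    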

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
    Signed-off-by: Torben Hohn
    Cc: Thomas Gleixner
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Torben Hohn
     
  • Fix potential use of uninitialised variable caused by recent
    decompressor code optimisations.

    In zlib_uncompress (zlib_wrapper.c) we have

    int zlib_err, zlib_init = 0;
    ...
    do {
        ...
        if (avail == 0) {
            offset = 0;
            put_bh(bh[k++]);
            continue;
        }
        ...
        zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH);
        ...
    } while (zlib_err == Z_OK);

    If continue is executed (avail == 0) then the while condition will be
    evaluated testing zlib_err, which is uninitialised first time around the
    loop.

    Fix this by getting rid of the 'if (avail == 0)' test: this edge
    condition should not be handled in the decompressor code, and is
    instead handled generically in the caller code.

    Similarly for xz_wrapper.c.

    Incidentally, on most architectures (bar MIPS and PA-RISC), gcc
    generates no uninitialised variable warning, because the while
    condition test on continue is optimised out and not performed (when
    continue is executed, zlib_err has not changed since entering the
    loop, and logically if the while condition was true previously, then
    it's still true).
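
    The bug hinges on a C rule: in a do/while loop, `continue` jumps to the
    controlling expression, so the condition is evaluated even on an
    iteration that skipped the rest of the body. The hypothetical helper
    below counts condition evaluations to demonstrate this; in
    zlib_uncompress() the first such evaluation would read zlib_err before
    it had ever been assigned.

    ```c
    #include <assert.h>

    static int cond_checks;

    /* Records each evaluation of the loop condition. */
    static int check(int v)
    {
        cond_checks++;
        return v;
    }

    static int loop_with_continue(int iterations)
    {
        int i = 0;

        cond_checks = 0;
        do {
            i++;
            if (i % 2)        /* odd iterations skip the rest of the body... */
                continue;     /* ...but still jump to the condition below */
        } while (check(i < iterations));
        return cond_checks;
    }
    ```
    
    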

    Signed-off-by: Phillip Lougher
    Reported-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phillip Lougher
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
    nilfs2: fix crash after one superblock became unavailable

    Linus Torvalds
     
  • If the call to nfs_wcc_update_inode() results in an attribute update, we
    need to ensure that the inode's attr_gencount gets bumped too, otherwise
    we are not protected against races with other GETATTR calls.
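
    A simplified model of the generation-counter protection, under
    hypothetical names (not the NFS client's actual structures): every
    GETATTR records the counter when it is issued, a local attribute update
    bumps the inode's attr_gencount, and a reply is applied only if it was
    issued after the inode's last update. Skipping the bump in
    local_update() would let the stale reply below overwrite newer data.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    static unsigned long global_gen;

    struct cached_inode { unsigned long attr_gencount; long size; };
    struct getattr_reply { unsigned long gencount; long size; };

    static unsigned long next_gen(void) { return ++global_gen; }

    static struct getattr_reply start_getattr(long server_size)
    {
        struct getattr_reply r = { next_gen(), server_size };
        return r;
    }

    /* Local update (e.g. from wcc data): must bump attr_gencount too. */
    static void local_update(struct cached_inode *ino, long size)
    {
        ino->size = size;
        ino->attr_gencount = next_gen();
    }

    static bool apply_reply(struct cached_inode *ino, const struct getattr_reply *r)
    {
        if ((long)(r->gencount - ino->attr_gencount) <= 0)
            return false;             /* issued before the last update */
        ino->size = r->size;
        ino->attr_gencount = r->gencount;
        return true;
    }
    ```
    
    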

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • What we really want to know is the ref count.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Always assign the cb_process_state nfs_client pointer, so that when a
    processing error occurs in cb_sequence after the nfs_client is found and
    referenced, cb_process_state holds a non-NULL nfs_client and the matching
    nfs_put_client() in nfs4_callback_compound drops that reference.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • The information required to find the nfs_client corresponding to the incoming
    back channel request is contained in the NFS layer. Perform minimal checking
    in the RPC layer pg_authenticate method, and push more detailed checking into
    the NFS layer where the nfs_client can be found.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Nick Bowler reports:

    > We were just having some NFS server troubles, and my client machine
    > running 2.6.38-rc1+ (specifically, commit 2b1caf6ed7b888c95) crashed
    > hard (syslog output appended to this mail).
    >
    > I'm not sure what the exact timeline was or how to reproduce this,
    > but the server was rebooted during all this. Since I've never seen
    > this happen before, it is possibly a regression from previous kernel
    > releases. However, I recently updated my nfs-utils (on the client) to
    > version 1.2.3, so that might be related as well.

    [ BUG output redacted ]

    When done searching, the for_each_host loop in next_host_state() falls
    through and returns the final host on the host chain without bumping
    its reference count.

    Since the host's ref count is only one at that point, releasing the
    host in nlm_host_rebooted() attempts to destroy the host prematurely,
    and therefore hits a BUG().

    Likely, the original intent of the for_each_host behavior in
    next_host_state() was to handle the case when the host chain is empty.
    Searching the chain and finding no suitable host to return needs to be
    handled as well.

    Defensively restructure next_host_state() always to return NULL when
    the loop falls through.

    Introduced by commit b10e30f6 "lockd: reorganize nlm_host_rebooted".
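
    The defensive shape can be sketched on a hypothetical host list: take a
    reference only on the host actually returned, and return NULL
    explicitly when the scan finds nothing, so neither an empty chain nor
    an unsuccessful search hands back an unreferenced host. next_host() and
    struct host are illustrative, not lockd's definitions.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct host {
        struct host *next;
        int needs_notify;
        int refcount;
    };

    static struct host *next_host(struct host *list)
    {
        for (struct host *h = list; h; h = h->next) {
            if (h->needs_notify) {
                h->needs_notify = 0;
                h->refcount++;      /* caller drops this reference */
                return h;
            }
        }
        return NULL;                /* never return the last host examined */
    }
    ```
    
    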

    Cc: J. Bruce Fields
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • nfsacl_encode() allocates memory in certain cases. Of course, such an
    allocation is not guaranteed to succeed.

    Since commit 9f06c719 "SUNRPC: New xdr_streams XDR encoder API", the
    kernel's XDR encoders can't return a result indicating failure, so a
    memory allocation failure in nfsacl_encode() has become fatal (i.e.,
    the XDR code Oopses) in some cases.

    However, the allocated memory is a tiny fixed amount, on the order
    of 40-50 bytes. We can easily use a stack-allocated buffer for
    this, with only a wee bit of nose-holding.
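
    The general technique, replacing a small fixed-size heap allocation in a
    function that cannot report failure with stack storage, can be sketched
    as follows. struct rec, MAX_RECS and encode_recs() are hypothetical;
    four 12-byte records (the typical size of struct rec) roughly match the
    40-50 bytes mentioned above.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    struct rec { unsigned int a, b, c; };
    #define MAX_RECS 4

    /* The caller cannot handle an error, so nothing here may fail. */
    static size_t encode_recs(struct rec *out, const struct rec *in, size_t n)
    {
        struct rec scratch[MAX_RECS];       /* stack storage: cannot fail */

        assert(n <= MAX_RECS);              /* callers guarantee the bound */
        memcpy(scratch, in, n * sizeof(scratch[0]));
        for (size_t i = 0; i < n; i++)      /* stand-in "encoding" step */
            scratch[i].c = scratch[i].a + scratch[i].b;
        memcpy(out, scratch, n * sizeof(scratch[0]));
        return n;
    }
    ```
    
    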

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Clean up.

    The nfsacl_encode() and nfsacl_decode() functions return negative
    errno values, and each call site verifies that the returned value
    is not negative. Change the synopsis of both of these functions
    to reflect this usage.

    Document the synopsis and return values.
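
    Why the declared type matters: a function that returns either a count or
    a negative errno needs a signed return type, or an error converts to a
    huge positive value that a caller's "< 0" test can never catch. Both
    encoders below are hypothetical.

    ```c
    #include <assert.h>
    #include <errno.h>

    static unsigned int encode_unsigned(int fail) { return fail ? -EINVAL : 16; }
    static int encode_signed(int fail)            { return fail ? -EINVAL : 16; }

    /* Caller-style check: works because encode_signed() returns int. */
    static int call_ok(int fail)
    {
        int ret = encode_signed(fail);

        return ret < 0 ? ret : 0;
    }
    ```
    
    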

    Reported-by: Trond Myklebust
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Milan Broz reports:

    > on today Linus' tree I get OOps if using nfs.
    >
    > server (2.6.36) exports dir:
    > /dir 172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500)
    >
    > on client it is mounted in fstab
    > server:/dir /mnt/tst nfs rw,soft 0 0
    >
    > and these commands OOpses it (simplified from a configure script):
    >
    > cd /dir
    > touch x
    > install x y
    >
    > [ 105.327701] ------------[ cut here ]------------
    > [ 105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338!
    > [ 105.328075] invalid opcode: 0000 [#1] PREEMPT SMP
    > [ 105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent
    > [ 105.328349] Modules linked in: usbcore dm_mod
    > [ 105.328553]
    > [ 105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform
    > [ 105.328853] EIP: 0060:[] EFLAGS: 00010282 CPU: 0
    > [ 105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98
    > [ 105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004
    > [ 105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4
    > [ 105.329431] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    > [ 105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000)
    > [ 105.336600] Stack:
    > [ 105.336693] ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158
    > [ 105.336982] ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000
    > [ 105.337182] ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000
    > [ 105.337405] Call Trace:
    > [ 105.337695] [] rpcauth_wrap_req+0x75/0x7f
    > [ 105.337806] [] ? xdr_encode_opaque+0x12/0x15
    > [ 105.337898] [] ? nfs3_xdr_enc_setacl3args+0x0/0x98
    > [ 105.337988] [] call_transmit+0x17e/0x1e8
    > [ 105.338072] [] __rpc_execute+0x6d/0x1a6
    > [ 105.338155] [] rpc_execute+0x34/0x37
    > [ 105.338235] [] rpc_run_task+0xb5/0xbd
    > [ 105.338316] [] rpc_call_sync+0x3d/0x58
    > [ 105.338402] [] nfs3_proc_setacls+0x18e/0x24f
    > [ 105.338493] [] ? __kmalloc+0x148/0x1c4
    > [ 105.338579] [] ? posix_acl_alloc+0x12/0x22
    > [ 105.338665] [] nfs3_proc_setacl+0xa0/0xca
    > [ 105.338748] [] nfs3_setxattr+0x62/0x88
    > [ 105.338834] [] ? sub_preempt_count+0x7c/0x89
    > [ 105.338926] [] ? nfs3_setxattr+0x0/0x88
    > [ 105.339026] [] __vfs_setxattr_noperm+0x26/0x95
    > [ 105.339114] [] vfs_setxattr+0x5b/0x76
    > [ 105.339211] [] setxattr+0x9d/0xc3
    > [ 105.339298] [] ? handle_pte_fault+0x258/0x5cb
    > [ 105.339428] [] ? __free_pages+0x1a/0x23
    > [ 105.339517] [] ? up_read+0x16/0x2c
    > [ 105.339599] [] ? fget+0x0/0xa3
    > [ 105.339677] [] ? fget+0x0/0xa3
    > [ 105.339760] [] ? get_parent_ip+0xb/0x31
    > [ 105.339843] [] ? sub_preempt_count+0x7c/0x89
    > [ 105.339931] [] sys_fsetxattr+0x51/0x79
    > [ 105.340014] [] sysenter_do_call+0x12/0x32
    > [ 105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d
    > [ 105.350321] EIP: [] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4
    > [ 105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]---

    nfs3_xdr_enc_setacl3args() is not properly setting up the target
    buffer before nfsacl_encode() attempts to encode the ACL.

    Introduced by commit d9c407b1 "NFS: Introduce new-style XDR encoding
    functions for NFSv3."

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • Nick Piggin reports:

    > I'm getting use after frees in aio code in NFS
    >
    > [ 2703.396766] Call Trace:
    > [ 2703.396858] [] ? native_sched_clock+0x27/0x80
    > [ 2703.396959] [] ? put_lock_stats+0xe/0x40
    > [ 2703.397058] [] ? lock_release_holdtime+0xa8/0x140
    > [ 2703.397159] [] lock_acquire+0x95/0x1b0
    > [ 2703.397260] [] ? aio_put_req+0x2b/0x60
    > [ 2703.397361] [] ? get_parent_ip+0x11/0x50
    > [ 2703.397464] [] _raw_spin_lock_irq+0x41/0x80
    > [ 2703.397564] [] ? aio_put_req+0x2b/0x60
    > [ 2703.397662] [] aio_put_req+0x2b/0x60
    > [ 2703.397761] [] do_io_submit+0x2be/0x7c0
    > [ 2703.397895] [] sys_io_submit+0xb/0x10
    > [ 2703.397995] [] system_call_fastpath+0x16/0x1b
    >
    > Adding some tracing, it is due to nfs completing the request then
    > returning something other than -EIOCBQUEUED, so aio.c
    > also completes the request.

    To address this, prevent the NFS direct I/O engine from completing
    async iocbs when the forward path returns an error without starting
    any I/O.
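
    The ownership rule can be modelled in a few lines: the aio core
    completes the request whenever the submit path returns anything other
    than the "queued" sentinel, so the submit path must not also complete
    it in that case. EIOCBQUEUED_SKETCH and the other names are
    hypothetical stand-ins for -EIOCBQUEUED and the NFS/aio functions.

    ```c
    #include <assert.h>
    #include <errno.h>

    #define EIOCBQUEUED_SKETCH 529      /* stand-in for EIOCBQUEUED */

    struct iocb_sketch { int completions; };

    static void complete_iocb(struct iocb_sketch *io) { io->completions++; }

    static int dio_submit(struct iocb_sketch *io, int start_fails)
    {
        if (start_fails)
            return -EIO;               /* no I/O started: do NOT complete here */
        complete_iocb(io);             /* stands in for the later async completion */
        return -EIOCBQUEUED_SKETCH;
    }

    /* Model of the aio core: completes the iocb itself on any error. */
    static int aio_submit(struct iocb_sketch *io, int start_fails)
    {
        int ret = dio_submit(io, start_fails);

        if (ret != -EIOCBQUEUED_SKETCH)
            complete_iocb(io);
        return ret;
    }
    ```
    
    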

    This fix appears to survive ^C during both "xfstest no. 208" and "fsx
    -Z."

    It's likely this bug has existed for a very long while, as we are seeing
    very similar symptoms in OEL 5. Copying stable.

    Cc: Stable
    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • On Mon, 17 Jan 2011, Mi Jinlong wrote:

    >
    >
    > Jesper Juhl:
    > > strrchr() can return NULL if nothing is found. If this happens we'll
    > > dereference a NULL pointer in
    > > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds().
    > >
    > > I tried to find some other code that guarantees that this can never
    > > happen but I was unsuccessful. So, unless someone else can point to some
    > > code that ensures this can never be a problem, I believe this patch is
    > > needed.
    > >
    > > While I was changing this code I also noticed that all the dprintk()
    > > statements, except one, start with "%s:". The one missing the ":" I added
    > > it to.
    >
    > Maybe another one also should be changed at decode_and_add_ds() at line 243:
    >
    > 243 printk("%s Decoded address and port %s\n", __func__, buf);
    >
    Missed that one. Thanks.
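
    The defensive pattern is to check strrchr() before using its result. A
    sketch assuming the input is an RPC universal address of the form
    h1.h2.h3.h4.p1.p2, where the port is p1 * 256 + p2; uaddr_port() is a
    hypothetical helper, not the kernel's decode_and_add_ds().

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Returns the port, or -1 if the address is malformed. */
    static int uaddr_port(const char *uaddr)
    {
        char buf[64];
        char *last, *prev;
        size_t len = strlen(uaddr);

        if (len >= sizeof(buf))
            return -1;
        memcpy(buf, uaddr, len + 1);

        last = strrchr(buf, '.');
        if (last == NULL)              /* strrchr() found no separator */
            return -1;
        *last = '\0';
        prev = strrchr(buf, '.');
        if (prev == NULL)
            return -1;
        return (int)(strtol(prev + 1, NULL, 10) * 256 +
                     strtol(last + 1, NULL, 10));
    }
    ```
    
    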

    Signed-off-by: Jesper Juhl
    Signed-off-by: Trond Myklebust

    Jesper Juhl
     
  • Add the strictcache mount option, which switches on strict cache mode.
    In this mode, the client reads from the cache whenever it holds a
    Level II oplock and otherwise reads from the server. For writes, the
    client stores data in the cache only when it holds an exclusive oplock;
    otherwise it writes directly to the server.
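
    The caching decisions above reduce to two predicates. This is a sketch
    with a hypothetical, simplified oplock enum; treating an exclusive
    oplock as also permitting cached reads is an assumption, since the text
    mentions only Level II for reads.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    enum oplock { OPLOCK_NONE, OPLOCK_LEVEL2, OPLOCK_EXCLUSIVE };

    /* Reads may come from the page cache under a read-caching oplock. */
    static bool strict_read_from_cache(enum oplock o)
    {
        return o == OPLOCK_LEVEL2 || o == OPLOCK_EXCLUSIVE;
    }

    /* Writes are cached only under an exclusive oplock. */
    static bool strict_write_to_cache(enum oplock o)
    {
        return o == OPLOCK_EXCLUSIVE;
    }
    ```
    
    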

    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • If we don't have an exclusive oplock, we write the data to the server.
    Also set the invalidate_mapping flag on the inode if we wrote anything
    to the server. Add cifs_iovec_write to let the client write iovec
    buffers through CIFSSMBWrite2.

    Signed-off-by: Pavel Shilovsky
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • Replace remaining use of md5 hash functions local to cifs module
    with kernel crypto APIs.
    Remove header and source file containing those local functions.

    Signed-off-by: Shirish Pargaonkar
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French

    Steve French
     
  • Ignore replication or auth frag data if it indicates an MDS that is not
    active. This can happen if the MDS shuts down and the client has stale
    data about the namespace distribution across the MDS cluster. If that's
    the case, fall back to directing the request based on the auth cap (which
    should always be accurate).
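
    The fallback rule can be sketched as a small selection function. The
    names are hypothetical, not libceph's: a hinted MDS (from replication or
    dirfrag data) is used only if it is in range and currently active;
    otherwise the request goes to the MDS holding the auth cap.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    static int choose_mds(int hinted_mds, const bool *mds_active, int nmds,
                          int auth_mds)
    {
        if (hinted_mds >= 0 && hinted_mds < nmds && mds_active[hinted_mds])
            return hinted_mds;
        return auth_mds;              /* auth cap should always be accurate */
    }
    ```
    
    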

    Signed-off-by: Sage Weil

    Sage Weil
     

25 Jan, 2011

1 commit


24 Jan, 2011

2 commits

  • Teach cifs about network namespaces, so mounting uses addresses/routing
    visible from the container rather than from the init context.

    A container is a chroot on steroids that changes more than just the root
    filesystem the new processes see. One thing containers can isolate is
    "network namespaces", meaning each container can have its own set of
    ethernet interfaces, each with its own own IP address and routing to the
    outside world. And if you open a socket in _userspace_ from processes
    within such a container, this works fine.

    But sockets opened from within the kernel still use a single global
    networking context in a lot of places, meaning the new socket's address
    and routing are correct for PID 1 on the host, but are _not_ what
    userspace processes in the container get to use.

    So when you mount a network filesystem from within a container, the
    mount code in the CIFS driver uses the host's networking context and not
    the container's networking context, so it gets the wrong address, uses
    the wrong routing, and may even try to go out an interface that the
    container can't access... Bad stuff.

    This patch copies the mount process's network context into the CIFS
    structure that stores the rest of the server information for that mount
    point, and changes the socket open code to use the saved network context
    instead of the global network context, i.e. "when you attempt to use
    these addresses, do so relative to THIS set of network interfaces and
    routing rules, not the old global context from back before we supported
    containers".

    The big long HOWTO sets up a test environment on the assumption you've
    never used containers before. It basically says:

    1) configure and build a new kernel that has container support
    2) build a new root filesystem that includes the userspace container
    control package (LXC)
    3) package/run them under KVM (so you don't have to mess up your host
    system in order to play with containers).
    4) set up some containers under the KVM system
    5) set up contradictory routing in the KVM system and the container so
    that the host and the container see different things for the same address
    6) try to mount a CIFS share from both contexts so you can both force it
    to work and force it to fail.

    For a long drawn out test reproduction sequence, see:

    http://landley.livejournal.com/47024.html
    http://landley.livejournal.com/47205.html
    http://landley.livejournal.com/47476.html

    Signed-off-by: Rob Landley
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French

    Rob Landley
     
  • In fs/cifs/cifs_dfs_ref.c::cifs_dfs_do_automount() we have this code:

    ...
    mnt = ERR_PTR(-EINVAL);
    if (IS_ERR(tlink)) {
        mnt = ERR_CAST(tlink);
        goto free_full_path;
    }
    ses = tlink_tcon(tlink)->ses;

    rc = get_dfs_path(xid, ses, full_path + 1, cifs_sb->local_nls,
              &num_referrals, &referrals,
              cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MAP_SPECIAL_CHR);

    cifs_put_tlink(tlink);

    mnt = ERR_PTR(-ENOENT);
    ...

    The assignment of 'mnt = ERR_PTR(-EINVAL);' is completely pointless. If we
    take the 'if (IS_ERR(tlink))' branch we'll set 'mnt' again and we'll also
    do so if we do not take the branch. There is no way we'll ever use 'mnt'
    with the assigned 'ERR_PTR(-EINVAL)' value, so we may as well just remove
    the pointless assignment.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Steve French

    Jesper Juhl
     

23 Jan, 2011

1 commit

  • Fix new fs/dcache.c kernel-doc warnings:

    Warning(fs/dcache.c:184): No description found for parameter 'dentry'
    Warning(fs/dcache.c:296): No description found for parameter 'parent'
    Warning(fs/dcache.c:1985): No description found for parameter 'dparent'
    Warning(fs/dcache.c:1985): Excess function parameter 'parent' description in 'd_validate'

    Signed-off-by: Randy Dunlap
    Cc: Alexander Viro
    Cc: Nick Piggin
    Signed-off-by: Linus Torvalds

    Randy Dunlap