10 Nov, 2012

1 commit


09 Nov, 2012

14 commits

  • Merge misc fixes from Andrew Morton:
    "Five fixes"

    * emailed patches from Andrew Morton: (5 patches)
    h8300: add missing L1_CACHE_SHIFT
    mm: bugfix: set current->reclaim_state to NULL while returning from kswapd()
    fanotify: fix missing break
    revert "epoll: support for disabling items, and a self-test app"
    checkpatch: improve network block comment style checking

    Linus Torvalds
     
  • Pull xfs bugfixes from Ben Myers:

    - fix for large transactions spanning multiple iclog buffers

    - zero the allocation_args structure on the stack before using it to
    determine whether to use a worker for allocation
    - move allocation stack switch to xfs_bmapi_allocate in order to
    prevent deadlock on AGF buffers

    - growfs no longer reads in garbage for new secondary superblocks

    - silence a build warning

    - ensure that invalid buffers never get written to disk while on free
    list

    - don't vmap inode cluster buffers during free

    - fix buffer shutdown reference count mismatch

    - fix reading of wrapped log data

    * tag 'for-linus-v3.7-rc5' of git://oss.sgi.com/xfs/xfs:
    xfs: fix reading of wrapped log data
    xfs: fix buffer shudown reference count mismatch
    xfs: don't vmap inode cluster buffers during free
    xfs: invalidate allocbt blocks moved to the free list
    xfs: silence uninitialised f.file warning.
    xfs: growfs: don't read garbage for new secondary superblocks
    xfs: move allocation stack switch up to xfs_bmapi_allocate
    xfs: introduce XFS_BMAPI_STACK_SWITCH
    xfs: zero allocation_args on the kernel stack
    xfs: only update the last_sync_lsn when a transaction completes

    Linus Torvalds
     
  • Anders Blomdell noted in 2010 that fanotify lost events and provided a
    test case. Eric Paris confirmed it was a bug and posted a fix to the
    list

    https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE

    but never applied it. Repeated attempts over time to get him to apply
    it have never received a reply.

    So apply it anyway.

    Signed-off-by: Alan Cox
    Reported-by: Anders Blomdell
    Cc: Eric Paris
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Revert commit 03a7beb55b9f ("epoll: support for disabling items, and a
    self-test app") pending resolution of the issues identified by Michael
    Kerrisk, copied below.

    We'll revisit this for 3.8.

    : I've taken a look at this patch as it currently stands in 3.7-rc1, and
    : done a bit of testing. (By the way, the test program
    : tools/testing/selftests/epoll/test_epoll.c does not compile...)
    :
    : There are one or two places where the behavior seems a little strange,
    : so I have a question or two at the end of this mail. But other than
    : that, I want to check my understanding so that the interface can be
    : correctly documented.
    :
    : Just to go through my understanding, the problem is the following
    : scenario in a multithreaded application:
    :
    : 1. Multiple threads are performing epoll_wait() operations,
    : and maintaining a user-space cache that contains information
    : corresponding to each file descriptor being monitored by
    : epoll_wait().
    :
    : 2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
    : a file descriptor from the epoll interest list, and
    : delete the corresponding record from the user-space cache.
    :
    : 3. The problem with (2) is that some other thread may have
    : previously done an epoll_wait() that retrieved information
    : about the fd in question, and may be in the middle of using
    : information in the cache that relates to that fd. Thus,
    : there is a potential race.
    :
    : 4. The race can't be solved purely in user space, because doing
    : so would require applying a mutex across the epoll_wait()
    : call, which would of course blow thread concurrency.
    :
    : Right?
    :
    : Your solution is the EPOLL_CTL_DISABLE operation. I want to
    : confirm my understanding about how to use this flag, since
    : the description that has accompanied the patches so far
    : has been a bit sparse.
    :
    : 0. In the scenario you're concerned about, deleting a file
    : descriptor means (safely) doing the following:
    : (a) Deleting the file descriptor from the epoll interest list
    : using EPOLL_CTL_DEL
    : (b) Deleting the corresponding record in the user-space cache
    :
    : 1. It's only meaningful to use EPOLL_CTL_DISABLE in
    : conjunction with EPOLLONESHOT.
    :
    : 2. Using EPOLL_CTL_DISABLE without EPOLLONESHOT is a logic
    : error.
    :
    : 3. The correct way to code multithreaded applications using
    : EPOLL_CTL_DISABLE and EPOLLONESHOT is as follows:
    :
    : a. All EPOLL_CTL_ADD and EPOLL_CTL_MOD operations should
    : specify EPOLLONESHOT.
    :
    : b. When a thread wants to delete a file descriptor, it
    : should do the following:
    :
    : [1] Call epoll_ctl(EPOLL_CTL_DISABLE)
    : [2] If the return status from epoll_ctl(EPOLL_CTL_DISABLE)
    : was zero, then the file descriptor can be safely
    : deleted by the thread that made this call.
    : [3] If the epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY,
    : then the descriptor is in use. In this case, the calling
    : thread should set a flag in the user-space cache to
    : indicate that the thread that is using the descriptor
    : should perform the deletion operation.
    :
    : Is all of the above correct?
    :
    : The implementation depends on checking whether
    : (events & ~EP_PRIVATE_BITS) == 0
    : This relies on the fact that EPOLL_CTL_ADD and EPOLL_CTL_MOD always
    : set EPOLLHUP and EPOLLERR in the 'events' mask, and EPOLLONESHOT
    : causes those flags (as well as all others in ~EP_PRIVATE_BITS) to be
    : cleared.
    :
    : A corollary to the previous paragraph is that using EPOLL_CTL_DISABLE
    : is only useful in conjunction with EPOLLONESHOT. However, as things
    : stand, one can use EPOLL_CTL_DISABLE on a file descriptor that does
    : not have EPOLLONESHOT set in 'events'. This results in the following
    : (slightly surprising) behavior:
    :
    : (a) The first call to epoll_ctl(EPOLL_CTL_DISABLE) returns 0
    : (the indicator that the file descriptor can be safely deleted).
    : (b) The next call to epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY.
    :
    : This doesn't seem particularly useful, and in fact is probably an
    : indication that the user made a logic error: they should only be using
    : epoll_ctl(EPOLL_CTL_DISABLE) on a file descriptor for which
    : EPOLLONESHOT was set in 'events'. If that is correct, then would it
    : not make sense to return an error to user space for this case?

    Cc: Michael Kerrisk
    Cc: "Paton J. Lewis"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
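    EPOLL_CTL_DISABLE was reverted and does not appear below. As a hedged
    sketch of the EPOLLONESHOT behaviour the proposal relied on (assuming
    Linux with glibc; the function name is illustrative), this registers a
    pipe with EPOLLONESHOT, shows the entry is disarmed after one reported
    event, and re-arms it with EPOLL_CTL_MOD:

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Illustrative sketch: demonstrates EPOLLONESHOT disarming and re-arming.
 * Returns 0 if the observed behaviour matches, -1 otherwise.
 */
static int epoll_oneshot_demo(void)
{
    int p[2], ep, ret = -1;
    struct epoll_event ev = { 0 }, out;

    if (pipe(p) != 0)
        return -1;
    ep = epoll_create1(0);
    if (ep < 0)
        goto close_pipe;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) != 0)
        goto close_ep;
    if (write(p[1], "x", 1) != 1)
        goto close_ep;

    /* First wait reports the readable pipe and disarms the entry. */
    if (epoll_wait(ep, &out, 1, 0) != 1)
        goto close_ep;
    /* The byte is still unread, but the one-shot entry stays silent. */
    if (epoll_wait(ep, &out, 1, 0) != 0)
        goto close_ep;
    /* The thread that finished handling the fd re-arms it explicitly. */
    if (epoll_ctl(ep, EPOLL_CTL_MOD, p[0], &ev) != 0)
        goto close_ep;
    if (epoll_wait(ep, &out, 1, 0) != 1)
        goto close_ep;
    ret = 0;

close_ep:
    close(ep);
close_pipe:
    close(p[0]);
    close(p[1]);
    return ret;
}
```

    Because the entry is disarmed after each reported event, at most one
    thread is handling a given descriptor until it is re-armed, which is
    the property the proposed EPOLL_CTL_DISABLE check built on.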
     
  • Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
    3.0-rc1 introduced a regression when recovering log buffers that
    wrapped around the end of the log. The second part of the log buffer at
    the start of the physical log was being read into the header buffer
    rather than the data buffer, and hence recovery was seeing garbage
    in the data buffer when it got to the region of the log buffer that
    was incorrectly read.

    Cc: # 3.0.x, 3.2.x, 3.4.x 3.6.x
    Reported-by: Torsten Kaiser
    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
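    A minimal sketch of correct wrap-around handling, using a plain byte
    array as a stand-in for the on-disk log. The function and parameters
    are hypothetical; this is not the XFS recovery code. The point is
    that both halves of a wrapped record land in the same data buffer:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'len' bytes starting at offset 'start' of a
 * circular log of 'log_size' bytes into 'dst' (assumes start < log_size).
 * If the record wraps past the physical end of the log, the second part
 * goes into 'dst' directly after the first part, not into another buffer.
 */
static void read_wrapped(const char *log, size_t log_size, size_t start,
                         size_t len, char *dst)
{
    size_t first = log_size - start;   /* bytes before the physical end */

    if (len <= first) {
        memcpy(dst, log + start, len);
        return;
    }
    memcpy(dst, log + start, first);          /* tail of the physical log */
    memcpy(dst + first, log, len - first);    /* wrapped part at the start */
}
```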
     
  • When we shut down the filesystem, we have to unpin and free all the
    buffers currently active in the CIL. To do this we unpin and remove
    them in one operation as a result of a failed iclogbuf write. For
    buffers, we do this removal via a simulated IO completion after
    marking the buffer stale.

    At the time we do this, we have two references to the buffer - the
    active LRU reference and the buf log item. The LRU reference is
    removed by marking the buffer stale, and the active CIL reference is
    removed by the xfs_buf_iodone() callback that is run by
    xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
    callback).

    However, ioend processing requires one more reference - that of the
    IO that it is completing. We don't have this reference, so we free
    the buffer prematurely and use it after it is freed. For buffers
    marked with XBF_ASYNC, this leads to assert failures in
    xfs_buf_rele() on debug kernels because the b_hold count is zero.

    Fix this by making sure we take the necessary IO reference before
    starting IO completion processing on the stale buffer, and set the
    XBF_ASYNC flag to ensure that IO completion processing removes all
    the active references from the buffer to ensure it is fully torn
    down.

    Cc:
    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
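    The reference counting argument above can be modelled in a few lines.
    This is a heavily simplified, hypothetical model (struct model_buf and
    its helpers are illustrative, not XFS code): without an extra IO
    reference the last release touches an already freed buffer.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of an xfs_buf hold count. */
struct model_buf {
    int  hold;   /* stands in for b_hold */
    bool freed;  /* set when hold drops to zero */
    bool uaf;    /* set if a release happens after the free */
};

static void model_rele(struct model_buf *bp)
{
    if (bp->freed) {          /* touching a freed buffer: use-after-free */
        bp->uaf = true;
        return;
    }
    if (--bp->hold == 0)
        bp->freed = true;
}

/*
 * Simulate shutdown of a stale buffer pinned in the CIL. Returns true if
 * the buffer is torn down exactly once with no use-after-free.
 */
static bool shutdown_stale_buf(bool take_io_ref)
{
    struct model_buf b = { .hold = 2 };  /* LRU ref + buf log item ref */

    model_rele(&b);         /* marking the buffer stale drops the LRU ref */
    if (take_io_ref)
        b.hold++;           /* the fix: hold a reference for the IO */
    model_rele(&b);         /* xfs_buf_iodone() drops the log item ref */
    model_rele(&b);         /* ioend processing drops the IO reference */
    return b.freed && !b.uaf;
}
```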
     
  • Inode buffers do not need to be mapped as inodes are read or written
    directly from/to the pages underlying the buffer. This fixes a
    regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
    default behaviour").

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • When we free a block from the alloc btree, we move it to the
    freelist held in the AGFL and mark it busy in the busy extent tree.
    This typically happens when we merge btree blocks.

    Once the transaction is committed and checkpointed, the block can
    remain on the free list for an indefinite amount of time. Now, this
    isn't the end of the world at this point - if the free list is
    shortened, the buffer is invalidated in the transaction that moves
    it back to free space. If the buffer is allocated as metadata from
    the free list, then all the modifications get logged, and we have
    no issues, either. And if it gets allocated as userdata direct from
    the freelist, it gets invalidated and so will never get written.

    However, during the time it sits on the free list, pressure on the
    log can cause the AIL to be pushed and the buffer that covers the
    block gets pushed for write. IOWs, we end up writing a freed
    metadata block to disk. Again, this isn't the end of the world
    because we know from the above we are only writing to free space.

    The problem, however, is for validation callbacks. If the block was
    an old btree root block, then the level of the block is going to be
    higher than the current tree root, and so will fail validation.
    There may be other inconsistencies in the block as well, and
    currently we don't care because the block is in free space. Shutting
    down the filesystem because a freed block doesn't pass write
    validation, OTOH, is rather unfriendly.

    So, make sure we always invalidate buffers as they move from the
    free space trees to the free list so that we guarantee they never
    get written to disk while on the free list.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Phil White
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Fix an uninitialised variable build warning introduced by 2903ff0
    ("switch simple cases of fget_light to fdget"). gcc is not smart
    enough to work out that the variable is never used uninitialised, and
    the commit removed the initialisation at declaration that the old
    variable had.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • When updating new secondary superblocks in a growfs operation, the
    superblock buffer is read from the newly grown region of the
    underlying device. This is not guaranteed to be zero, so it violates
    the underlying assumption that the unused parts of superblocks are
    zero filled. Get a new buffer for these secondary superblocks to
    ensure that the unused regions are zero filled correctly.

    Signed-off-by: Dave Chinner
    Reviewed-by: Carlos Maiolino
    Signed-off-by: Ben Myers

    Dave Chinner
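    A minimal sketch of the principle, assuming a plain memory buffer in
    place of an XFS buffer (init_secondary_sb() and tail_is_zero() are
    hypothetical names): zero the whole buffer before filling in the
    superblock fields, rather than trusting whatever the device held.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: build a secondary superblock in 'buf'. Only the
 * first 'sb_len' bytes carry fields copied from the primary; the rest
 * must be zero even if 'buf' previously held garbage from the device.
 */
static void init_secondary_sb(unsigned char *buf, size_t buf_len,
                              const unsigned char *primary, size_t sb_len)
{
    memset(buf, 0, buf_len);                         /* no stale bytes */
    memcpy(buf, primary, sb_len < buf_len ? sb_len : buf_len);
}

/* Returns true if every byte from 'from' to the end of 'buf' is zero. */
static bool tail_is_zero(const unsigned char *buf, size_t len, size_t from)
{
    for (size_t i = from; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}
```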
     
  • Switching stacks at xfs_alloc_vextent() can cause deadlocks when we
    run out of worker threads on the allocation workqueue. This can
    occur because xfs_bmap_btalloc() can make multiple calls to
    xfs_alloc_vextent(), and even if xfs_alloc_vextent() fails it can
    return with the AGF locked in the current allocation transaction.

    If we then need to make another allocation, and all the allocation
    worker contexts are exhausted because they are blocked waiting for
    the AGF lock, the holder of the AGF cannot get its xfs_alloc_vextent()
    work completed to release the AGF. Hence allocation effectively
    deadlocks.

    To avoid this, move the stack switch one layer up to
    xfs_bmapi_allocate() so that all of the allocation attempts in a
    single switched stack transaction occur in a single worker context.
    This avoids the problem of an allocation being blocked waiting for
    a worker thread whilst holding the AGF.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Certain allocation paths through xfs_bmapi_write() are in situations
    where we have limited stack available. These are almost always in
    the buffered IO writeback path when converting delayed allocation
    extents to real extents.

    The current stack switch occurs for userdata allocations, which
    means we also do stack switches for preallocation, direct IO and
    unwritten extent conversion, even though these call chains have never
    been implicated in a stack overrun.

    Hence, let's target stack switches at just the single stack overrun
    offender. To do that, introduce an XFS_BMAPI_STACK_SWITCH flag that
    the caller can pass to xfs_bmapi_write() to indicate it should switch
    stacks if it needs to do allocation.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Zero the kernel stack space that makes up the xfs_alloc_arg structures.

    Signed-off-by: Mark Tinguely
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Mark Tinguely
     
  • The log write code stamps each iclog with the current tail LSN in
    the iclog header so that recovery knows where to find the tail of
    the log once it has found the head. Normally this is taken from the
    first item on the AIL - the log item that corresponds to the oldest
    active item in the log.

    The problem is that when the AIL is empty, the tail lsn is derived
    from the l_last_sync_lsn, which is the LSN of the last iclog to
    be written to the log. In most cases this doesn't happen, because
    the AIL is rarely empty on an active filesystem. However, when it
    does, it opens up an interesting case when the transaction being
    committed to the iclog spans multiple iclogs.

    That is, the first iclog is stamped with the l_last_sync_lsn, and IO
    is issued. Then the next iclog is set up, the changes copied into the
    iclog (takes some time), and then the l_last_sync_lsn is stamped
    into the header and IO is issued. This is still the same
    transaction, so the tail lsn of both iclogs must be the same for log
    recovery to find the entire transaction to be able to replay it.

    The problem arises in that the iclog buffer IO completion updates
    the l_last_sync_lsn with its own LSN. Therefore, if the first iclog
    completes its IO before the second iclog is filled and has the tail
    lsn stamped in it, it will stamp the LSN of the first iclog into
    its tail lsn field. If the system fails at this point, log recovery
    will not see a complete transaction, so the transaction will not be
    replayed.

    The fix is simple - the l_last_sync_lsn is updated when an iclog
    buffer IO completes, and this is incorrect. The l_last_sync_lsn
    should be updated when a transaction is completed by an iclog buffer
    IO. That is, only iclog buffers that have transaction commit
    callbacks attached to them should update the l_last_sync_lsn. This
    means that the last_sync_lsn will only move forward when a commit
    record is written, not in the middle of a large transaction that is
    rolling through multiple iclog buffers.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
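    The race can be reduced to a small, hypothetical model (the struct and
    function names are illustrative, and LSNs are plain integers): two
    iclogs of one transaction must be stamped with the same tail, which
    only holds if a completing iclog without a commit record leaves
    last_sync_lsn alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the log's last_sync_lsn. */
struct model_log {
    uint64_t last_sync_lsn;
};

/* IO completion of an iclog; 'fixed' selects the corrected behaviour. */
static void iclog_io_done(struct model_log *log, uint64_t lsn,
                          bool has_commit_cb, bool fixed)
{
    if (!fixed || has_commit_cb)
        log->last_sync_lsn = lsn;
}

/* Returns true if both iclogs of one transaction get the same tail lsn. */
static bool tails_match(bool fixed)
{
    struct model_log log = { .last_sync_lsn = 100 };
    uint64_t tail1, tail2;

    tail1 = log.last_sync_lsn;              /* AIL empty: stamp iclog 1 */
    iclog_io_done(&log, 200, false, fixed); /* iclog 1 done, no commit */
    tail2 = log.last_sync_lsn;              /* stamp iclog 2 */
    iclog_io_done(&log, 300, true, fixed);  /* iclog 2 has the commit */
    return tail1 == tail2;
}
```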
     

07 Nov, 2012

8 commits

  • Pull gfs2 fixes from Steven Whitehouse:
    "Here are a number of GFS2 bug fixes. There are three from Andy Price
    which fix various issues spotted by automated code analysis. There
    are two from Lukas Czerner fixing my mistaken assumptions as to how
    FITRIM should work. Finally Ben Marzinski has fixed a bug relating to
    mmap and atime and also a bug relating to a locking issue in the
    transaction code."

    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
    GFS2: Test bufdata with buffer locked and gfs2_log_lock held
    GFS2: Don't call file_accessed() with a shared glock
    GFS2: Fix FITRIM argument handling
    GFS2: Require user to provide argument for FITRIM
    GFS2: Clean up some unused assignments
    GFS2: Fix possible null pointer deref in gfs2_rs_alloc
    GFS2: Fix an unchecked error from gfs2_rs_alloc

    Linus Torvalds
     
  • In gfs2_trans_add_bh(), gfs2 was testing whether there was a bd attached to the
    buffer without having the gfs2_log_lock held. It was then assuming it would
    stay attached for the rest of the function. However, without either the log
    lock being held or the buffer locked, __gfs2_ail_flush() could detach bd at any
    time. This patch moves the locking before the test. If there isn't a bd
    already attached, gfs2 can safely allocate one and attach it before locking.
    There is no way that the newly allocated bd could be on the ail list,
    and thus no way for __gfs2_ail_flush() to detach it.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • file_accessed() was being called by gfs2_mmap() with a shared glock. If it
    needed to update the atime, it was crashing because it dirtied the inode in
    gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode()
    checked if the caller was already holding a glock, but it didn't make sure that
    the glock was in the exclusive state. Now, instead of calling file_accessed()
    while holding the shared lock in gfs2_mmap(), file_accessed() is called after
    grabbing and releasing the glock to update the inode. If file_accessed() needs
    to update the atime, it will grab an exclusive lock in gfs2_dirty_inode().

    gfs2_dirty_inode() now also checks to make sure that if the calling process has
    already locked the glock, it has an exclusive lock.

    Signed-off-by: Benjamin Marzinski
    Signed-off-by: Steven Whitehouse

    Benjamin Marzinski
     
  • The current gfs2 implementation treats the FITRIM arguments as if they
    were in file system block units, which is wrong. The FITRIM arguments
    (fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are
    actually in bytes.

    Moreover, checks for a start argument beyond the end of the file
    system, a len argument smaller than a file system block, and a minlen
    argument bigger than the biggest resource group were missing.

    This commit changes the code to convert the FITRIM arguments to file
    system blocks and also adds the checks mentioned above.

    All the problems were recognised by xfstests 251 and 260.

    Signed-off-by: Lukas Czerner
    Signed-off-by: Steven Whitehouse

    Lukas Czerner
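    A hedged sketch of the byte-to-block conversion and the three checks.
    struct trim_range mirrors the field names of the uapi struct
    fstrim_range (all in bytes); fs_blocks and max_rg_blocks are assumed
    inputs standing in for the filesystem size and the largest resource
    group. This is not the gfs2 code itself.

```c
#include <errno.h>
#include <stdint.h>

/* Same fields as struct fstrim_range; all values are in bytes. */
struct trim_range {
    uint64_t start;
    uint64_t len;
    uint64_t minlen;
};

/*
 * Hypothetical sketch: convert a byte-based FITRIM range into blocks of
 * (1 << bs_shift) bytes and apply the sanity checks described above.
 */
static int trim_range_to_blocks(const struct trim_range *r, unsigned bs_shift,
                                uint64_t fs_blocks, uint64_t max_rg_blocks,
                                uint64_t *start, uint64_t *end,
                                uint64_t *minlen)
{
    *start  = r->start >> bs_shift;
    *end    = *start + (r->len >> bs_shift);
    *minlen = r->minlen >> bs_shift;

    if (*start >= fs_blocks)            /* start beyond the end of the fs */
        return -EINVAL;
    if (r->len < (1ULL << bs_shift))    /* len smaller than one block */
        return -EINVAL;
    if (*minlen > max_rg_blocks)        /* minlen bigger than largest rgrp */
        return -EINVAL;
    return 0;
}
```

    With 4 KiB blocks (bs_shift = 12), a range of start = 8192, len = 40960
    and minlen = 4096 bytes becomes blocks 2 to 12 with a minimum extent of
    one block.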
     
  • When the fstrim_range argument is not provided by the user in the FITRIM
    ioctl, we should just return EFAULT rather than promote bad behaviour by
    filling in the structure in the kernel. Let the user deal with it.

    Signed-off-by: Lukas Czerner
    Signed-off-by: Steven Whitehouse

    Lukas Czerner
     
  • Cleans up two cases where variables were assigned values but then never
    used again.

    Signed-off-by: Andrew Price
    Signed-off-by: Steven Whitehouse

    Andrew Price
     
  • Despite the return value from kmem_cache_zalloc() being checked, the
    error wasn't being returned until after a possible null pointer
    dereference. This patch returns the error immediately, allowing the
    removal of the error variable.

    Signed-off-by: Andrew Price
    Signed-off-by: Steven Whitehouse

    Andrew Price
     
  • Check the return value of gfs2_rs_alloc(ip) and avoid a possible null
    pointer dereference.

    Signed-off-by: Andrew Price
    Signed-off-by: Steven Whitehouse

    Andrew Price
     

05 Nov, 2012

1 commit

  • We do not need to look up a hashed negative directory since we have
    already revalidated it before and have found it to be fine.

    This also prevents a crash in cifs_lookup() when it attempts to rehash
    the already hashed negative lookup dentry.

    The patch has been tested using the reproducer at
    https://bugzilla.redhat.com/show_bug.cgi?id=867344#c28

    Cc: # 3.6.x
    Reported-by: Vit Zahradka
    Signed-off-by: Sachin Prabhu

    Sachin Prabhu
     

03 Nov, 2012

2 commits

  • The userspace cifs.idmap program generally works with the wbclient libs
    to generate binary SIDs in userspace. That program defines the struct
    that holds these values as having a max of 15 subauthorities. The kernel
    idmapping code however limits that value to 5.

    When the kernel copies those values around though, it doesn't sanity
    check the num_subauths value handed back from userspace or from the
    server. It's possible therefore for userspace to hand us back a bogus
    num_subauths value (or one that's valid, but greater than 5) that could
    cause the kernel to walk off the end of the cifs_sid->sub_auths array.

    Fix this by defining a new routine for copying sids and using that in
    all of the places that copy it. If we end up with a sid that's longer
    than expected then this approach will just lop off the "extra" subauths,
    but that's basically what the code does today already. Better approaches
    might be to fix this code to reject SIDs with >5 subauths, or fix it
    to handle the subauths array dynamically.

    At the same time, change the kernel to check the length of the data
    returned by userspace. If it's shorter than struct cifs_sid, reject it
    and return -EIO. Otherwise we'd end up with fields that are
    basically uninitialized.

    Long term, it might make sense to redefine cifs_sid using a flexarray at
    the end, to allow for variable-length subauth lists, and teach the code
    to handle the case where the subauths array being passed in from
    userspace is shorter than 5 elements.

    Note too, that I don't consider this a security issue since you'd need
    a compromised cifs.idmap program. If you have that, you can do all sorts
    of nefarious stuff. Still, this is probably reasonable for stable.

    Cc: stable@kernel.org
    Reviewed-by: Shirish Pargaonkar
    Signed-off-by: Jeff Layton

    Jeff Layton
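    A hedged sketch of the clamped copy and the length check described
    above. struct sid_sketch is a simplified stand-in for struct cifs_sid,
    and the limits of 6 authority bytes and 5 subauthorities are taken as
    assumptions; the helper names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_AUTHS    6   /* identifier authority bytes (assumed) */
#define NUM_SUBAUTHS 5   /* kernel-side subauthority limit */

/* Simplified stand-in for struct cifs_sid. */
struct sid_sketch {
    uint8_t  revision;
    uint8_t  num_subauth;
    uint8_t  authority[NUM_AUTHS];
    uint32_t sub_auth[NUM_SUBAUTHS];
};

/* Copy a SID, clamping num_subauth so we never index past sub_auth[]. */
static void copy_sid(struct sid_sketch *dst, const struct sid_sketch *src)
{
    int i;

    dst->revision = src->revision;
    dst->num_subauth = src->num_subauth < NUM_SUBAUTHS ?
                       src->num_subauth : NUM_SUBAUTHS;
    for (i = 0; i < NUM_AUTHS; i++)
        dst->authority[i] = src->authority[i];
    for (i = 0; i < dst->num_subauth; i++)
        dst->sub_auth[i] = src->sub_auth[i];
}

/* Reject userspace replies too short to hold a whole SID structure. */
static int check_sid_reply_len(size_t datalen)
{
    return datalen < sizeof(struct sid_sketch) ? -EIO : 0;
}
```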
     
  • Return an errno, not an NFS4ERR_* value. The old code only worked because NFS4ERR_ACCESS == EACCES.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     

02 Nov, 2012

1 commit


01 Nov, 2012

8 commits

  • Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue
    context. This avoids a deadlock where rpc_shutdown_client loops forever
    in a workqueue kworker context, trying to kill all RPC tasks associated with
    the client, while one or more of these tasks have already been assigned to the
    same kworker (and will never run rpc_exit_task).

    This approach is needed because RPC tasks that have already been assigned
    to a kworker by queue_work cannot be canceled, as explained in the comment
    for workqueue.c:insert_wq_barrier.

    Signed-off-by: Weston Andros Adamson
    [Trond: add module_get/put.]
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
     
  • Since commit c7f404b ('vfs: new superblock methods to override
    /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted
    device name reported back to userland.

    nfs_path() always generates a trailing slash when the given dentry is
    the root of an NFS mount, but userland may expect the original device
    name to be returned verbatim (as it used to be). Make this
    canonicalisation optional and change the callers accordingly.

    [jrnieder@gmail.com: use flag instead of bool argument]
    Reported-and-tested-by: Chris Hiestand
    Reference: http://bugs.debian.org/669314
    Signed-off-by: Ben Hutchings
    Cc: # v2.6.39+
    Signed-off-by: Jonathan Nieder
    Signed-off-by: Trond Myklebust

    Ben Hutchings
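    A minimal sketch of making the trailing slash optional via a flag, as
    the changelog describes. PATH_CANONICAL and format_root_devname() are
    hypothetical names for illustration, not the NFS client's code:

```c
#include <stdio.h>
#include <string.h>

#define PATH_CANONICAL 1u   /* hypothetical flag: request a trailing '/' */

/*
 * Hypothetical sketch: format the device name for the root of a mount.
 * A trailing slash is appended only when PATH_CANONICAL is set; without
 * it the original device name is returned verbatim.
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int format_root_devname(const char *devname, unsigned flags,
                               char *out, size_t outlen)
{
    size_t n = strlen(devname);
    int add_slash = (flags & PATH_CANONICAL) &&
                    (n == 0 || devname[n - 1] != '/');
    int written = snprintf(out, outlen, "%s%s", devname,
                           add_slash ? "/" : "");

    return (written < 0 || (size_t)written >= outlen) ? -1 : 0;
}
```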
     
  • In a very busy NFSv3 environment, rpc.mountd can respond to the NULL
    procedure but not to the MNT procedure in a timely manner, causing
    the MNT procedure to time out. The problem is that the mount system
    call then returns EIO, which causes the mount to fail, instead of
    ETIMEDOUT, which would cause the mount to be retried.

    This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags on
    the rpc_call_sync() call in nfs_mount(), which causes
    ETIMEDOUT to be returned on timed out connections.

    Signed-off-by: Steve Dickson
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Scott Mayhew
     
  • The new layout pointer in pnfs_find_alloc_layout() may be NULL if
    memory allocation fails. We must check for this, otherwise
    pnfs_free_layout_hdr() will go wrong because it cannot deal with a NULL pointer.

    Signed-off-by: Yanchuan Nian
    Signed-off-by: Trond Myklebust

    Yanchuan Nian
     
  • The DNS resolver's use of the sunrpc cache involves a 'ttl' number
    (relative) rather than a timeout (absolute). This confused me when
    I wrote
    commit c5b29f885afe890f953f7f23424045cdad31d3e4
    "sunrpc: use seconds since boot in expiry cache"

    and I managed to break it. The effect is that any TTL is interpreted
    as 0, and nothing useful gets into the cache.

    This patch removes the use of get_expiry() - which really expects an
    expiry time - and uses get_uint() instead, treating the int correctly
    as a ttl.

    This fixes a regression that has been present since 2.6.37, causing
    certain NFS accesses in certain environments to incorrectly fail.

    Reported-by: Chuck Lever
    Tested-by: Chuck Lever
    Cc: stable@vger.kernel.org
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
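    A simplified, hypothetical model of the mix-up (times are plain
    seconds since boot, and the helper names are illustrative): a relative
    TTL must be added to the current time, whereas treating it as an
    absolute expiry makes the entry stale on arrival.

```c
#include <stdbool.h>
#include <time.h>

/* Correct handling: a relative TTL becomes an absolute expiry time. */
static time_t expiry_from_ttl(time_t now, unsigned int ttl)
{
    return now + (time_t)ttl;
}

/* A cache entry is usable until its absolute expiry time. */
static bool entry_valid(time_t now, time_t expiry)
{
    return now < expiry;
}
```

    For example, at now = 1000 a TTL of 600 gives an expiry of 1600 and a
    valid entry, while storing 600 directly as the expiry time leaves the
    entry already expired, which matches "any TTL is interpreted as 0".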
     
  • If the state recovery machinery is triggered by the call to
    nfs4_async_handle_error() then we can deadlock.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     
  • If we do not release the sequence id in cases where we fail to get a
    session slot, then we can deadlock if we hit a recovery scenario.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     
  • Currently, we will schedule session recovery and then return to the
    caller of nfs4_handle_exception. This works for most cases, but causes
    a hang on the following test case:

    Client                        Server
    ------                        ------
    Open file over NFS v4.1
    Write to file
                                  Expire client
    Try to lock file

    The server will return NFS4ERR_BADSESSION, prompting the client to
    schedule recovery. However, the client will continue placing lock
    attempts and the open recovery never seems to be scheduled. The
    simplest solution is to wait for session recovery to run before retrying
    the lock.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Bryan Schumaker
     

31 Oct, 2012

3 commits

  • Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE
    case changed incorrectly after 3.6.

    The culprit is commit f33ff9927f42 ("take rlimit check to callers of
    expand_files()") which when it moved the "return -EMFILE" out to the
    caller, didn't notice that the dup3() had special code to turn the
    EMFILE return into EBADF.

    The replace_fd() helper that got added later then inherited the bug too.

    Reported-by: Jack Lin
    Signed-off-by: Al Viro
    [ Noted more bugs, wrote proper changelog, fixed up typos - Linus ]
    Signed-off-by: Linus Torvalds

    Al Viro
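    A userspace check of the expected errno, as a sketch: it lowers the
    soft RLIMIT_NOFILE and asks for a descriptor number equal to that
    limit. It uses dup2() to avoid needing _GNU_SOURCE for dup3(); the
    dup(2) man page documents EBADF for both calls when newfd is outside
    the range allowed by RLIMIT_NOFILE. The helper name is illustrative.

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Lower the soft RLIMIT_NOFILE to 'limit', then request newfd == limit.
 * Returns the errno from dup2(), or -1 if setup failed or dup2()
 * unexpectedly succeeded. The original limit is restored afterwards.
 */
static int dup_errno_at_limit(rlim_t limit)
{
    struct rlimit old, low;
    int p[2], err = -1;

    if (getrlimit(RLIMIT_NOFILE, &old) != 0 || old.rlim_max < limit)
        return -1;
    if (pipe(p) != 0)
        return -1;
    if ((rlim_t)p[1] >= limit)          /* existing fds must sit below it */
        goto out;

    low = old;
    low.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &low) != 0)
        goto out;
    if (dup2(p[0], (int)limit) < 0)
        err = errno;
    else
        close((int)limit);              /* unexpected success: clean up */
    setrlimit(RLIMIT_NOFILE, &old);     /* restore the original limit */
out:
    close(p[0]);
    close(p[1]);
    return err;
}
```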
     
  • Pull ext4 bugfix from Ted Ts'o:
    "This fixes the root cause of the ext4 data corruption bug which raised
    a ruckus on LWN, Phoronix, and Slashdot.

    This bug only showed up when non-standard mount options
    (journal_async_commit and/or journal_checksum) were enabled, and when
    the file system was not cleanly unmounted, but the root cause was that
    inode bitmap modifications were not being properly journaled.

    This could potentially lead to minor file system corruptions (pass 5
    complaints with the inode allocation bitmap) after an unclean shutdown
    under the wrong/unlucky workloads, but it turned into major failure if
    the journal_checksum and/or journal_async_commit was enabled."

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: fix unjournaled inode bitmap modification

    Linus Torvalds
     
  • Pull block driver update from Jens Axboe:
    "Distilled down variant, the rest will pass over to 3.8. I pulled it
    into the for-linus branch I had waiting for a pull request as well, in
    case you are wondering why there are new entries in here too. This
    also got rid of two reverts and the ones of the mtip32xx patches that
    went in later in the 3.6 cycle, so the series looks a bit cleaner."

    * 'for-linus' of git://git.kernel.dk/linux-block:
    loop: Make explicit loop device destruction lazy
    mtip32xx:Added appropriate timeout value for secure erase
    xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool
    cciss: select CONFIG_CHECK_SIGNATURE
    cciss: remove unneeded memset()
    xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
    pktcdvd: update MAINTAINERS
    floppy: remove dr, reuse drive on do_floppy_init
    floppy: use common function to check if floppies can be registered
    floppy: properly handle failure on add_disk loop
    floppy: do put_disk on current dr if blk_init_queue fails
    floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop
    xen/blkback: Fix compile warning
    block: Add blk_rq_pos(rq) to sort rq when plushing
    drivers/block: remove CONFIG_EXPERIMENTAL
    block: remove CONFIG_EXPERIMENTAL
    vfs: fix: don't increase bio_slab_max if krealloc() fails
    blkcg: stop iteration early if root_rl is the only request list
    blkcg: Fix use-after-free of q->root_blkg and q->root_rl.blkg

    Linus Torvalds
     

29 Oct, 2012

2 commits