05 May, 2011

2 commits


04 May, 2011

3 commits


03 May, 2011

3 commits

  • * 'for-linus' of git://git.infradead.org/ubifs-2.6:
    UBIFS: seek journal heads to the latest bud in replay
    UBIFS: do not free write-buffers when in R/O mode

    Linus Torvalds
     
  • This is the second fix of the following symptom:

    UBIFS error (pid 34456): could not find an empty LEB

    which sometimes happens after power cuts when we mount the file-system - UBIFS
    refuses it with the above error message which comes from the
    'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
    with the UBIFS power cut emulation enabled.

    Analysis of the problem.

    Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
    But the buds are replayed out-of-order, so the replay basically seeks journal
    heads to the "random" bud belonging to this head, and not to the _last_ one.

    The result is that the GC head may be seeked to a full LEB with no free
    space, or very little free space. 'ubifs_rcvry_gc_commit()' then tries to find a
    fully or mostly dirty LEB to match the current GC head (because we need to
    garbage-collect that dirty LEB in one go, since we do not have @c->gc_lnum).
    So 'ubifs_find_dirty_leb()' fails, we fall back to finding an empty LEB, and
    that fails too. As a result, recovery fails and mounting fails.

    This patch teaches the replay to initialize the GC heads exactly to the latest
    buds, i.e. the buds which have the largest sequence number in corresponding
    log reference nodes.

    Signed-off-by: Artem Bityutskiy
    Cc: stable@kernel.org

    Artem Bityutskiy
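
The selection rule described above (pick, for each journal head, the bud whose log reference node has the largest sequence number) can be sketched in plain C. This is a minimal illustration, not the UBIFS code: `struct bud` and `latest_bud_lnum()` are hypothetical names, and the array stands in for the buds in whatever order replay visited them.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a replayed bud; not a UBIFS struct. */
struct bud {
    int jhead;                 /* journal head this bud belongs to */
    int lnum;                  /* LEB number of the bud */
    unsigned long long sqnum;  /* sequence number of its log reference node */
};

/*
 * Return the LEB number of the bud with the largest sequence number for the
 * given journal head, or -1 if the head has no buds. The array may be in any
 * order, which mirrors out-of-order replay.
 */
int latest_bud_lnum(const struct bud *buds, size_t n, int jhead)
{
    const struct bud *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (buds[i].jhead != jhead)
            continue;
        if (best == NULL || buds[i].sqnum > best->sqnum)
            best = &buds[i];
    }
    return best ? best->lnum : -1;
}
```

Seeking to the last array element for a head would give an arbitrary bud; comparing sequence numbers makes the result independent of replay order.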
     
  • Currently UBIFS has a small optimization: it frees write-buffers when it is
    re-mounted from R/W mode to R/O mode. Of course, when it is mounted R/O, it
    does not allocate write-buffers either.

    This optimization is nice but it leads to subtle problems and complications
    in recovery, which I can reproduce using the integck test. The symptoms are
    that after a power cut the file-system cannot be mounted if we first mount
    it R/O, and then re-mount R/W - 'ubifs_rcvry_gc_commit()' prints:

    UBIFS error (pid 34456): could not find an empty LEB

    Analysis of the problem.

    When mounting R/W, the replay process sets journal heads to buds [1], but
    when mounting R/O - it does not do this, because the write-buffers are not
    allocated. So 'ubifs_rcvry_gc_commit()' works completely differently for the
    same file-system but for the following 2 cases:

    1. mounting R/W after a power cut and recovering
    2. mounting R/O after a power cut, re-mounting R/W and running deferred recovery

    In the former case, the journal heads are seeked to a bud; in the latter
    case, they are non-seeked (wbuf->lnum == -1). So in the latter case we do not
    try to recover the GC LEB by garbage-collecting to the GC head, but we just
    try to find an empty LEB, and there may be no empty LEBs, so we just fail.
    On the other hand, in the former case (mount R/W), we are able to make a GC LEB
    (@c->gc_lnum) by garbage-collecting.

    Thus, let's remove this small nice optimization and always allocate
    write-buffers. This should not make much of a difference: we have only 3
    of them, each of max. write unit size, which is usually 2KiB. So this is
    about 6KiB of RAM for the typical case, and only when mounted R/O.

    [1]: Note, currently the replay process is setting (seeking) the journal heads
    to _some_ buds, not necessarily to the buds which had been the journal heads
    before the power cut happened. This will be fixed separately.

    Signed-off-by: Artem Bityutskiy
    Cc: stable@kernel.org

    Artem Bityutskiy
     

29 Apr, 2011

2 commits

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    nfs: don't lose MS_SYNCHRONOUS on remount of noac mount
    NFS: Return meaningful status from decode_secinfo()
    NFSv4: Ensure we request the ordinary fileid when doing readdirplus
    NFSv4: Ensure that clientid and session establishment can time out
    SUNRPC: Allow RPC calls to return ETIMEDOUT instead of EIO
    NFSv4.1: Don't loop forever in nfs4_proc_create_session
    NFSv4: Handle NFS4ERR_WRONGSEC outside of nfs4_handle_exception()
    NFSv4.1: Don't update sequence number if rpc_task is not sent
    NFSv4.1: Ensure state manager thread dies on last umount
    SUNRPC: Fix the SUNRPC Kerberos V RPCSEC_GSS module dependencies
    NFS: Use correct variable for page bounds checking
    NFS: don't negotiate when user specifies sec flavor
    NFS: Attempt mount with default sec flavor first
    NFS: flav_array honors NFS_MAX_SECFLAVORS
    NFS: Fix infinite loop in gss_create_upcall()
    Don't mark_inode_dirty_sync() while holding lock
    NFS: Get rid of pointless test in nfs_commit_done
    NFS: Remove unused argument from nfs_find_best_sec()
    NFS: Eliminate duplicate call to nfs_mark_request_dirty
    NFS: Remove dead code from nfs_fs_mount()

    Linus Torvalds
     
  • azurIt reports large increases in system time after 2.6.36 when running
    Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
    to allocate fdmem if possible").

    That patch caused the vfs to use kmalloc() for very large allocations and
    this is causing excessive work (and presumably excessive reclaim) within
    the page allocator.

    Fix it by falling back to vmalloc() earlier - when the allocation attempt
    would have been considered "costly" by reclaim.

    Reported-by: azurIt
    Tested-by: azurIt
    Acked-by: Changli Gao
    Cc: Americo Wang
    Cc: Jiri Slaby
    Acked-by: Eric Dumazet
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
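
The "fall back to vmalloc() earlier" decision can be sketched as a size check. This is a userspace illustration under stated assumptions: a 4 KiB page size and a "costly" threshold of order 3 (the value of the kernel's PAGE_ALLOC_COSTLY_ORDER); the `SKETCH_*` macros and `try_kmalloc_first()` are hypothetical names, not the code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE    4096u  /* assumed page size */
#define SKETCH_COSTLY_ORDER 3u     /* orders above this count as "costly" */

/*
 * True if an allocation of `size` bytes is small enough to try the slab
 * allocator first; false if it should go straight to vmalloc().
 */
bool try_kmalloc_first(size_t size)
{
    return size <= ((size_t)SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER);
}
```

With these assumptions the cut-over point is 32 KiB (8 pages): anything larger skips the physically contiguous allocation attempt entirely.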
     

28 Apr, 2011

3 commits

  • On a remount, the VFS layer will clear the MS_SYNCHRONOUS bit on the
    assumption that the flags on the mount syscall will have it set if the
    remounted fs is supposed to keep it.

    In the case of "noac" though, MS_SYNCHRONOUS is implied. A remount of
    such a mount will lose the MS_SYNCHRONOUS flag since "sync" isn't part
    of the mount options.

    Reported-by: Max Matveev
    Signed-off-by: Jeff Layton
    Cc: stable@kernel.org
    Signed-off-by: Trond Myklebust

    Jeff Layton
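
A minimal sketch of the idea, assuming the fix amounts to re-asserting the synchronous flag on remount whenever the existing mount has "noac" set. The flag values and the `remount_flags()` helper are hypothetical; the real constants live in the kernel headers.

```c
/* Hypothetical flag values for this sketch only. */
#define SK_MS_SYNCHRONOUS 0x10u  /* stands in for MS_SYNCHRONOUS */
#define SK_NFS_MOUNT_NOAC 0x20u  /* stands in for the NFS "noac" mount flag */

/*
 * `requested` holds the flags passed to the remount (the VFS has already
 * cleared the synchronous bit unless "sync" was given); `nfs_flags` holds the
 * options of the existing NFS mount. A noac mount implies synchronous writes,
 * so the bit is put back.
 */
unsigned int remount_flags(unsigned int requested, unsigned int nfs_flags)
{
    if (nfs_flags & SK_NFS_MOUNT_NOAC)
        requested |= SK_MS_SYNCHRONOUS;
    return requested;
}
```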
     
  • When compiling, I was getting this warning:
    fs/nfs/nfs4xdr.c: In function ‘decode_secinfo’:
    fs/nfs/nfs4xdr.c:4839:6: warning: variable ‘status’ set but not used
    [-Wunused-but-set-variable]

    We were unconditionally returning 0 as long as there wasn't an error
    coming out of xdr_inline_decode(). We probably want to check the error
    status coming out of decode_op_hdr() and decode_secinfo_gss(), rather
    than assuming that everything is OK all the time.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • When readdir() returns a directory entry for the root of a mounted
    filesystem, Linux follows the old convention of returning the inode
    number of the covered directory (despite newer versions of POSIX declaring
    that this is a bug).
    To ensure this continues to work, the NFSv4 readdir implementation requests
    the 'mounted-on-fileid' from the server.

    However, readdirplus also needs to instantiate an inode for this entry, and
    for that, we also need to request the real fileid as per this patch.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

27 Apr, 2011

1 commit


26 Apr, 2011

18 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: cleanup error handling in inode.c
    Btrfs: put the right bio if we have an error
    Btrfs: free bitmaps properly when evicting the cache
    Btrfs: Free free_space item properly in btrfs_trim_block_group()
    btrfs: add missing spin_unlock to a rare exit path
    Btrfs: check return value of kmalloc()
    btrfs: fix wrong allocating flag when reading page
    Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log()

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: do some plugging in the submit_bio threads

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    CIFS: Fix memory over bound bug in cifs_parse_mount_options

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
    eCryptfs: Flush dirty pages in setattr
    eCryptfs: Handle failed metadata read in lookup
    eCryptfs: Add reference counting to lower files
    eCryptfs: dput dentries returned from dget_parent
    eCryptfs: Remove extra d_delete in ecryptfs_rmdir

    Linus Torvalds
     
  • Now that the whole dcache_hash_bucket crap is gone, go all the way and
    also remove the weird locking layering violations for locking the hash
    buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
    list abstraction instead of requiring each caller to open code it.
    After all allowing for the bit locks is the whole point of these helpers
    over the plain hlist variant.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
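
To illustrate what "moving the locking into the list abstraction" means, here is a userspace sketch of a bit lock kept in bit 0 of a list head pointer, written with C11 atomics. The names `bl_head`, `bl_lock()`, `bl_unlock()` and `bl_is_locked()` are hypothetical; the kernel helpers are built on its own bit spinlock primitives rather than on this code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* List head whose low bit doubles as a spinlock; pointers are assumed to be
 * at least 2-byte aligned so bit 0 is otherwise always zero. */
struct bl_head {
    atomic_uintptr_t first;
};

/* Spin until this caller is the one that flips bit 0 from 0 to 1. */
void bl_lock(struct bl_head *h)
{
    while (atomic_fetch_or_explicit(&h->first, (uintptr_t)1,
                                    memory_order_acquire) & 1u)
        ; /* busy-wait: another holder still has the bit set */
}

/* Clear bit 0, leaving the pointer bits untouched. */
void bl_unlock(struct bl_head *h)
{
    atomic_fetch_and_explicit(&h->first, ~(uintptr_t)1, memory_order_release);
}

bool bl_is_locked(struct bl_head *h)
{
    return (atomic_load(&h->first) & 1u) != 0;
}
```

Callers then write `bl_lock(&bucket)` and `bl_unlock(&bucket)` instead of open-coding the bit manipulation at every call site.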
     
  • After 57db4e8d73ef2b5e94a3f412108dff2576670a8a changed eCryptfs to
    write-back caching, eCryptfs page writeback updates the lower inode
    times due to the use of vfs_write() on the lower file.

    To preserve inode metadata changes, such as 'cp -p' does with
    utimensat(), we need to flush all dirty pages early in
    ecryptfs_setattr() so that the user-updated lower inode metadata isn't
    clobbered later in writeback.

    https://bugzilla.kernel.org/show_bug.cgi?id=33372

    Reported-by: Rocko
    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • When failing to read the lower file's crypto metadata during a lookup,
    eCryptfs must continue on without throwing an error. For example, there
    may be a plaintext file in the lower mount point that the user wants to
    delete through the eCryptfs mount.

    If an error is encountered while reading the metadata in lookup(), the
    eCryptfs inode's size could be incorrect. We must be sure to reread the
    plaintext inode size from the metadata when performing an open() or
    setattr(). The metadata is already being read in those paths, so this
    adds minimal performance overhead.

    This patch introduces a flag which will track whether or not the
    plaintext inode size has been read so that an incorrect i_size can be
    fixed in the open() or setattr() paths.

    https://bugs.launchpad.net/bugs/509180

    Cc:
    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • Change the error handling in several places so that the error number
    is set only when an error actually occurs.

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put
    the orig_bio, not bio.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • If our space cache is wrong, we do the right thing and free up everything that
    we loaded, but we don't reset the total_bitmaps counter or the thresholds or
    anything. So in btrfs_remove_free_space_cache make sure to call free_bitmap()
    if it's a bitmap. This will keep us from panicking when we check to make sure we
    don't have too many bitmaps. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Since commit dc89e9824464e91fa0b06267864ceabe3186fd8b, we've changed
    to use a specific slab for allocation of free_space items.

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • The check on the return value of kmalloc() is added to some places.

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • The space cache uses extent_readpages() to read free space information,
    so we cannot use the GFP_KERNEL flag to allocate memory, or it may lead
    to deadlock.

    Signed-off-by: Itaru Kitayama
    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Itaru Kitayama
     
  • The mutex must be unlocked before returning an error when
    btrfs_alloc_path() fails.

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • For any given lower inode, eCryptfs keeps only one lower file open and
    multiplexes all eCryptfs file operations through that lower file. The
    lower file was considered "persistent" and stayed open from the first
    lookup through the lifetime of the inode.

    This patch keeps the notion of a single, per-inode lower file, but adds
    reference counting around the lower file so that it is closed when not
    currently in use. If the reference count is at 0 when an operation (such
    as open, create, etc.) needs to use the lower file, a new lower file is
    opened. Since the file is no longer persistent, all references to the
    term persistent file are changed to lower file.

    Locking is added around the sections of code that open the lower file
    and assign the pointer in the inode info, as well as the code that fputs
    the lower file when all eCryptfs users are done with it.

    This patch is needed to fix issues, when mounted on top of the NFSv3
    client, where the lower file is left silly renamed until the eCryptfs
    inode is destroyed.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
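
The get/put pattern described above can be sketched as follows. This is a simplified model, not the eCryptfs implementation: `struct inode_info`, `get_lower_file()` and `put_lower_file()` are hypothetical names, a boolean stands in for the lower file pointer, and the locking the commit mentions is omitted for brevity.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode info; locking is omitted here. */
struct inode_info {
    int lower_file_count;  /* users currently holding the lower file */
    bool lower_file_open;  /* stands in for a non-NULL lower file pointer */
    int opens;             /* how many times the lower file was (re)opened */
};

/* Take a reference; the first user opens the lower file. */
void get_lower_file(struct inode_info *ii)
{
    if (ii->lower_file_count++ == 0) {
        ii->lower_file_open = true;
        ii->opens++;
    }
}

/* Drop a reference; the last user closes the lower file. */
void put_lower_file(struct inode_info *ii)
{
    if (ii->lower_file_count > 0 && --ii->lower_file_count == 0)
        ii->lower_file_open = false;
}
```

Once the count drops to zero the lower file is closed, so a later operation reopens it instead of relying on a file that stayed open for the lifetime of the inode.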
     
  • Call dput on the dentries previously returned by dget_parent() in
    ecryptfs_rename(). This is needed to support eCryptfs mounts on top
    of the NFSv3 client.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • vfs_rmdir() already calls d_delete() on the lower dentry. That was being
    duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference
    when NFSv3 was the lower filesystem.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     

25 Apr, 2011

2 commits


24 Apr, 2011

4 commits

  • * dcache-cleanup:
    vfs: get rid of insane dentry hashing rules

    Linus Torvalds
     
  • * 'for-linus' of git://git.infradead.org/ubifs-2.6:
    UBIFS: fix master node recovery
    UBIFS: fix false assertion warning in case of I/O failures
    UBIFS: fix false space checking failure

    Linus Torvalds
     
  • The dentry hashing rules have been really quite complicated for a long
    while, in odd ways. That made functions like __d_drop() very fragile
    and non-obvious.

    In particular, whether a dentry was hashed or not was indicated with an
    explicit DCACHE_UNHASHED bit. That's despite the fact that the hash
    abstraction that the dentries use actually has an 'is this entry hashed
    or not' model (which is a simple test of the 'pprev' pointer).

    The reason that was done is because we used the normal 'is this entry
    unhashed' model to mark whether the dentry had _ever_ been hashed in the
    dentry hash tables, and that logic goes back many years (commit
    b3423415fbc2: "dcache: avoid RCU for never-hashed dentries").

    That, in turn, meant that __d_drop had totally different unhashing logic
    for the dentry hash table case and for the anonymous dcache case,
    because in order to use the "is this dentry hashed" logic as a flag for
    whether it had ever been on the RCU hash table, we had to unhash such a
    dentry differently so that we'd never think that it wasn't 'unhashed'
    and wouldn't be free'd correctly.

    That's just insane. It made the logic really hard to follow, when there
    were two different kinds of "unhashed" states, and one of them (the one
    that used "list_bl_unhashed()") really had nothing at all to do with
    being unhashed per se, but with a very subtle lifetime rule instead.

    So turn all of it around, and make it logical.

    Instead of having a DCACHE_UNHASHED bit in d_flags to indicate whether
    the dentry is on the hash chains or not, use the hash chain unhashed
    logic for that. Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
    and everything makes sense.

    And for the lifetime rule, just use an explicit DENTRY_RCUACCESS bit.
    If we ever insert the dentry into the dentry hash table so that it is
    visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now
    needs the RCU lifetime rules. Now suddenly that test at dentry free
    time makes sense too.

    And because unhashing now is sane and doesn't depend on where the dentry
    got unhashed from (because the dentry hash chain details don't have
    some subtle side effects), we can re-unify the __d_drop() logic and use
    common code for the unhashing.

    Also fix one more open-coded hash chain bit_spin_lock() that I missed in
    the previous chain locking cleanup commit.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
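
The two separated concerns (hashed state derived from the 'pprev' pointer, and RCU lifetime tracked by an explicit flag) can be modelled in a few lines of C. This is a sketch with hypothetical names (`sk_dentry`, `sk_hash()`, `sk_drop()`, `SK_RCUACCESS`), using a plain doubly linked hash chain without the bit lock or RCU machinery of the real dcache.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_RCUACCESS 0x1u  /* hypothetical flag: was ever visible to RCU lookup */

struct node {
    struct node *next;
    struct node **pprev;  /* NULL means "not on any hash chain" */
};

struct sk_dentry {
    unsigned int flags;
    struct node hash;
};

/* Hashed state comes from the list node itself, not from a separate flag. */
bool sk_unhashed(const struct sk_dentry *d)
{
    return d->hash.pprev == NULL;
}

/* Insert at the chain head and record that RCU lifetime rules now apply. */
void sk_hash(struct sk_dentry *d, struct node **head)
{
    d->hash.next = *head;
    if (*head)
        (*head)->pprev = &d->hash.next;
    d->hash.pprev = head;
    *head = &d->hash;
    d->flags |= SK_RCUACCESS;
}

/* One unhashing path for every chain; the lifetime flag is left alone. */
void sk_drop(struct sk_dentry *d)
{
    if (sk_unhashed(d))
        return;
    *d->hash.pprev = d->hash.next;
    if (d->hash.next)
        d->hash.next->pprev = d->hash.pprev;
    d->hash.next = NULL;
    d->hash.pprev = NULL;
}

/* The free-time test looks only at the lifetime flag. */
bool sk_needs_rcu_free(const struct sk_dentry *d)
{
    return (d->flags & SK_RCUACCESS) != 0;
}
```

After `sk_drop()` the entry reports as unhashed while still remembering that it needs an RCU-delayed free, which is the separation the commit message argues for.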
     
  • It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
    help anything - quite the reverse. All the users end up having to know
    about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
    etc. So it just makes the code look confusing.

    And the cost of it is extra '&b->head' syntactic noise, but more
    importantly it spuriously makes the hash table dentry list look
    different from the per-superblock DCACHE_DISCONNECTED dentry list.

    As a result, the code ended up using ad-hoc locking for one case and
    special helper functions for what is really another totally identical
    case in the very same function.

    Make it all look and work the same.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

22 Apr, 2011

2 commits