04 Mar, 2013

3 commits

  • Pull more VFS bits from Al Viro:
    "Unfortunately, it looks like xattr series will have to wait until the
    next cycle ;-/

    This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
    etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
    more file_inode() work"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    constify path_get/path_put and fs_struct.c stuff
    fix nommu breakage in shmem.c
    cache the value of file_inode() in struct file
    9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
    9p: make sure ->lookup() adds fid to the right dentry
    9p: untangle ->lookup() a bit
    9p: double iput() in ->lookup() if d_materialise_unique() fails
    9p: v9fs_fid_add() can't fail now
    v9fs: get rid of v9fs_dentry
    9p: turn fid->dlist into hlist
    9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
    more file_inode() open-coded instances
    selinux: opened file can't have NULL or negative ->f_path.dentry

    (In the meantime, the hlist traversal macros have changed, so this
    required a semantic conflict fixup for the newly hlistified fid->dlist)

    Linus Torvalds
     
  • Pull btrfs fixup from Chris Mason:
    "Geert and James both sent this one in, sorry guys"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs/raid56: Add missing #include

    Linus Torvalds
     
  • Pull new ImgTec Meta architecture from James Hogan:
    "This adds core architecture support for Imagination's Meta processor
    cores, followed by some later miscellaneous arch/metag cleanups and
    fixes which I kept separate to ease review:

    - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
    - A few fixes all over, particularly for symbol prefixes
    - A few privilege protection fixes
    - Several cleanups (setup.c includes, split out a lot of
    metag_ksyms.c)
    - Fix some missing exports
    - Convert hugetlb to use vm_unmapped_area()
    - Copy device tree to non-init memory
    - Provide dma_get_sgtable()"

    * tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits)
    metag: Provide dma_get_sgtable()
    metag: prom.h: remove declaration of metag_dt_memblock_reserve()
    metag: copy devicetree to non-init memory
    metag: cleanup metag_ksyms.c includes
    metag: move mm/init.c exports out of metag_ksyms.c
    metag: move usercopy.c exports out of metag_ksyms.c
    metag: move setup.c exports out of metag_ksyms.c
    metag: move kick.c exports out of metag_ksyms.c
    metag: move traps.c exports out of metag_ksyms.c
    metag: move irq enable out of irqflags.h on SMP
    genksyms: fix metag symbol prefix on crc symbols
    metag: hugetlb: convert to vm_unmapped_area()
    metag: export clear_page and copy_page
    metag: export metag_code_cache_flush_all
    metag: protect more non-MMU memory regions
    metag: make TXPRIVEXT bits explicit
    metag: kernel/setup.c: sort includes
    perf: Enable building perf tools for Meta
    metag: add boot time LNKGET/LNKSET check
    metag: add __init to metag_cache_probe()
    ...

    Linus Torvalds
     

03 Mar, 2013

12 commits

  • tilegx_defconfig:

    fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table':
    fs/btrfs/raid56.c:206:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
    fs/btrfs/raid56.c:206:9: warning: assignment makes pointer from integer without a cast [enabled by default]
    fs/btrfs/raid56.c:226:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Chris Mason

    Geert Uytterhoeven
     
  • Pull ext4 bug fixes from Ted Ts'o:
    "Various bug fixes for ext4. The most important is a fix for the new
    extent cache's slab shrinker which can cause significant, user-visible
    pauses when the system is under memory pressure."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    ext4: enable quotas before orphan cleanup
    ext4: don't allow quota mount options when quota feature enabled
    ext4: fix a warning from sparse check for ext4_dir_llseek
    ext4: convert number of blocks to clusters properly
    ext4: fix possible memory leak in ext4_remount()
    jbd2: fix ERR_PTR dereference in jbd2__journal_start
    ext4: use percpu counter for extent cache count
    ext4: optimize ext4_es_shrink()

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "We've just concluded another Connectathon interoperability testing
    week, and so here are the fixes for the bugs that were discovered:

    - Don't allow NFS silly-renamed files to be deleted
    - Don't start the retransmission timer when out of socket space
    - Fix a couple of pnfs-related Oopses.
    - Fix one more NFSv4 state recovery deadlock
    - Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER"

    * tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    SUNRPC: One line comment fix
    NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS
    SUNRPC: add call to get configured timeout
    PNFS: set the default DS timeout to 60 seconds
    NFSv4: Fix another open/open_recovery deadlock
    nfs: don't allow nfs_find_actor to match inodes of the wrong type
    NFSv4.1: Hold reference to layout hdr in layoutget
    pnfs: fix resend_to_mds for directio
    SUNRPC: Don't start the retransmission timer when out of socket space
    NFS: Don't allow NFS silly-renamed files to be deleted, no signal

    Linus Torvalds
     
  • Pull btrfs update from Chris Mason:
    "The biggest feature in the pull is the new (and still experimental)
    raid56 code that David Woodhouse started long ago. I'm still working
    on the parity logging setup that will avoid inconsistent parity after
    a crash, so this is only for testing right now. But, I'd really like
    to get it out to a broader audience to hammer out any performance
    issues or other problems.

    scrub does not yet correct errors on raid5/6 either.

    Josef has another pass at fsync performance. The big change here is
    to combine waiting for metadata with waiting for data, which is a big
    latency win. It is also step one toward using atomics from the
    hardware during a commit.

    Mark Fasheh has a new way to use btrfs send/receive to send only the
    metadata changes. SUSE is using this to make snapper more efficient
    at finding changes between snapshots.

    Snapshot-aware defrag is also included.

    Otherwise we have a large number of fixes and cleanups. Eric Sandeen
    wins the award for removing the most lines, and I'm hoping we steal
    this idea from XFS over and over again."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
    btrfs: fixup/remove module.h usage as required
    Btrfs: delete inline extents when we find them during logging
    btrfs: try harder to allocate raid56 stripe cache
    Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic
    Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails
    Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree
    Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails
    Btrfs: fix missing deleted items in btrfs_clean_quota_tree
    btrfs: use only inline_pages from extent buffer
    Btrfs: fix wrong reserved space when deleting a snapshot/subvolume
    Btrfs: fix wrong reserved space in qgroup during snap/subv creation
    Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshot
    btrfs: remove a printk from scan_one_device
    Btrfs: fix NULL pointer after aborting a transaction
    Btrfs: fix memory leak of log roots
    Btrfs: copy everything if we've created an inline extent
    btrfs: cleanup for open-coded alignment
    Btrfs: do not change inode flags in rename
    Btrfs: use reserved space for creating a snapshot
    clear chunk_alloc flag on retryable failure
    ...

    Linus Torvalds
     
  • When using quota feature we need to enable quotas before orphan cleanup
    so that changes happening during it are properly reflected in quota
    accounting.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • So far we have silently ignored quota mount options that were set while the
    quota feature was enabled. But this can create confusion in userspace when
    mount options are set but silently ignored and also creates opportunities
    for bugs when we don't properly test all quota types. Actually
    ext4_mark_dquot_dirty() forgets to test for quota feature so it was
    dependent on journaled quota options being set. OTOH ext4_orphan_cleanup()
    tries to enable journaled quota when quota options are specified which is
    wrong when quota feature is enabled.

    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • ext4_dir_llseek is only used as a callback function, and no one calls
    it directly. So make it a static function in order to silence a
    warning from the sparse check.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"

    Zheng Liu
     
  • We're using macro EXT4_B2C() to convert number of blocks to number of
    clusters for bigalloc file systems. However, we should be using
    EXT4_NUM_B2C().

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Cc: stable@vger.kernel.org

    Lukas Czerner
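
The difference between the two conversions can be sketched in userspace C. This is a minimal illustration, assuming the first macro is a plain right shift (truncating) and the second rounds up to whole clusters; the struct and function names below are hypothetical stand-ins, not the ext4 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock info; only the cluster size matters. */
struct cluster_geom {
    unsigned int cluster_bits;   /* log2(blocks per cluster) */
};

/* Truncating conversion: maps a block *number* to the cluster containing it. */
static uint64_t block_to_cluster(const struct cluster_geom *g, uint64_t blk)
{
    return blk >> g->cluster_bits;
}

/* Rounding-up conversion: clusters needed to hold a *count* of blocks. */
static uint64_t num_blocks_to_clusters(const struct cluster_geom *g, uint64_t nblks)
{
    uint64_t ratio = (uint64_t)1 << g->cluster_bits;

    assert(nblks <= UINT64_MAX - (ratio - 1));   /* guard the round-up addition */
    return (nblks + ratio - 1) >> g->cluster_bits;
}
```

With 16 blocks per cluster, 17 blocks truncate to 1 cluster but actually need 2, which is why a block count wants the rounding form.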
     
  • 'orig_data' is allocated in ext4_remount() and should be freed
    before leaving through the error handling paths; otherwise it
    causes a memory leak.

    Signed-off-by: Wei Yongjun
    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Lukas Czerner
    Cc: stable@vger.kernel.org

    Wei Yongjun
     
  • If start_this_handle() fails, handle is set to an ERR_PTR() value
    and must not be dereferenced.

    paging request at fffffffffffffff6
    IP: [] jbd2__journal_start+0x18f/0x290
    PGD 200e067 PUD 200f067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
    CPU 0 journal commit I/O error

    Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW
    RIP: 0010:[] [] jbd2__journal_start+0x18f/0x290
    RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292
    RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48
    RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: "Theodore Ts'o"

    Dmitry Monakhov
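
The error-pointer convention behind this fix can be sketched in userspace C. This is a hedged re-creation for illustration only: the helpers err_ptr(), ptr_err() and is_err() are lowercase stand-ins modeled on the kernel idiom, and start_handle() is a hypothetical function that fails on request.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095   /* error codes occupy the top 4095 addresses */

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True when the pointer is an encoded error, not a real address. */
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct handle { int credits; };

/* Hypothetical: hands back `storage`, or an error pointer when `fail` is set. */
static struct handle *start_handle(struct handle *storage, int fail)
{
    if (fail)
        return err_ptr(-EROFS);
    storage->credits = 8;
    return storage;
}

/* The caller must test the result before touching any field. */
static long handle_credits(struct handle *storage, int fail)
{
    struct handle *h = start_handle(storage, fail);

    if (is_err(h))
        return ptr_err(h);   /* propagate the error, no dereference */
    return h->credits;
}
```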
     
  • The commit "binfmt_elf: cleanups"
    (f670d0ecda73b7438eec9ed108680bc5f5362ad8) removed an ifndef elf_map but
    this breaks compilation for metag which does define elf_map.

    This adds the ifndef back in as it was before, but does not affect the
    other cleanups made by that patch.

    Signed-off-by: James Hogan
    Cc: Alexander Viro
    Cc: linux-fsdevel@vger.kernel.org
    Acked-by: Mikael Pettersson

    James Hogan
     
  • Pull signal/compat fixes from Al Viro:
    "Fixes for several regressions introduced in the last signal.git pile,
    along with fixing bugs in truncate and ftruncate compat (on just about
    anything biarch at least one of those two had been done wrong)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    compat: restore timerfd settime and gettime compat syscalls
    [regression] braino in "sparc: convert to ksignal"
    fix compat truncate/ftruncate
    switch lseek to COMPAT_SYSCALL_DEFINE
    lseek() and truncate() on sparc really need sign extension

    Linus Torvalds
     

02 Mar, 2013

9 commits

  • Use a percpu counter rather than atomic types for shrinker accounting.
    There's no need for ultimate accuracy in the shrinker, so this
    should come a little more cheaply. The percpu struct is somewhat
    large, but there was a big gap before the cache-aligned
    s_es_lru_lock anyway, and it fits nicely in there.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
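
The trade-off described above (cheap updates, approximate reads) can be sketched as follows. This is a single-threaded model under stated assumptions: NR_SLOTS stands in for the number of CPUs, BATCH is an arbitrary drift limit, and none of these names come from the kernel's percpu counter implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 4   /* stand-in for the number of CPUs */
#define BATCH    8   /* local drift allowed before folding into the global */

struct approx_counter {
    int64_t global;             /* shared value, touched rarely */
    int32_t local[NR_SLOTS];    /* per-slot deltas, touched on every update */
};

/* Update only the local slot; fold into the global once the drift is large. */
static void approx_add(struct approx_counter *c, unsigned int slot, int32_t delta)
{
    assert(slot < NR_SLOTS);
    int32_t v = c->local[slot] + delta;

    if (v >= BATCH || v <= -BATCH) {
        c->global += v;
        c->local[slot] = 0;
    } else {
        c->local[slot] = v;
    }
}

/* Cheap read: may lag the true value by less than NR_SLOTS * BATCH. */
static int64_t approx_read(const struct approx_counter *c)
{
    return c->global;
}

/* Exact read: walks every slot, so it costs more. */
static int64_t exact_sum(const struct approx_counter *c)
{
    int64_t sum = c->global;

    for (unsigned int i = 0; i < NR_SLOTS; i++)
        sum += c->local[i];
    return sum;
}
```

A shrinker only needs a rough size estimate, so the cheap read is usually good enough for that kind of caller.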
     
  • Both compat syscalls got lost with 9d94b9e2 "switch timerfd compat syscalls
    to COMPAT_SYSCALL_DEFINE" because of a typo:
    COMPAT instead of CONFIG_COMPAT.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Al Viro

    Heiko Carstens
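
Why a misspelled configuration guard loses code without any diagnostic can be shown with a small preprocessor sketch. The HAVE_ENTRY_* macros and the two functions are hypothetical; CONFIG_COMPAT is defined by hand here to stand in for the build configuration.

```c
#include <assert.h>

#define CONFIG_COMPAT 1   /* stands in for the build configuration */

/* Misspelled guard: COMPAT is not defined, so this branch is skipped silently. */
#ifdef COMPAT
#define HAVE_ENTRY_WITH_TYPO 1
#else
#define HAVE_ENTRY_WITH_TYPO 0
#endif

/* Correct guard: the code is compiled in whenever the option is set. */
#ifdef CONFIG_COMPAT
#define HAVE_ENTRY_FIXED 1
#else
#define HAVE_ENTRY_FIXED 0
#endif

static int entry_present_with_typo(void)
{
    return HAVE_ENTRY_WITH_TYPO;
}

static int entry_present_fixed(void)
{
    return HAVE_ENTRY_FIXED;
}
```

An #ifdef on an unknown name is perfectly legal C, so the compiler has nothing to complain about; the guarded code simply disappears.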
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Note that this thing does *not* contribute to inode refcount;
    it's pinned down by dentry.

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull CIFS fixes from Steve French:
    "Four cifs fixes (including for kernel bug #53221 and samba bug #9519)"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: bugfix for unreclaimed writeback pages in cifs_writev_requeue()
    cifs: set MAY_SIGN when sec=krb5
    POSIX extensions disabled on client due to illegal O_EXCL flag sent to Samba
    cifs: ensure that cifs_get_root() only traverses directories

    Linus Torvalds
     
  • smatch analysis:

    fs/autofs4/waitq.c:46 autofs4_catatonic_mode() info: redundant null check on wq->name.name calling kfree()

    Signed-off-by: Tim Gardner
    Signed-off-by: Ian Kent
    Cc: autofs@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Tim Gardner
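
The same pattern exists in userspace C, where free(NULL) is defined to do nothing, so a NULL check in front of it is redundant. A minimal sketch, with a hypothetical wait_entry structure standing in for the real wait queue entry:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct wait_entry {
    char *name;   /* may legitimately be NULL */
};

/* Replace the stored name with a private copy of `s`. */
static int wait_entry_set_name(struct wait_entry *w, const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (!copy)
        return -1;
    memcpy(copy, s, len);
    free(w->name);        /* fine even when no name was set before */
    w->name = copy;
    return 0;
}

/* No "if (w->name)" needed: free(NULL) is a no-op. */
static void wait_entry_release(struct wait_entry *w)
{
    free(w->name);
    w->name = NULL;
}
```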
     
  • …t lock contexts for basic block

    Sparse complains:

    fs/autofs4/root.c:409:9: sparse: context imbalance in 'autofs4_d_automount' - different lock contexts for basic block

    This was introduced by commit f55fb0c24386 ("autofs4 - dont clear
    DCACHE_NEED_AUTOMOUNT on rootless mount")

    The function autofs4_d_automount can be left with the (&sbi->fs_lock)
    held if sbi->version <= 4 and simple_empty(dentry) == false so the
    warning seems valid.

    --> Add a spin_unlock in this case before we jump to done

    Unfortunately compile tested only.

    Reported-by: Fengguang Wu <fengguang.wu@intel.com>
    Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
    Acked-by: Ian Kent <raven@themaw.net>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Peter Huewe
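
The shape of the imbalance and of the fix can be sketched in userspace C. Everything here is an assumption for illustration: fake_lock is a toy lock that asserts on misuse, and automount_step() only mirrors the condition described in the message (version <= 4 and a non-empty directory), not the real function.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that asserts on double acquire or release of an unheld lock. */
struct fake_lock { int held; };

static void fake_lock_acquire(struct fake_lock *l)
{
    assert(l->held == 0);
    l->held = 1;
}

static void fake_lock_release(struct fake_lock *l)
{
    assert(l->held == 1);
    l->held = 0;
}

/* Every path that jumps to `done` must drop the lock first. */
static int automount_step(struct fake_lock *lock, int version, bool dir_empty)
{
    int ret = 0;

    fake_lock_acquire(lock);
    if (version <= 4 && !dir_empty) {
        fake_lock_release(lock);   /* the unlock that was missing */
        goto done;
    }
    ret = 1;                       /* ... work done under the lock ... */
    fake_lock_release(lock);
done:
    return ret;
}
```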
     
  • We want to avoid module.h where possible, since it in turn includes
    nearly all of header space. This means removing it where it is not
    required, and using export.h where we are only exporting symbols via
    EXPORT_SYMBOL and friends.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Chris Mason

    Paul Gortmaker
     
  • Apparently when we do inline extents we allow the data to overlap the last chunk
    of the btrfs_file_extent_item, which means that we can possibly have a
    btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
    This messes with us when we try to overwrite the extent when logging new extents
    since we expect for it to be the right size. To fix this just delete the item
    and try to do the insert again which will give us the proper sized
    btrfs_file_extent_item. This fixes a panic where map_private_extent_buffer
    would blow up because we're trying to write past the end of the leaf. Thanks,

    Cc: stable@vger.kernel.org
    Signed-off-by: Josef Bacik

    Josef Bacik
     

01 Mar, 2013

16 commits

  • The stripe hash table is large, starting with allocation order 4 and can go as
    high as order 7 in case lock debugging is turned on and structure padding
    happens.

    Observed mount failure:

    mount: page allocation failure: order:7, mode:0x200050
    Pid: 8234, comm: mount Tainted: G W 3.8.0-default+ #267
    Call Trace:
    [] warn_alloc_failed+0xf3/0x140
    [] ? __alloc_pages_direct_compact+0x92/0x250
    [] __alloc_pages_nodemask+0x733/0x9d0
    [] ? cache_alloc_refill+0x3f8/0x840
    [] cache_alloc_refill+0x43c/0x840
    [] ? is_kernel_percpu_address+0x4b/0x90
    [] ? btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
    [] kmem_cache_alloc_trace+0x247/0x270
    [] btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs]
    [] open_ctree+0xb2f/0x1f90 [btrfs]
    [] ? string+0x49/0xe0
    [] ? vsnprintf+0x443/0x5d0
    [] btrfs_mount+0x526/0x600 [btrfs]
    [] ? cache_alloc_debugcheck_after+0x4c/0x200
    [] mount_fs+0x20/0xe0
    [] vfs_kern_mount+0x76/0x120
    [] do_mount+0x386/0x980
    [] ? strndup_user+0x5b/0x80
    [] sys_mount+0x90/0xe0
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik

    David Sterba
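
To put the orders in perspective: an order-n allocation is 2^n physically contiguous pages. The helper below is a small sketch of that arithmetic, assuming a 4 KiB page size in the usage; alloc_order() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest order such that (page_size << order) covers `size` bytes. */
static unsigned int alloc_order(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    assert(size > 0 && page_size > 0);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With 4 KiB pages, order 4 means 64 KiB and order 7 means 512 KiB of contiguous memory, which is hard to find on a fragmented system.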
     
  • The original code is a little confusing and unclear. The usual
    way to handle errors in kernel code looks like this:
    [...]
    if (ret)
            goto out;
    [...]

    So I moved the common cleanup code to the place labeled
    out_fail, which makes it easier to maintain.

    Signed-off-by: Wang Shilong
    Signed-off-by: Josef Bacik

    Wang Shilong
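
The centralized-exit pattern the message refers to can be sketched like this. The reservation structure and both functions are hypothetical and only show how a single out_fail label keeps the undo logic in one place.

```c
#include <assert.h>
#include <errno.h>

struct reservation {
    long reserved;
};

/* Hypothetical single step: fails with -ENOSPC when the limit is exceeded. */
static int reserve_step(struct reservation *r, long n, long limit)
{
    assert(n >= 0);
    if (r->reserved + n > limit)
        return -ENOSPC;
    r->reserved += n;
    return 0;
}

/* Two steps share one cleanup path instead of repeating the undo code. */
static int reserve_metadata(struct reservation *r, long a, long b, long limit)
{
    long start = r->reserved;
    int ret;

    ret = reserve_step(r, a, limit);
    if (ret)
        goto out_fail;
    ret = reserve_step(r, b, limit);
    if (ret)
        goto out_fail;
    return 0;

out_fail:
    r->reserved = start;   /* common cleanup: roll back whatever succeeded */
    return ret;
}
```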
     
  • Commit eb6b88d92c6df083dd09a8c471011e3788dfd7c6 introduced another bug.
    If the failure is only in btrfs_qgroup_reserve, btrfs_qgroup_free
    should not be called; otherwise the quota accounting becomes wrong.

    Signed-off-by: Wang Shilong
    Signed-off-by: Josef Bacik

    Wang Shilong
     
  • The check has already been done just before btrfs_clean_quota_tree
    is called, so there is no need to check again. Remove it.

    Signed-off-by: Wang Shilong
    Signed-off-by: Josef Bacik

    Wang Shilong
     
  • Return ENOMEM rather than triggering BUG_ON when btrfs_alloc_path fails.

    Signed-off-by: Wang Shilong
    Reviewed-by: Miao Xie
    Reviewed-by: Zach Brown
    Signed-off-by: Josef Bacik

    Wang Shilong
     
  • Steps to reproduce:

    i=0
    ncases=100

    mkfs.btrfs
    mount
    btrfs quota enable
    btrfs qgroup create 2/1
    while [ $i -le $ncases ]
    do
        btrfs qgroup create 1/$i
        btrfs qgroup assign 1/$i 2/1
        i=$(($i+1))
    done

    btrfs quota disable
    umount
    btrfsck

    You can also use the commands:
    btrfs-debug-tree | grep QGROUP

    You will find that some items still exist. This happens because the
    original code just checks slots[0]==0 and returns.
    We fix it by deleting the leaves one by one.

    Signed-off-by: Wang Shilong
    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Wang Shilong
     
  • When the system is under memory pressure, ext4_es_shrink() will get
    called very often. So optimize returning the number of items in the
    file system's extent status cache by keeping a per-filesystem count,
    instead of calculating it each time by scanning all of the inodes in
    the extent status cache.

    Also rename the slab used for the extent status cache to be
    "ext4_extent_status" so it's obvious the slab in question is created
    by ext4.

    Signed-off-by: "Theodore Ts'o"
    Cc: Zheng Liu

    Theodore Ts'o
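
The optimization amounts to maintaining a running total on every insert and remove instead of recomputing it on demand. A minimal sketch, with hypothetical es_fs and es_inode structures standing in for the filesystem and its inodes:

```c
#include <assert.h>
#include <stddef.h>

#define NR_INODES 3   /* illustrative: a tiny fixed set of inodes */

struct es_inode {
    unsigned long nr_cached;
};

struct es_fs {
    struct es_inode inodes[NR_INODES];
    unsigned long total_cached;   /* per-filesystem running count */
};

/* Insert and remove keep the running count in step with the per-inode data. */
static void es_insert(struct es_fs *fs, size_t ino)
{
    assert(ino < NR_INODES);
    fs->inodes[ino].nr_cached++;
    fs->total_cached++;
}

static void es_remove(struct es_fs *fs, size_t ino)
{
    assert(ino < NR_INODES && fs->inodes[ino].nr_cached > 0);
    fs->inodes[ino].nr_cached--;
    fs->total_cached--;
}

/* Old approach: cost grows with the number of inodes on every query. */
static unsigned long es_count_by_scan(const struct es_fs *fs)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < NR_INODES; i++)
        sum += fs->inodes[i].nr_cached;
    return sum;
}

/* New approach: constant-time answer from the maintained total. */
static unsigned long es_count_cached(const struct es_fs *fs)
{
    return fs->total_cached;
}
```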
     
  • Pull nfsd changes from J Bruce Fields:
    "Miscellaneous bugfixes, plus:

    - An overhaul of the DRC cache by Jeff Layton. The main effect is
    just to make it larger. This decreases the chances of intermittent
    errors especially in the UDP case. But we'll need to watch for any
    reports of performance regressions.

    - Containerized nfsd: with some limitations, we now support
    per-container nfs-service, thanks to extensive work from Stanislav
    Kinsbursky over the last year."

    Some notes about conflicts, since there were *two* non-data semantic
    conflicts here:

    - idr_remove_all() had been added by a memory leak fix, but has since
    become deprecated since idr_destroy() does it for us now.

    - xs_local_connect() had been added by this branch to make AF_LOCAL
    connections be synchronous, but in the meantime Trond had changed the
    calling convention in order to avoid a RCU dereference.

    There were a couple of more obvious actual source-level conflicts due to
    the hlist traversal changes and one just due to code changes next to
    each other, but those were trivial.

    * 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
    SUNRPC: make AF_LOCAL connect synchronous
    nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
    svcrpc: fix rpc server shutdown races
    svcrpc: make svc_age_temp_xprts enqueue under sv_lock
    lockd: nlmclnt_reclaim(): avoid stack overflow
    nfsd: enable NFSv4 state in containers
    nfsd: disable usermode helper client tracker in container
    nfsd: use proper net while reading "exports" file
    nfsd: containerize NFSd filesystem
    nfsd: fix comments on nfsd_cache_lookup
    SUNRPC: move cache_detail->cache_request callback call to cache_read()
    SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
    SUNRPC: rework cache upcall logic
    SUNRPC: introduce cache_detail->cache_request callback
    NFS: simplify and clean cache library
    NFS: use SUNRPC cache creation and destruction helper for DNS cache
    nfsd4: free_stid can be static
    nfsd: keep a checksum of the first 256 bytes of request
    sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
    sunrpc: fix comment in struct xdr_buf definition
    ...

    Linus Torvalds
     
  • Pull Ceph updates from Sage Weil:
    "A few groups of patches here. Alex has been hard at work improving
    the RBD code, laying groundwork for understanding the new formats and
    doing layering. Most of the infrastructure is now in place for the
    final bits that will come with the next window.

    There are a few changes to the data layout. Jim Schutt's patch fixes
    some non-ideal CRUSH behavior, and a set of patches from me updates
    the client to speak a newer version of the protocol and implement an
    improved hashing strategy across storage nodes (when the server side
    supports it too).

    A pair of patches from Sam Lang fix the atomicity of open+create
    operations. Several patches from Yan, Zheng fix various mds/client
    issues that turned up during multi-mds torture tests.

    A final set of patches expose file layouts via virtual xattrs, and
    allow the policies to be set on directories via xattrs as well
    (avoiding the awkward ioctl interface and providing a consistent
    interface for both kernel mount and ceph-fuse users)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
    libceph: add support for HASHPSPOOL pool flag
    libceph: update osd request/reply encoding
    libceph: calculate placement based on the internal data types
    ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
    ceph: update "ceph_features.h"
    libceph: decode into cpu-native ceph_pg type
    libceph: rename ceph_pg -> ceph_pg_v1
    rbd: pass length, not op for osd completions
    rbd: move rbd_osd_trivial_callback()
    libceph: use a do..while loop in con_work()
    libceph: use a flag to indicate a fault has occurred
    libceph: separate non-locked fault handling
    libceph: encapsulate connection backoff
    libceph: eliminate sparse warnings
    ceph: eliminate sparse warnings in fs code
    rbd: eliminate sparse warnings
    libceph: define connection flag helpers
    rbd: normalize dout() calls
    rbd: barriers are hard
    rbd: ignore zero-length requests
    ...

    Linus Torvalds
     
  • The client will currently try LAYOUTGETs forever if a server is returning
    NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no
    longer needs the layout (ie process killed, unmounted).

    This patch uses the DS timeout value (module parameter 'dataserver_timeo'
    via rpc layer) to set an upper limit of how long the client tries LAYOUTGETs
    in this situation. Once the timeout is reached, IO is redirected to the MDS.

    This also changes how the client checks if a layout is on the clp list
    to avoid a double list_add.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
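
The bounded retry described above can be modeled with a simulated clock. This is only a sketch under stated assumptions: the server is modeled as answering "try later" for a fixed number of attempts, the retry delay is constant, and all names are hypothetical rather than taken from the NFS client.

```c
#include <assert.h>

enum io_path {
    IO_VIA_DS,    /* layout obtained, I/O goes to the data server */
    IO_VIA_MDS    /* gave up on the layout, I/O goes through the MDS */
};

/*
 * busy_attempts:   how many times the simulated server says "try later"
 * delay_tenths:    simulated wait between retries, in tenths of a second
 * timeout_tenths:  upper bound on total time spent retrying
 */
static enum io_path layoutget_with_deadline(unsigned int busy_attempts,
                                            unsigned int delay_tenths,
                                            unsigned int timeout_tenths)
{
    unsigned int elapsed = 0;

    assert(delay_tenths > 0);
    for (unsigned int attempt = 0; ; attempt++) {
        if (attempt >= busy_attempts)
            return IO_VIA_DS;         /* server finally granted the layout */
        if (elapsed >= timeout_tenths)
            return IO_VIA_MDS;        /* deadline reached: stop retrying */
        elapsed += delay_tenths;      /* wait, then try again */
    }
}
```

Without the deadline check the loop would spin for as long as the server keeps answering "try later".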
     
  • The client should have 60 second default timeouts for DS operations, not 6
    seconds.

    NFS4_DEF_DS_TIMEO is used as "timeout in tenths of a second" in
    nfs_init_timeout_values (and is not used anywhere else).
    This matches up with the description of the module param dataserver_timeo.

    Signed-off-by: Weston Andros Adamson
    Signed-off-by: Trond Myklebust

    Weston Andros Adamson
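
The unit mix-up is easy to see with the arithmetic written out. The sketch below assumes the value is scaled by the tick rate and divided by 10, as "tenths of a second" implies; the macro names and the values 60 and 600 are derived from the 6 second and 60 second figures in the message, not copied from the source tree.

```c
#include <assert.h>

#define DS_TIMEO_TENTHS_OLD 60    /* 60 tenths = 6 seconds */
#define DS_TIMEO_TENTHS_NEW 600   /* 600 tenths = 60 seconds */

/* Convert a timeout given in tenths of a second to whole seconds. */
static unsigned int tenths_to_seconds(unsigned int tenths)
{
    return tenths / 10;
}

/* Convert tenths of a second to timer ticks for a tick rate of `hz` per second. */
static unsigned long tenths_to_ticks(unsigned int tenths, unsigned int hz)
{
    assert(hz > 0);
    return (unsigned long)tenths * hz / 10;
}
```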
     
  • If we don't release the open seqid before we wait for state recovery,
    then we may end up deadlocking the state recovery thread.
    This patch addresses a new deadlock that was introduced by
    commit c21443c2c792cd9b463646d982b0fe48aa6feb0f (NFSv4: Fix a reboot
    recovery race when opening a file)

    Reported-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Pull writeback fixes from Wu Fengguang:
    "Two writeback fixes

    - fix negative (setpoint - dirty) in 32bit archs

    - use down_read_trylock() in writeback_inodes_sb(_nr)_if_idle()"

    * tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    Negative (setpoint-dirty) in bdi_position_ratio()
    vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them

    Linus Torvalds
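
One plausible reading of the first fix, stated here as an assumption rather than taken from the patch: on a 32-bit architecture the two values are unsigned 32-bit quantities, so a "negative" difference wraps to a huge positive number before it is widened to 64 bits. The sketch emulates that with uint32_t; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract in 32-bit unsigned arithmetic first, then widen: wraps when dirty > setpoint. */
static int64_t diff_wrapping(uint32_t setpoint, uint32_t dirty)
{
    return (int64_t)(uint32_t)(setpoint - dirty);
}

/* Widen to signed 64-bit first, then subtract: the sign survives. */
static int64_t diff_signed(uint32_t setpoint, uint32_t dirty)
{
    return (int64_t)setpoint - (int64_t)dirty;
}
```

For setpoint = 100 and dirty = 150 the first form yields 4294967246 while the second yields -50.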
     
  • Pull block IO core bits from Jens Axboe:
    "Below are the core block IO bits for 3.9. It was delayed a few days
    since my workstation kept crashing every 2-8h after pulling it into
    current -git, but turns out it is a bug in the new pstate code (divide
    by zero, will report separately). In any case, it contains:

    - The big cfq/blkcg update from Tejun and Vivek.

    - Additional block and writeback tracepoints from Tejun.

    - Improvement of the should sort (based on queues) logic in the plug
    flushing.

    - _io() variants of the wait_for_completion() interface, using
    io_schedule() instead of schedule() to contribute to io wait
    properly.

    - Various little fixes.

    You'll get two trivial merge conflicts, which should be easy enough to
    fix up"

    Fix up the trivial conflicts due to hlist traversal cleanups (commit
    b67bfe0d42ca: "hlist: drop the node parameter from iterators").

    * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
    block: remove redundant check to bd_openers()
    block: use i_size_write() in bd_set_size()
    cfq: fix lock imbalance with failed allocations
    drivers/block/swim3.c: fix null pointer dereference
    block: don't select PERCPU_RWSEM
    block: account iowait time when waiting for completion of IO request
    sched: add wait_for_completion_io[_timeout]
    writeback: add more tracepoints
    block: add block_{touch|dirty}_buffer tracepoint
    buffer: make touch_buffer() an exported function
    block: add @req to bio_{front|back}_merge tracepoints
    block: add missing block_bio_complete() tracepoint
    block: Remove should_sort judgement when flush blk_plug
    block,elevator: use new hashtable implementation
    cfq-iosched: add hierarchical cfq_group statistics
    cfq-iosched: collect stats from dead cfqgs
    cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
    blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
    block: RCU free request_queue
    blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
    ...

    Linus Torvalds
     
  • The nodesize is capped at 64k and there are enough pages preallocated in
    extent_buffer::inline_pages. The fallback to kmalloc never happened
    because even on the smallest page size considered (4k) inline_pages
    covered the needs.

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik

    David Sterba
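
The argument is a matter of arithmetic: the largest node divided by the smallest page size gives the most pages a node can ever need. A minimal sketch using the 64k and 4k figures from the message; the macro and function names are hypothetical.

```c
#include <assert.h>

#define MAX_NODESIZE   (64 * 1024)   /* cap mentioned in the message */
#define MIN_PAGE_SIZE  (4 * 1024)    /* smallest page size considered */
#define INLINE_PAGES   (MAX_NODESIZE / MIN_PAGE_SIZE)

/* Pages needed to back one node of `nodesize` bytes. */
static unsigned int pages_for_node(unsigned int nodesize, unsigned int page_size)
{
    assert(page_size > 0 && nodesize <= MAX_NODESIZE);
    return (nodesize + page_size - 1) / page_size;
}
```

The worst case is 64k / 4k = 16 pages, so a preallocated array of that size always suffices and a heap fallback is never reached.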
     
  • When deleting a snapshot/subvolume, we need to remove root ref/backref,
    dir entries and update the dir inode, so we must reserve free space
    for those operations.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie