23 Mar, 2010

1 commit


20 Mar, 2010

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (205 commits)
    ceph: update for write_inode API change
    ceph: reset osd after relevant messages timed out
    ceph: fix flush_dirty_caps race with caps migration
    ceph: include migrating caps in issued set
    ceph: fix osdmap decoding when pools include (removed) snaps
    ceph: return EBADF if waiting for caps on closed file
    ceph: set osd request message front length correctly
    ceph: reset front len on return to msgpool; BUG on mismatched front iov
    ceph: fix snaptrace decoding on cap migration between mds
    ceph: use single osd op reply msg
    ceph: reset bits on connection close
    ceph: remove bogus mds forward warning
    ceph: remove fragile __map_osds optimization
    ceph: fix connection fault STANDBY check
    ceph: invalidate_authorizer without con->mutex held
    ceph: don't clobber write return value when using O_SYNC
    ceph: fix client_request_forward decoding
    ceph: drop messages on unregistered mds sessions; cleanup
    ceph: fix comments, locking in destroy_inode
    ceph: move dereference after NULL test
    ...

    Fix trivial conflicts in Documentation/ioctl/ioctl-number.txt

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
    cifs: trivial white space
    [CIFS] checkpatch cleanup
    cifs: add cifs_revalidate_file
    cifs: add a CIFSSMBUnixQFileInfo function
    cifs: add a CIFSSMBQFileInfo function
    cifs: overhaul cifs_revalidate and rename to cifs_revalidate_dentry

    Linus Torvalds
     

19 Mar, 2010

7 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (30 commits)
    Btrfs: fix the inode ref searches done by btrfs_search_path_in_tree
    Btrfs: allow treeid==0 in the inode lookup ioctl
    Btrfs: return keys for large items to the search ioctl
    Btrfs: fix key checks and advance in the search ioctl
    Btrfs: buffer results in the space_info ioctl
    Btrfs: use __u64 types in ioctl.h
    Btrfs: fix search_ioctl key advance
    Btrfs: fix gfp flags masking in the compression code
    Btrfs: don't look at bio flags after submit_bio
    btrfs: using btrfs_stack_device_id() get devid
    btrfs: use memparse
    Btrfs: add a "df" ioctl for btrfs
    Btrfs: cache the extent state everywhere we possibly can V2
    Btrfs: cache ordered extent when completing io
    Btrfs: cache extent state in find_delalloc_range
    Btrfs: change the ordered tree to use a spinlock instead of a mutex
    Btrfs: finish read pages in the order they are submitted
    btrfs: fix btrfs_mkdir goto for no free objectids
    Btrfs: flush data on snapshot creation
    Btrfs: make df be a little bit more understandable
    ...

    Linus Torvalds
     
  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFS: ensure bdi_unregister is called on mount failure.
    NFS: Avoid a deadlock in nfs_release_page
    NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()
    nfs4: Make the v4 callback service hidden
    nfs: fix unlikely memory leak
    rpc client can not deal with ENOSOCK, so translate it into ENOCONN

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs:
    xfs: don't warn about page discards on shutdown
    xfs: use scalable vmap API
    xfs: remove old vmap cache

    Linus Torvalds
     
  • This is used by the inode lookup ioctl to follow all the backrefs up
    to the subvol root. But the search being done would sometimes land one
    past the last item in the leaf instead of finding the backref.

    This changes the search to look for the highest possible backref and hop
    back one item. It also fixes a leaked path on failure to find the root.

    Signed-off-by: Chris Mason

    Chris Mason
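
The "search for the highest possible backref and hop back one item" idea in the entry above can be sketched in userspace C. This is only an illustration, not the btrfs code: a sorted array stands in for a leaf, and `struct key`, `key_cmp`, `lower_bound` and `find_last_backref` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tree key: ordered by objectid, type, offset. */
struct key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

static int key_cmp(const struct key *a, const struct key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Index of the first element that is >= target; n if there is none. */
static size_t lower_bound(const struct key *keys, size_t n,
                          const struct key *target)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (key_cmp(&keys[mid], target) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Search for (objectid, type, UINT64_MAX), the highest possible key of that
 * kind, then hop back one slot. Returns the slot index, or -1 if no item
 * with that objectid and type exists.
 */
static long find_last_backref(const struct key *keys, size_t n,
                              uint64_t objectid, uint8_t type)
{
    struct key target = { objectid, type, UINT64_MAX };
    size_t slot = lower_bound(keys, n, &target);

    /* An exact hit on the maximal key is already the last backref. */
    if (slot < n && key_cmp(&keys[slot], &target) == 0)
        return (long)slot;
    if (slot == 0)
        return -1;
    slot--;
    if (keys[slot].objectid != objectid || keys[slot].type != type)
        return -1;
    return (long)slot;
}
```

A search for the maximal key usually lands one past the item of interest, which is why the explicit step back (with a re-check of objectid and type) is needed.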
     
  • When a root id of 0 is sent to the inode lookup ioctl, it will
    use the root of the file we're ioctling and pass the root id
    back to userland along with the results.

    This allows userland to do searches based on that root later on.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The search ioctl was skipping large items entirely (ones that are too
    big for the results buffer). This changes things to at least copy
    the item header so that we can send information about the item back to
    userland.

    Signed-off-by: Chris Mason

    Chris Mason
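
A rough userspace sketch of the "copy at least the item header" behaviour described above. The `result_header` layout and `copy_item` are hypothetical and much simpler than the real search ioctl structures; the point is only that an oversized item still produces a header with a zero data length instead of being skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result header; the real ioctl header has more fields. */
struct result_header {
    uint64_t objectid;
    uint32_t type;
    uint32_t len;   /* data bytes that follow; 0 if the data was omitted */
};

/*
 * Append one item to buf. If the item data could never fit in the buffer,
 * copy only the header (len = 0) so the caller still learns the item exists.
 * Returns the new write offset, or 0 if this buffer is full and the caller
 * should retry with a fresh one.
 */
static size_t copy_item(char *buf, size_t buf_size, size_t off,
                        uint64_t objectid, uint32_t type,
                        const void *data, uint32_t data_len)
{
    struct result_header hdr = { objectid, type, data_len };

    if (sizeof(hdr) + data_len > buf_size)
        hdr.len = 0;            /* too big for any buffer: header only */

    if (off + sizeof(hdr) + hdr.len > buf_size)
        return 0;               /* no room left in this buffer */

    memcpy(buf + off, &hdr, sizeof(hdr));
    off += sizeof(hdr);
    if (hdr.len) {
        memcpy(buf + off, data, hdr.len);
        off += hdr.len;
    }
    return off;
}
```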
     
  • The search ioctl was working well for finding tree roots, but using it for
    generic searches requires a few changes to how the keys are advanced.
    This treats the search control min fields for objectid, type and offset
    more like a key, where we drop the offset to zero once we bump the type,
    etc.

    The downside of this is that we are changing the min_type and min_offset
    fields during the search, and so the ioctl caller needs extra checks to make sure
    the keys in the result are the ones it wanted.

    This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
    things more readable.

    Signed-off-by: Chris Mason

    Chris Mason
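
The key-style advance described above (bump the offset, then the type with the offset dropped to zero, then the objectid) might look roughly like this. `struct search_min` and `advance_key` are hypothetical names for illustration; the type is modelled as 8 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the "min" fields of a search request. */
struct search_min {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/*
 * Move to the smallest key strictly greater than *k.
 * Returns 1 on success, 0 if *k was already the largest possible key.
 */
static int advance_key(struct search_min *k)
{
    if (k->offset < UINT64_MAX) {
        k->offset++;
    } else if (k->type < UINT8_MAX) {
        k->type++;
        k->offset = 0;      /* offset restarts once the type is bumped */
    } else if (k->objectid < UINT64_MAX) {
        k->objectid++;
        k->type = 0;
        k->offset = 0;
    } else {
        return 0;
    }
    return 1;
}
```

Because the lower fields are reset during the walk, a caller that wanted only one type or offset range has to filter the returned keys itself, which is the downside the commit message mentions.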
     

18 Mar, 2010

2 commits

  • Use bitmap_weight() instead of doing hweight32() for each u32 element in
    the page.

    Signed-off-by: Akinobu Mita
    Cc: Anton Altaparmakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
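
To illustrate the difference between the two styles in the entry above, here is a small userspace sketch. `weight32`, `count_bits_open_coded` and `region_weight` are hypothetical stand-ins; they are not the kernel's hweight32() or bitmap_weight() implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable population count for one 32-bit word. */
static unsigned int weight32(uint32_t w)
{
    unsigned int n = 0;

    while (w) {
        w &= w - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* The open-coded pattern: the caller loops over every word itself. */
static unsigned int count_bits_open_coded(const uint32_t *words, size_t nwords)
{
    unsigned int total = 0;

    for (size_t i = 0; i < nwords; i++)
        total += weight32(words[i]);
    return total;
}

/* Helper-style interface: one call covering the first nbits bits. */
static unsigned int region_weight(const uint32_t *words, size_t nbits)
{
    size_t full = nbits / 32, rest = nbits % 32;
    unsigned int total = count_bits_open_coded(words, full);

    /* Count only the low 'rest' bits of a trailing partial word. */
    if (rest)
        total += weight32(words[full] & ((UINT32_C(1) << rest) - 1));
    return total;
}
```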
     
  • jffs2 uses rb_node = NULL; to zero rb_root.

    The problem with this is that 17d9ddc72fb8bba0d4f678 ("rbtree: Add
    support for augmented rbtrees") in the linux-next tree adds a new field
    to that struct which needs to be NULL as well. This patch uses RB_ROOT
    as the initializer so all of the relevant fields will be NULL'd.

    Signed-off-by: Venkatesh Pallipadi
    Cc: Eric Paris
    Acked-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Venkatesh Pallipadi
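
Why an initializer macro is more robust than zeroing one field by hand can be shown with a small sketch. The `struct root` layout, `ROOT_INIT` and the helper names are hypothetical; they only mimic the situation the commit describes, where a struct gains a second field.

```c
#include <assert.h>
#include <stddef.h>

struct node;    /* opaque tree node for this sketch */

/* Hypothetical root type that has grown a second field. */
struct root {
    struct node *top;
    void (*augment_cb)(struct node *n);    /* the newly added field */
};

/* One macro that initializes every field, in the spirit of RB_ROOT. */
#define ROOT_INIT ((struct root){ NULL, NULL })

/* Dummy callback, used only to show a stale pointer surviving. */
static void noop(struct node *n)
{
    (void)n;
}

/* Fragile: resets only the field the caller knew about. */
static void reset_by_hand(struct root *r)
{
    r->top = NULL;
}

/* Robust: stays correct when fields are added to struct root. */
static void reset_with_macro(struct root *r)
{
    *r = ROOT_INIT;
}
```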
     

17 Mar, 2010

6 commits

  • If we are doing a forced shutdown, we can get lots of noise about
    delalloc pages being discarded. This happens by design during a
    forced shutdown, so don't spam the logs with these messages.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • Re-apply a commit that had been reverted due to regressions
    that have since been fixed.

    From 95f8e302c04c0b0c6de35ab399a5551605eeb006 Mon Sep 17 00:00:00 2001
    From: Nick Piggin
    Date: Tue, 6 Jan 2009 14:43:09 +1100

    Implement XFS's large buffer support with the new vmap APIs. See the vmap
    rewrite (db64fe02) for some numbers. The biggest improvement that comes from
    using the new APIs is avoiding the global KVA allocation lock on every call.

    Signed-off-by: Nick Piggin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Only modifications here were a minor reformat, plus making the patch
    apply given the new use of xfs_buf_is_vmapped().

    Modified-by: Alex Elder
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Alex Elder
     
  • Re-apply a commit that had been reverted due to regressions
    that have since been fixed.

    Original commit: d2859751cd0bf586941ffa7308635a293f943c17
    Author: Nick Piggin
    Date: Tue, 6 Jan 2009 14:40:44 +1100

    XFS's vmap batching simply defers a number (up to 64) of vunmaps,
    and keeps track of them in a list. To purge the batch, it just goes
    through the list and calls vunmap on each one. This is pretty poor:
    a global TLB flush is generally still performed on each vunmap, with
    the most expensive parts of the operation being the broadcast IPIs
    and locking involved in the SMP callouts, and the locking involved
    in the vmap management -- none of these are avoided by just batching
    up the calls. I'm actually surprised it ever made much difference.
    (Now that the lazy vmap allocator is upstream, this description is
    not quite right, but the vunmap batching still doesn't seem to do
    much).

    Rip all this logic out of XFS completely. I will improve vmap
    performance and scalability directly in a subsequent patch.

    Signed-off-by: Nick Piggin
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    The only change I made was to use the "new" xfs_buf_is_vmapped()
    function in a place it had been open-coded in the original.

    Modified-by: Alex Elder
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Alex Elder
     
  • The space_info ioctl was using copy_to_user inside rcu_read_lock. This
    commit changes things to copy into a buffer first and then dump the
    result down to userland.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
     
  • key->type is u8, not u64.

    fs/btrfs/ioctl.c: In function 'copy_to_sk':
    fs/btrfs/ioctl.c:1024: warning: comparison is always true due to limited range of data type

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
     

16 Mar, 2010

1 commit

  • bdi_unregister is called by nfs_put_super which is only called by
    generic_shutdown_super if ->s_root is not NULL. So if we error out
    in a circumstance where we called nfs_bdi_register (i.e. server !=
    NULL) but have not set s_root, then we need to call bdi_unregister
    explicitly in nfs_get_sb and various other *_get_sb() functions.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     

15 Mar, 2010

21 commits

  • I fixed the indent level.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Steve French

    Dan Carpenter
     
  • GFP_FS must be masked out, NOFS can't be or'd in.

    Signed-off-by: Chris Mason

    Nick Piggin
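
The one-line message above is about bit arithmetic: if the "no filesystem" mask is defined as the normal mask without the FS bit, OR-ing it in cannot clear anything. A minimal sketch with made-up flag values (the `FLAG_*` and `ALLOC_*` names and numbers are assumptions, not the kernel's definitions):

```c
#include <assert.h>

/* Hypothetical flag values for illustration only. */
#define FLAG_WAIT 0x1u
#define FLAG_IO   0x2u
#define FLAG_FS   0x4u

#define ALLOC_KERNEL (FLAG_WAIT | FLAG_IO | FLAG_FS)
#define ALLOC_NOFS   (FLAG_WAIT | FLAG_IO)    /* ALLOC_KERNEL minus FLAG_FS */

/* Wrong: OR can only add bits, so FLAG_FS survives. */
static unsigned int restrict_wrong(unsigned int mask)
{
    return mask | ALLOC_NOFS;
}

/* Right: clear the FS bit explicitly. */
static unsigned int restrict_right(unsigned int mask)
{
    return mask & ~FLAG_FS;
}
```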
     
  • After calling submit_bio, the bio can be freed at any time. The
    btrfs submission thread helper was checking the bio flags too late,
    which might not give the correct answer.

    When CONFIG_DEBUG_PAGE_ALLOC is turned on, it can lead to oopsen.

    Signed-off-by: Chris Mason

    Chris Mason
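
The general rule behind the fix above is to sample anything you need from an object before handing ownership away. A userspace sketch, where `struct request`, `REQ_SYNC` and `submit` are hypothetical stand-ins for the bio, its flags and submit_bio:

```c
#include <assert.h>
#include <stdlib.h>

#define REQ_SYNC 0x1u    /* hypothetical flag bit */

/* Hypothetical request object; ownership passes to submit(). */
struct request {
    unsigned int flags;
};

static int submitted;    /* counts submissions, for demonstration */

/* Stand-in for the submit call: the request may be freed before it returns. */
static void submit(struct request *req)
{
    submitted++;
    free(req);
}

/* Submits req and returns 1 if it was synchronous, 0 otherwise. */
static int submit_and_report_sync(struct request *req)
{
    /* Sample the flags while we still own the request... */
    int sync = (req->flags & REQ_SYNC) != 0;

    submit(req);
    /* ...and never touch req after this point. */
    return sync;
}
```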
     
  • We can use btrfs_stack_device_id() to get dev_item->devid

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Chris Mason

    Xiao Guangrong
     
  • Use memparse() instead of its own private implementation.

    Signed-off-by: Akinobu Mita
    Cc: Chris Mason
    Cc: linux-btrfs@vger.kernel.org
    Signed-off-by: Chris Mason

    Akinobu Mita
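
For readers unfamiliar with this kind of helper, the sketch below shows the general shape of a size parser with unit suffixes. `parse_size` is a hypothetical userspace function built on strtoull(); it handles only K, M and G and is not the kernel's memparse().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a number with an optional K/M/G suffix (powers of 1024).
 * *end is set to the first character that was not consumed.
 */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 0);

    switch (**end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        (*end)++;
        break;
    default:
        break;
    }
    return v;
}
```

Sharing one such helper avoids each subsystem carrying a slightly different private copy of the same suffix logic.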
     
  • df is a very loaded question in btrfs. This gives us a way to get the per-space
    usage information so we can tell exactly what is in use where. This will help
    us figure out ENOSPC problems, and help users better understand where their disk
    space is going.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This patch just goes through and fixes everybody that does

    lock_extent()
    blah
    unlock_extent()

    to use

    lock_extent_bits()
    blah
    unlock_extent_cached()

    and pass around an extent_state so we only have to do the searches once per
    function. This gives me about a 3 mb/s boost on my random write test. I have
    not converted some things, like the relocation and ioctl's, since they aren't
    heavily used and the relocation stuff is in the middle of being re-written. I
    also changed the clear_extent_bit() to only unset the cached state if we are
    clearing EXTENT_LOCKED and related stuff, so we can do things like this

    lock_extent_bits()
    clear delalloc bits
    unlock_extent_cached()

    without losing our cached state. I tested this thoroughly and turned on
    LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out
    fine.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
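
The benefit of passing a cached state around, as described above, is that the second operation on the same range can skip the search. A simplified sketch of that pattern; `extent_rec`, `lookup` and `lookup_cached` are hypothetical, and a linear scan stands in for the real tree search:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent_rec {
    uint64_t start;
    uint64_t end;    /* inclusive */
};

static unsigned int searches;    /* counts full lookups, for demonstration */

/* Full lookup: find the record covering pos, or NULL. */
static struct extent_rec *lookup(struct extent_rec *recs, size_t n,
                                 uint64_t pos)
{
    searches++;
    for (size_t i = 0; i < n; i++)
        if (recs[i].start <= pos && pos <= recs[i].end)
            return &recs[i];
    return NULL;
}

/* Reuse *cached if it covers pos; otherwise search once and remember the hit. */
static struct extent_rec *lookup_cached(struct extent_rec *recs, size_t n,
                                        uint64_t pos,
                                        struct extent_rec **cached)
{
    struct extent_rec *r = *cached;

    if (r && r->start <= pos && pos <= r->end)
        return r;
    r = lookup(recs, n, pos);
    *cached = r;
    return r;
}
```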
     
  • When finishing io we run btrfs_dec_test_ordered_pending, and then immediately
    run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that
    already, so we're searching twice when we don't have to. This patch lets us
    pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending so if we do
    complete io on that ordered extent we can just use the one we found then instead
    of having to do another btrfs_lookup_ordered_extent. This made my fio job with
    the other patch go from 24 mb/s to 29 mb/s.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This patch makes us cache the extent state we find in find_delalloc_range since
    we'll have to lock the extent later on in the function. This will keep us from
    re-searching for the range when we try to lock the extent.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The ordered tree used to need a mutex, but currently all we use it for is to
    protect the rb_tree, and a spin_lock is just fine for that. Using a spin_lock
    instead makes dbench run a little faster, 58 mb/s instead of 51 mb/s, and have
    less latency, 3445.138 ms instead of 3820.633 ms.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The endio is done in reverse order of the bio vectors.

    That means for a sequential read, the page first submitted will finish
    last in a bio. Considering we will do checksum (making cache hot) for
    every page, this does introduce delay (and chance to squeeze cache used
    soon) for pages submitted at the beginning.

    I don't observe an obvious performance difference with the patch below in
    my simple test, but it seems more natural to finish reads in the order
    they are submitted.

    Signed-off-by: Shaohua Li
    Signed-off-by: Chris Mason

    Chris Mason
     
  • btrfs_mkdir() must jump to the code that ends the transaction when
    btrfs_find_free_objectid() fails. Otherwise the transaction never ends.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
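
The fix above is an instance of the usual goto-cleanup pattern: once a transaction has been started, every exit path must pass through the code that ends it. A self-contained sketch with hypothetical helpers (`start_transaction`, `end_transaction`, `find_free_id` and `make_dir` are illustrative, and the error code is an assumption):

```c
#include <assert.h>
#include <errno.h>

static int open_transactions;    /* how many transactions are still open */

static void start_transaction(void) { open_transactions++; }
static void end_transaction(void)   { open_transactions--; }

/* Hypothetical id allocator that fails on request. */
static int find_free_id(int fail, unsigned long *id)
{
    if (fail)
        return -ENOSPC;
    *id = 257;
    return 0;
}

/* Every exit after start_transaction() goes through out_end. */
static int make_dir(int fail_alloc)
{
    unsigned long id = 0;
    int err;

    start_transaction();

    err = find_free_id(fail_alloc, &id);
    if (err)
        goto out_end;    /* not a bare return: the transaction must end */

    /* ... create the directory using the new id ... */
    (void)id;

out_end:
    end_transaction();
    return err;
}
```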
     
  • Flush any delalloc extents when we create a snapshot, so that recently
    written file data is always included in the snapshot.

    A later commit will add the ability to snapshot without the flush, but
    most people expect flushing.

    Signed-off-by: Sage Weil
    Signed-off-by: Chris Mason

    Sage Weil
     
  • The way we report df usage is way confusing for everybody, including some other
    utilities (bacula for one). So this patch makes df a little bit more
    understandable. First we make used actually count the total amount of used
    space in all space info's. This will give us a real view of how much disk space
    is in use. Second, for blocks available, only count data space. This makes
    things like bacula work because it says 0 when you can no longer write any more
    data to the disk. I think this is a nice compromise, since you will end up with
    something like the following

    [root@alpha ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/mapper/VolGroup-lv_root
                          148G   30G  111G  21% /
    /dev/sda1             194M  116M   68M  64% /boot
    tmpfs                 985M   12K  985M   1% /dev/shm
    /dev/mapper/VolGroup-LogVol02
                          145G  140G     0 100% /mnt/btrfs-test

    Compare this with btrfsctl -i output

    [root@alpha btrfs-progs-unstable]# ./btrfsctl -i /mnt/btrfs-test/
    Metadata, DUP: total=4.62GB, used=2.46GB
    System, DUP: total=8.00MB, used=24.00KB
    Data: total=134.80GB, used=134.80GB
    Metadata: total=8.00MB, used=0.00
    System: total=4.00MB, used=0.00
    operation complete

    This way we show that there is no more data space to be used, but we have
    another 5GB of space left for metadata. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
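
The two accounting rules in the entry above ("used" sums every space info, "available" counts data space only) can be written down as a small sketch. `struct space_info` here is a hypothetical record with sizes in MiB; the sketch ignores unallocated device space and any DUP or RAID factors, so it is not how the kernel computes statfs results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum space_kind { SPACE_DATA, SPACE_METADATA, SPACE_SYSTEM };

/* Hypothetical per-space record, sizes in MiB. */
struct space_info {
    enum space_kind kind;
    uint64_t total;
    uint64_t used;
};

/* "Used" counts every space info. */
static uint64_t df_used(const struct space_info *s, size_t n)
{
    uint64_t used = 0;

    for (size_t i = 0; i < n; i++)
        used += s[i].used;
    return used;
}

/* "Available" counts free room in data space only. */
static uint64_t df_avail(const struct space_info *s, size_t n)
{
    uint64_t avail = 0;

    for (size_t i = 0; i < n; i++)
        if (s[i].kind == SPACE_DATA)
            avail += s[i].total - s[i].used;
    return avail;
}
```

With full data space and half-empty metadata space, `df_avail` reports 0, which matches the behaviour backup tools expect when no more file data can be written.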
     
  • When we scan devices in a multi-device filesystem, we memorize the original
    name. If the device gets a new name, later scans don't update the
    in-kernel structures related to it, and we're not able to mount the
    filesystem.

    This patch updates the device name during scanning.

    Signed-off-by: TARUISI Hiroaki
    Signed-off-by: Chris Mason

    TARUISI Hiroaki
     
  • The btrfs defrag ioctl was limited to doing the entire file. This
    commit adds a new interface that can defrag a specific range inside
    the file.

    It can also force compression on the file, allowing you to selectively
    compress individual files after they were created, even when mount -o
    compress isn't turned on.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The btrfs defrag ioctl had some bugs around delalloc accounting, and it
    wasn't properly skipping pages that were not in the mapping.

    It wasn't properly clearing the page checked flag, which could make the
    writeback code ignore the page forever while pinning it as dirty.

    This commit fixes those problems and makes defrag a little smarter. It
    skips holes and it doesn't waste time defragging large extents. If a
    tiny extent comes before a very large extent, it will defrag both of
    them to make sure the tiny extent ends up next to something big.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The submit_bio helper thread can decide to loop back around to
    service more bios. This commit forces it to unplug first, which helps
    reduce the latency seen by submitters.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Since there's not a good way to make sure the user sees the original default root
    tree id, and not to mention it's 5 so is way different than any other volume,
    just make subvol=0 mount the original default root. This makes it a bit easier
    for users to handle in the long run. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This patch needs to go along with my previous patch. This lets us set the
    default dir item's location to whatever root we want to use as our default
    mounting subvol. With this we don't have to use mount -o subvol=<tree id>
    anymore to mount a different subvol, we can just set the new one and it will
    just magically work. I've done some moderate testing with this, mostly just
    switching the default mount around, mounting subvols and the default mount at
    the same time and such, everything seems to work. Thanks,

    Older kernels would generally be able to still mount the filesystem with the
    default subvolume set, but it would result in a different volume being mounted,
    which could be an even more unpleasant surprise for users. So if you set your
    default subvolume, you can't go back to older kernels. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • This work is in preparation for being able to set a different root as the
    default mounting root.

    There is currently a problem with how we mount subvolumes. We cannot currently
    mount a subvolume of a subvolume, you can only mount subvolumes/snapshots of the
    default subvolume. So say you take a snapshot of the default subvolume and call
    it snap1, and then take a snapshot of snap1 and call it snap2, so now you have

    /
    /snap1
    /snap1/snap2

    as your available volumes. Currently you can only mount / and /snap1,
    you cannot mount /snap1/snap2. To fix this problem instead of passing
    subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is
    the tree id that gets spit out via the subvolume listing you get from
    the subvolume listing patches (btrfs filesystem list). This allows us
    to mount /, /snap1 and /snap1/snap2 as the root volume.

    In addition to the above, we also now read the default dir item in the
    tree root to get the root key that it points to. For now this just
    points at what has always been the default subvolume, but later on I plan
    to change it to point at whatever root you want to be the new default
    root, so you can just set the default mount and not have to mount with
    -o subvolid=<treeid>. I tested this out with the above scenario and it
    worked perfectly. Thanks,

    mount -o subvol operates inside the selected subvolid. For example:

    mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt

    /mnt will have the snap1 directory for the subvolume with id
    256.

    mount -o subvol=snap /dev/xxx /mnt

    /mnt will be the snap directory of whatever the default subvolume
    is.

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
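
How the two options in the examples above combine (subvolid picks the tree, subvol names a directory inside it) can be illustrated with a toy option parser. `struct mount_opts` and `parse_opts` are hypothetical and handle only these two options; this is not the btrfs mount code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct mount_opts {
    char subvol[64];      /* directory name, looked up inside the chosen tree */
    uint64_t subvolid;    /* tree id; 0 if the option was not given */
};

/* Parse a comma-separated option string. Returns 0 on success, -1 on error. */
static int parse_opts(const char *s, struct mount_opts *o)
{
    memset(o, 0, sizeof(*o));

    while (*s) {
        size_t len = strcspn(s, ",");    /* length of the current option */

        if (len > 7 && strncmp(s, "subvol=", 7) == 0) {
            size_t n = len - 7;

            if (n >= sizeof(o->subvol))
                return -1;
            memcpy(o->subvol, s + 7, n);
            o->subvol[n] = '\0';
        } else if (len > 9 && strncmp(s, "subvolid=", 9) == 0) {
            char *end;

            o->subvolid = strtoull(s + 9, &end, 10);
            if (end != s + len)
                return -1;               /* trailing junk in the number */
        } else {
            return -1;                   /* unknown or empty option */
        }

        s += len;
        if (*s == ',')
            s++;
    }
    return 0;
}
```

Parsing "subvol=snap1,subvolid=256" yields both fields, while "subvol=snap" leaves the tree id unset so the default subvolume would be used as the starting point.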