05 Jun, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
    btrfs: fix uninitialized variable warning
    btrfs: add helper for fs_info->closing
    Btrfs: add mount -o inode_cache
    btrfs: scrub: add explicit plugging
    btrfs: use btrfs_ino to access inode number
    Btrfs: don't save the inode cache if we are deleting this root
    btrfs: false BUG_ON when degraded
    Btrfs: don't save the inode cache in non-FS roots
    Btrfs: make sure we don't overflow the free space cache crc page
    Btrfs: fix uninit variable in the delayed inode code
    btrfs: scrub: don't reuse bios and pages
    Btrfs: leave spinning on lookup and map the leaf
    Btrfs: check for duplicate entries in the free space cache
    Btrfs: don't try to allocate from a block group that doesn't have enough space
    Btrfs: don't always do readahead
    Btrfs: try not to sleep as much when doing slow caching
    Btrfs: kill BTRFS_I(inode)->block_group
    Btrfs: don't look at the extent buffer level 3 times in a row
    Btrfs: map the node block when looking for readahead targets
    Btrfs: set range_start to the right start in count_range_bits
    ...

    Linus Torvalds
     

04 Jun, 2011

13 commits

  • With Linus' tree, today's linux-next build (powerpc ppc64_defconfig)
    produced this warning:

    fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode':
    fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used
    uninitialized in this function

    Introduced by commit 16cdcec736cd ("btrfs: implement delayed inode items
    operation").

    This fixes a bug in btrfs_update_inode(): if the value returned from
    btrfs_delayed_update_inode() is nonzero garbage, the inode stat data is
    not updated and several call paths may hit a BUG_ON or fail with a
    strange error code.

    Reported-by: Stephen Rothwell
    Signed-off-by: David Sterba

    David Sterba
     
  • Wrap checking of the filesystem 'closing' flag in a helper and fix a few
    missing memory barriers.

    Signed-off-by: David Sterba

    David Sterba
     
  • This makes the inode map cache default to off until we
    fix the overflow problem when the free space crcs don't fit
    inside a single page.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • With the removal of implicit plugging, scrub ends up doing more and
    smaller I/O than necessary. This patch adds explicit plugging per chunk.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • Commit 4cb5300bc ("Btrfs: add mount -o auto_defrag") accesses the inode
    number directly, while it should use the btrfs_ino() helper required by
    the new inode number allocator.

    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason

    David Sterba
     
  • With xfstest 254 and the inode number caching enabled, I can panic the box
    every time. When we delete a subvolume we clean its inodes out, but then we
    write out the inode cache, which adds an inode to the subvolume's inode tree.
    When that inode is evicted again, the root is added back onto the dead roots
    list and deleted a second time, so we get a double free. To stop this from
    happening, just return 0 if refs is 0 (and we're not the tree root, since the
    tree root always has refs of 0). With this fix 254 no longer panics.

    Signed-off-by: Josef Bacik
    Tested-by: David Sterba
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • In degraded mode, the struct btrfs_device entries of missing devices don't
    have device->name set. A kstrdup of NULL correctly returns NULL, so don't
    BUG in this case.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • This adds extra checks to make sure the inode map we are caching really
    belongs to a FS root instead of a special relocation tree. It
    prevents crashes during balancing operations.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    liubo
     
  • The free space cache uses only one page for crcs right now,
    which means we can't have a cache file bigger than the
    crcs we can fit in the first page. This adds a check to
    enforce that restriction.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The nitems counter in the delayed inode code needs to start at zero.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • The current scrub implementation reuses bios and pages as often as possible,
    allocating them only on start and releasing them when finished. This leads
    to more problems with the block layer than it's worth. The elevator gets
    confused when there are more pages added to the bio than bi_size suggests.
    This patch completely rips out the reuse of bios and pages and allocates
    them freshly for each submit.

    Signed-off-by: Arne Jansen
    Signed-off-by: Chris Mason

    Arne Jansen
     
  • * 'for-linus' of git://git.kernel.dk/linux-block:
    block: Use hlist_entry() for io_context.cic_list.first
    cfq-iosched: Remove bogus check in queue_fail path
    xen/blkback: potential null dereference in error handling
    xen/blkback: don't call vbd_size() if bd_disk is NULL
    block: blkdev_get() should access ->bd_disk only after success
    CFQ: Fix typo and remove unnecessary semicolon
    block: remove unwanted semicolons
    Revert "block: Remove extra discard_alignment from hd_struct."
    nbd: adjust 'max_part' according to part_shift
    nbd: limit module parameters to a sane value
    nbd: pass MSG_* flags to kernel_recvmsg()
    block: improve the bio_add_page() and bio_add_pc_page() descriptions

    Linus Torvalds
     
  • * 'linux-next' of git://git.infradead.org/ubifs-2.6:
    UBIFS: fix-up free space earlier
    UBIFS: initialize LPT earlier
    UBIFS: assert no fixup when writing a node
    UBIFS: fix clean znode counter corruption in error cases
    UBIFS: fix memory leak on error path
    UBIFS: fix shrinker object count reports
    UBIFS: fix recovery broken by the previous recovery fix
    UBIFS: amend ubifs_recover_leb interface
    UBIFS: introduce a "grouped" journal head flag
    UBIFS: suppress false error messages

    Linus Torvalds
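The free space cache restriction described in Chris Mason's 04 Jun entry (all crcs for a cache file must fit in its first page) reduces to simple arithmetic. The sketch below assumes 4 KiB pages and one 32-bit crc per cache page, and it ignores any other fields the real on-disk format may keep in that first page. SKETCH_PAGE_SIZE, max_crc_pages() and cache_fits_crc_page() are hypothetical names, not btrfs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, one 32-bit crc stored per cache page. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of per-page crcs that fit into the single crc page. */
static size_t max_crc_pages(void)
{
    return SKETCH_PAGE_SIZE / sizeof(uint32_t);
}

/* True if every page of a cache file of cache_bytes bytes can have its crc
 * stored in the first page; a writer should refuse to write the cache
 * otherwise instead of overflowing the crc page. */
static bool cache_fits_crc_page(uint64_t cache_bytes)
{
    uint64_t num_pages = (cache_bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    return num_pages <= max_crc_pages();
}
```

Under these assumptions one page holds 1024 crcs, so a cache file is capped at 1024 pages (4 MiB); the real limit may be somewhat lower if the first page also carries other data.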
     

03 Jun, 2011

6 commits

  • The free space fixup is currently initiated during mount after the call to
    ubifs_write_master(), which results in a write to PEBs; this has been observed
    with the patch 'assert no fixup when writing a node' applied.

    Move the free space fixup on mount to before the calls to
    ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
    assertions with the previously mentioned patch applied.

    Artem: tweaked the patch a bit

    Signed-off-by: Ben Gardiner
    Reviewed-by: Matthew L. Creech
    Signed-off-by: Artem Bityutskiy

    Ben Gardiner
     
  • The current 'mount_ubifs()' implementation does not initialize the LPT until
    after the master node is marked dirty. Move the LPT initialization to before marking
    the master node dirty. This is a preparation for the next patch which will move
    the free-space-fixup check to before marking the master node dirty, because we
    have to fix-up the free space before doing any writes.

    Artem: massaged the patch and commit message.

    Signed-off-by: Ben Gardiner
    Reviewed-by: Matthew L. Creech
    Signed-off-by: Artem Bityutskiy

    Ben Gardiner
     
  • The current free space fixup can result in some writes to the UBI volume
    when the space_fixup flag is set.

    To catch instances where UBIFS is writing to the NAND while the space_fixup
    flag is set, add an assert to ubifs_write_node().

    Artem: tweaked the patch, added similar assertion to the write buffer
    write path.

    Signed-off-by: Ben Gardiner
    Reviewed-by: Matthew L. Creech
    Signed-off-by: Artem Bityutskiy

    Ben Gardiner
     
  • UBIFS maintains per-filesystem and global clean znode counters
    ('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain
    correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'.

    However, in case of failures during commit the counters were corrupted. E.g.,
    if a failure happens in the middle of 'write_index()', then some nodes in the
    commit list ('c->cnext') are marked as clean, and some are marked as dirty.
    Then 'ubifs_destroy_tnc_subtree()' does not return the correct count of freed
    znodes, and we end up with a non-zero 'c->clean_zn_cnt' when unmounting. This
    means that if we have two file-systems, one of them fails, and we unmount it,
    'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write
    failure because it forgets to free the 'struct ubifs_dent_node *dent' object.
    Although the object is small, the alignment can make it large - e.g., 2KiB
    if the min. I/O unit is 2KiB.

    Signed-off-by: Artem Bityutskiy
    Cc: stable@kernel.org

    Artem Bityutskiy
     
  • Sometimes the VM asks the shrinker to return the number of objects it can
    shrink, and we return ubifs_clean_zn_cnt in that case. However, this counter
    can be negative for a short period of time, due to the way the UBIFS TNC code
    updates it, and I sometimes observe the following warning:

    shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788

    This patch makes sure UBIFS never returns negative count of objects.

    Signed-off-by: Artem Bityutskiy
    Cc: stable@kernel.org

    Artem Bityutskiy
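The shrinker fix in the last 03 Jun entry amounts to clamping the reported object count at zero. Here is a minimal user-space sketch that uses a C11 atomic as a stand-in for the kernel counter; clean_zn_cnt and sketch_shrink_count() are hypothetical names, and the real shrinker callback signature is not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the global clean znode counter. Concurrent
 * updates may leave it briefly negative. */
static atomic_long clean_zn_cnt;

/* Value returned to a "how many objects can you free?" query. Clamping
 * to zero keeps a transiently negative counter from being reported to
 * the VM as a negative object count. */
static long sketch_shrink_count(void)
{
    long n = atomic_load(&clean_zn_cnt);

    return n < 0 ? 0 : n;
}
```

The counter itself is left untouched; only the value handed to the caller is clamped, so later increments still bring it back to its correct value.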
     

01 Jun, 2011

5 commits

  • Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb
    (UBIFS: fix extremely rare mount failure) broke recovery. That commit made
    UBIFS drop the last min. I/O unit in all journal heads, but this is needed only
    for the GC head, and it does not work for non-GC heads. For example, suppose
    we have min. I/O units A and B, where A contains a valid node X, which was
    fsynced, followed by a group of nodes Y which spans the rest of A and B. In
    this case we'll drop not only Y, but also X, which is obviously incorrect.

    This patch fixes the issue: recovery now drops the last min. I/O unit only
    for the GC head and leaves things as they have been for ages for the other
    heads, which is safer.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells
    whether the nodes are grouped in the LEB to recover, pass the journal head
    number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag.

    This patch is a preparation for a further fix where we'll need to know the
    journal head number for other purposes.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Journal heads differ in how UBIFS writes nodes to them. All normal journal
    heads receive grouped nodes, while the GC journal head receives ungrouped
    nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which
    describes this property.

    This patch is a preparation for a further recovery fix.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • Commit ab51afe05273741f72383529ef488aa1ea598ec6 was a good clean-up, but
    it introduced a regression - now UBIFS prints scary error messages during
    recovery on all corrupted nodes, even though the corruptions are expected
    (due to a power cut). This patch fixes the issue.

    Additionally, fix a typo in a comment introduced by the same commit.

    Signed-off-by: Artem Bityutskiy

    Artem Bityutskiy
     
  • d4dc210f69 (block: don't block events on excl write for non-optical
    devices) added dereferencing of bdev->bd_disk to test
    GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE; however, bdev->bd_disk can be
    %NULL if open failed which can lead to an oops.

    Test the flag after testing open was successful, not before.

    Signed-off-by: Tejun Heo
    Reported-by: David Miller
    Tested-by: David Miller
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Tejun Heo
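The ordering in Tejun Heo's blkdev_get() fix above (test the open result before touching bd_disk) can be illustrated with a small user-space sketch. struct disk, struct bdev, FL_BLOCK_EVENTS_ON_EXCL_WRITE, sketch_open() and open_and_check() are hypothetical stand-ins, not the kernel's block layer API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for gendisk and block_device. */
struct disk { unsigned int flags; };
struct bdev { struct disk *disk; };   /* disk is NULL if the open failed */

#define FL_BLOCK_EVENTS_ON_EXCL_WRITE 0x1u

/* Hypothetical open: fails with -1 and leaves b->disk NULL when there is
 * no disk to attach. */
static int sketch_open(struct bdev *b, struct disk *d)
{
    b->disk = d;
    return d ? 0 : -1;
}

/* Open, then decide whether to block events on exclusive write. The flag
 * is tested only after the open succeeded, so a failed open never leads
 * to a NULL dereference of b->disk. */
static int open_and_check(struct bdev *b, struct disk *d, int *block_events)
{
    int res = sketch_open(b, d);

    *block_events = 0;
    if (res == 0 && (b->disk->flags & FL_BLOCK_EVENTS_ON_EXCL_WRITE))
        *block_events = 1;
    return res;
}
```

The short-circuit `res == 0 &&` is the whole fix: the pointer is only read on the success path.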
     

30 May, 2011

15 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • The dentry_unhash push-down series missed that shrink_dcache_parent needs to
    be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
    allow efficient dentry reclaim.

    Reported-by: Dave Chinner
    Signed-off-by: Sage Weil
    Signed-off-by: Al Viro

    Sage Weil
     
  • It was not a good idea to start dereferencing disk->queue from
    the fs sysfs strategy for displaying discard alignment. We first ran
    into a NULL pointer dereference, and after fixing that we sometimes
    saw invalid disk->queue pointer values.

    Since discard is the only one of the bunch actually looking into
    the queue, just revert the change.

    This reverts commit 23ceb5b7719e9276d4fa72a3ecf94dd396755276.

    Conflicts:
    fs/partitions/check.c

    Jens Axboe
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
    eCryptfs: Remove ecryptfs_header_cache_2
    eCryptfs: Cleanup and optimize ecryptfs_lookup_interpose()
    eCryptfs: Return useful code from contains_ecryptfs_marker
    eCryptfs: Fix new inode race condition
    eCryptfs: Cleanup inode initialization code
    eCryptfs: Consolidate inode functions into inode.c

    Linus Torvalds
     
  • * 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
    pnfs-obj: pg_test check for max_io_size
    NFSv4.1: define nfs_generic_pg_test
    NFSv4.1: use pnfs_generic_pg_test directly by layout driver
    NFSv4.1: change pg_test return type to bool
    NFSv4.1: unify pnfs_pageio_init functions
    pnfs-obj: objlayout_encode_layoutcommit implementation
    pnfs: encode_layoutcommit
    pnfs-obj: report errors and .encode_layoutreturn Implementation.
    pnfs: encode_layoutreturn
    pnfs: layoutret_on_setattr
    pnfs: layoutreturn
    pnfs-obj: osd raid engine read/write implementation
    pnfs: support for non-rpc layout drivers
    pnfs-obj: define per-inode private structure
    pnfs: alloc and free layout_hdr layoutdriver methods
    pnfs-obj: objio_osd device information retrieval and caching
    pnfs-obj: decode layout, alloc/free lseg
    pnfs-obj: pnfs_osd XDR client implementation
    pnfs-obj: pnfs_osd XDR definitions
    pnfs-obj: objlayoutdriver module skeleton
    ...

    Linus Torvalds
     
  • Now that ecryptfs_lookup_interpose() is no longer using
    ecryptfs_header_cache_2 to read in metadata, the kmem_cache can be
    removed and the ecryptfs_header_cache_1 kmem_cache can be renamed to
    ecryptfs_header_cache.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • ecryptfs_lookup_interpose() has turned into spaghetti code over the
    years. This is an effort to clean it up.

    - Shorten overly descriptive variable names such as ecryptfs_dentry
    - Simplify gotos and error paths
    - Create helper function for reading plaintext i_size from metadata

    It also includes an optimization when reading i_size from the metadata.
    A complete page-sized kmem_cache_alloc() was being done to read in 16
    bytes of metadata. The buffer for that is now statically declared.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • Instead of having the calling functions translate the true/false return
    code to either 0 or -EINVAL, have contains_ecryptfs_marker() return 0 or
    -EINVAL so that the calling functions can just reuse the return code.

    Also, rename the function to ecryptfs_validate_marker() to avoid callers
    mistakenly thinking that it returns true/false codes.

    Signed-off-by: Tyler Hicks

    Tyler Hicks
     
  • Only unlock and d_add() new inodes after the plaintext inode size has
    been read from the lower filesystem. This fixes a race condition that
    was sometimes seen during a multi-job kernel build in an eCryptfs mount.

    https://bugzilla.kernel.org/show_bug.cgi?id=36002

    Signed-off-by: Tyler Hicks
    Reported-by: David
    Tested-by: David

    Tyler Hicks
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    arch/tile: more /proc and /sys file support

    Linus Torvalds
     
  • * 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits)
    nfsd: make local functions static
    NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()
    NFSD: Check status from nfsd4_map_bcts_dir()
    NFSD: Remove setting unused variable in nfsd_vfs_read()
    nfsd41: error out on repeated RECLAIM_COMPLETE
    nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence
    nfsd v4.1 LOCKT clientid field must be ignored
    nfsd41: add flag checking for create_session
    nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
    nfsd4: fix wrongsec handling for PUTFH + op cases
    nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
    nfsd4: introduce OPDESC helper
    nfsd4: allow fh_verify caller to skip pseudoflavor checks
    nfsd: distinguish functions of NFSD_MAY_* flags
    svcrpc: complete svsk processing on cb receive failure
    svcrpc: take advantage of tcp autotuning
    SUNRPC: Don't wait for full record to receive tcp data
    svcrpc: copy cb reply instead of pages
    svcrpc: close connection if client sends short packet
    svcrpc: note network-order types in svc_process_calldir
    ...

    Linus Torvalds
     
  • * 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    SUNRPC: Support for RPC over AF_LOCAL transports
    SUNRPC: Remove obsolete comment
    SUNRPC: Use AF_LOCAL for rpcbind upcalls
    SUNRPC: Clean up use of curly braces in switch cases
    NFS: Revert NFSROOT default mount options
    SUNRPC: Rename xs_encode_tcp_fragment_header()
    nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
    nfs41: Correct offset for LAYOUTCOMMIT
    NFS: nfs_update_inode: print current and new inode size in debug output
    NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
    NFSv4: Handle expired stateids when the lease is still valid
    SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
    Squashfs: Fix sanity check patches on big-endian systems

    Linus Torvalds
     
  • Commit 1495f230fa77 ("vmscan: change shrinker API by passing
    shrink_control struct") changed the API of ->shrink(), but missed ubifs
    and cifs instances.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Implement the pg_test vector to test for max IO sizes. We calculate
    a max_io_size member only once and cache it in the lseg, so as not to
    recalculate it on every page insert.

    Signed-off-by: Boaz Harrosh
    [simplify logic]
    Signed-off-by: Benny Halevy

    Boaz Harrosh
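The pg_test change in Boaz Harrosh's commit above (compute max_io_size once, cache it in the lseg, consult it on every page insert) might look roughly like the sketch below. It is illustrative only: struct lseg_sketch, lseg_max_io_size() and sketch_pg_test() are hypothetical, and computing the limit as stripe_unit * group_width is an assumed formula, not necessarily the one pnfs-obj uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layout segment. max_io_size == 0 means
 * "not computed yet"; computations only counts how often we computed it. */
struct lseg_sketch {
    size_t stripe_unit;     /* assumed layout parameter */
    size_t group_width;     /* assumed layout parameter */
    size_t max_io_size;     /* cached limit */
    unsigned computations;
};

/* Compute the limit lazily on first use, then reuse the cached value. */
static size_t lseg_max_io_size(struct lseg_sketch *l)
{
    if (l->max_io_size == 0) {
        l->max_io_size = l->stripe_unit * l->group_width; /* assumed formula */
        l->computations++;
    }
    return l->max_io_size;
}

/* pg_test-style check returning bool: may a request that already holds
 * req_bytes grow by page_bytes without exceeding the max I/O size? */
static bool sketch_pg_test(struct lseg_sketch *l, size_t req_bytes,
                           size_t page_bytes)
{
    return req_bytes + page_bytes <= lseg_max_io_size(l);
}
```

Because the limit is cached, each page insert costs a single comparison; the computations field exists only to show that the limit is derived once per segment.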